
Memory-Efficient KV Cache Optimization for Large Language Model Inference at the Edge

Title: Memory-Efficient KV Cache Optimization for Large Language Model Inference at the Edge
Publication Type: Conference Paper
Year of Publication: 2026
Authors: Zhang, C., H. Tan, H. Pan, Y. Xu, H. Du, L. Zhang, and X. Fu
Conference Name: IEEE INFOCOM 2026
Date Published: 05/2026
Publisher: IEEE
Conference Location: Tokyo, Japan