High Throughput and Low Latency LLM Serving via Adaptive KV Caching

Abstract

The substantial memory demands of model weights and key-value (KV) caches often lead to severe memory bottlenecks in LLM serving. Existing systems address this by offloading KV caches to host memory and restoring them on demand before decoding. However, these approaches are too coarse-grained and fail to fully exploit the combined computational and storage capabilities of GPUs. In this paper, we introduce eLLM, a novel LLM serving system designed to achieve high throughput and low latency through fine-grained KV caching. The core innovation lies in adaptively keeping KV caches for only a subset of tokens while dynamically recomputing the non-cached tokens in parallel with decoding, thereby balancing memory usage and computational efficiency. This mechanism enables optimizations at two levels. At the request level, eLLM employs token-wise caching to adaptively adjust batch sizes and uncached token ratios in real time. At the layer level, eLLM leverages communication-computation overlap and kernel fusion for resource-complementary operations to further enhance throughput and reduce latency. Experiments demonstrate that eLLM achieves 3.03× higher throughput while satisfying strict per-output-token latency SLOs. It also reduces first-token latency by 2.63× compared to state-of-the-art systems.
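
The trade-off between cached and recomputed tokens can be illustrated with a toy calculation. The sketch below is not eLLM's actual policy; it only assumes a fixed KV memory budget and a uniform per-token KV size, and the names plan_uncached_ratio and split_tokens are hypothetical. It picks the smallest uncached token ratio that lets a batch fit in the budget, then splits each request into cached and recomputed tokens.

```python
import math


def plan_uncached_ratio(seq_lens, kv_bytes_per_token, kv_budget_bytes):
    """Smallest fraction of tokens to leave uncached so the batch fits.

    Hypothetical helper: assumes every token needs the same KV size.
    """
    total_tokens = sum(seq_lens)
    needed = total_tokens * kv_bytes_per_token
    if needed <= kv_budget_bytes:
        return 0.0  # everything fits, nothing has to be recomputed
    return 1.0 - kv_budget_bytes / needed


def split_tokens(seq_len, uncached_ratio):
    """Split one request into (cached, recomputed) token counts."""
    # Round the recomputed share up so the cached share never exceeds the budget.
    recomputed = math.ceil(seq_len * uncached_ratio)
    return seq_len - recomputed, recomputed


if __name__ == "__main__":
    batch = [100, 300]  # tokens per request (illustrative values)
    ratio = plan_uncached_ratio(batch, kv_bytes_per_token=2, kv_budget_bytes=400)
    for n in batch:
        print(n, split_tokens(n, ratio))
```

A larger batch under the same budget yields a higher uncached ratio, which is the memory-versus-recomputation balance the abstract describes; a real system would also have to account for the recomputation cost, which this sketch ignores.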

Publication
In Proceedings of the European Conference on Computer Systems (EuroSys), 2026
Wenyan Chen
2021 - 2025 PhD student

My research interests include resource management and task scheduling in GPU clusters.

Chengzhi Lu
2022 - 2025 Postdoc

My research interests include resource management and task scheduling in large-scale clusters.

Huanle Xu
2021 - Current

I am currently an assistant professor in the Department of Computer and Information Science, University of Macau.

Chengzhong Xu
2019 - Current

I am currently a Chair Professor in the Department of Computer and Information Science and serve as the Dean of the Faculty of Science and Technology at the University of Macau.