Kejiang Ye

High Throughput and Low Latency LLM Serving via Adaptive KV Caching
Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters
SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing
Derm: SLA-aware Resource Management for Highly Dynamic Microservices
Optimizing Resource Management for Shared Microservices: A Scalable System Design
Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach
Erms: Efficient Resource Management for Shared Microservices with SLA Guarantees
The Power of Prediction: Microservice Auto Scaling via Workload Learning
An In-Depth Study of Microservice Call Graph and Runtime Performance
Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis