Paper-Conference

SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing

The deployment of ML serving applications, featuring multiple inference functions on serverless platforms, has gained substantial …

Chengzhi Lu, Huanle Xu, Yudan Li, Wenyan Chen, Kejiang Ye, Chengzhong Xu

Derm: SLA-aware Resource Management for Highly Dynamic Microservices

In this paper, we present Derm, a new resource management system designed for microservice applications with highly dynamic graphs. Our …

Liao Chen, Shutian Luo, Chenyu Lin, Zizhao Mo, Huanle Xu, Kejiang Ye, Chengzhong Xu

Optimizing Dynamic Data Center Provisioning through Speed Scaling: A Primal-Dual Perspective

A significant proportion of energy consumed in modern data centers and clouds is dedicated to provisioning idle servers for maintaining …

Xiaosong Chen, Huanle Xu, Chengzhong Xu

Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters

Modern GPU clusters inherently exhibit heterogeneity, encompassing various aspects such as computation and communication. This …

Zizhao Mo, Huanle Xu, Chengzhong Xu

Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach

A common strategy for improving efficiency in training deep learning entails multiplexing tasks on a single GPU. To mitigate the …

Wenyan Chen, Zizhao Mo, Huanle Xu, Kejiang Ye, Chengzhong Xu

PERT-GNN: Latency Prediction for Microservice-based Cloud-Native Applications via Graph Neural Networks

Cloud-native applications using microservice architectures are rapidly replacing traditional monolithic applications. To meet …

Da Sun Handason Tam, Yang Liu, Huanle Xu, Siyue Xie, Wing Cheong Lau

Understanding and Optimizing Workloads for Unified Resource Management in Large Cloud Platforms

To fully utilize computing resources, cloud providers such as Google and Alibaba choose to co-locate online services with batch …

Chengzhi Lu, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, Chengzhong Xu

Erms: Efficient Resource Management for Shared Microservices with SLA Guarantees

A common approach to improving resource utilization in data centers is to adaptively provision resources based on the actual workload. …

Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Jian He, Guodong Yang, Chengzhong Xu

The Power of Prediction: Microservice Auto Scaling via Workload Learning

When deploying microservices in production clusters, it is critical to automatically scale containers to improve cluster utilization …

Shutian Luo, Huanle Xu, Kejiang Ye, Guoyao Xu, Liping Zhang, Guodong Yang, Chengzhong Xu

Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters

To mitigate the straggler effect, today’s systems and computing frameworks have adopted redundancy to launch extra copies for …

Huanle Xu, Yang Liu, Wing Cheong Lau