Multiplexing Dynamic Deep Learning Workloads with SLO-awareness in GPU Clusters

Abstract

In this paper, we introduce Mudi, a new SLO-aware system designed to optimize GPU resource utilization in large-scale clusters. Mudi achieves this by efficiently multiplexing DL inference services with training tasks through spatial sharing. The fundamental idea behind Mudi is to profile the latency of inference services using a piecewise linear function that accurately captures resource interference. Leveraging this quantification of interference, Mudi designs a scalable cluster-wide co-location policy that determines the optimal multiplexing of training tasks and inference services to maximize resource efficiency. Furthermore, Mudi incorporates adaptive batching and resource scaling mechanisms to rapidly adapt to dynamic workloads. Experimental results demonstrate that Mudi improves GPU resource utilization by 42% and achieves up to 2.27x higher training efficiency while satisfying inference SLOs, compared to state-of-the-art multiplexing methods.
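The core mechanism described above, a piecewise linear latency model driving an SLO-aware co-location decision, can be illustrated with a minimal sketch. The breakpoints, slopes, and the `can_colocate` admission check below are hypothetical and purely illustrative; they are not the parameters or API from the paper.

```python
class PiecewiseLatencyModel:
    """Illustrative piecewise-linear fit of inference latency (ms)
    as a function of the interference level (e.g. the co-located
    training task's GPU resource share, normalized to [0, 1])."""

    def __init__(self, breakpoints, slopes, base_latency):
        # breakpoints: sorted interference levels where the slope changes
        # slopes: one slope per segment, len(breakpoints) + 1 entries
        self.breakpoints = breakpoints
        self.slopes = slopes
        self.base = base_latency

    def predict(self, x):
        """Walk the segments, accumulating latency up to interference x."""
        latency, prev = self.base, 0.0
        for i, bp in enumerate(self.breakpoints):
            if x <= bp:
                return latency + self.slopes[i] * (x - prev)
            latency += self.slopes[i] * (bp - prev)
            prev = bp
        return latency + self.slopes[-1] * (x - prev)


def can_colocate(model, interference, slo_ms):
    """SLO-aware admission check: allow co-location only if the
    predicted inference latency still meets the latency SLO."""
    return model.predict(interference) <= slo_ms


# Hypothetical model: latency grows gently at first, then steeply
# once the training task contends heavily for GPU resources.
model = PiecewiseLatencyModel(breakpoints=[0.3, 0.6],
                              slopes=[10.0, 40.0, 120.0],
                              base_latency=5.0)
print(can_colocate(model, 0.2, slo_ms=20.0))  # True: light interference fits the SLO
print(can_colocate(model, 0.8, slo_ms=20.0))  # False: heavy interference violates it
```

A cluster-wide policy in this spirit would evaluate such a check per candidate pairing and pick co-locations that maximize training throughput subject to every inference service passing its SLO test.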

Publication
In The European Conference on Computer Systems (EuroSys) 2025
Wenyan Chen
2021 - Current

My research interests include resource management and task scheduling in GPU clusters.

Chengzhi Lu
2025 - Current

My research interests include resource management and task scheduling in large-scale clusters.

Huanle Xu
2021.01 - Current

I am currently an assistant professor in the Department of Computer and Information Science, University of Macau.