SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing

Abstract

The deployment of ML serving applications, featuring multiple inference functions on serverless platforms, has gained substantial popularity, prompting the development of many new systems. However, these systems often optimize resource provisioning and cold-start management separately, ultimately resulting in higher monetary costs. This paper introduces SMIless, a highly efficient serverless system tailored for serving DAG-based ML inference in heterogeneous environments. SMIless co-optimizes resource configuration and cold-start management in the context of dynamic invocations by seamlessly integrating adaptive pre-warming windows, striking an effective balance between performance and cost. We have implemented SMIless on top of OpenFaaS and conducted extensive evaluations using real-world ML serving applications. The experimental results demonstrate that SMIless achieves up to a 5.73× reduction in overall cost while meeting the SLA requirements of all user requests, surpassing state-of-the-art solutions.
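To make the pre-warming idea above concrete, here is a minimal sketch of how an adaptive pre-warming window could be computed. This is not SMIless's actual algorithm; every name and the cost comparison below are illustrative assumptions only.

```python
# Hedged sketch (not the paper's algorithm): one way an adaptive
# pre-warming window might be chosen. The function names, parameters,
# and cost model are illustrative assumptions, not taken from SMIless.

def prewarm_window(predicted_arrival_s: float,
                   cold_start_s: float,
                   keep_alive_cost_per_s: float,
                   cold_start_penalty: float) -> float:
    """Return how many seconds before a predicted invocation to start
    warming a function instance.

    Warming earlier than necessary wastes keep-alive cost; warming too
    late risks an SLA-violating cold start. This toy policy warms
    exactly one cold-start latency early when the penalty of a cold
    start outweighs the idle cost, and skips pre-warming otherwise.
    """
    idle_cost = keep_alive_cost_per_s * cold_start_s
    if cold_start_penalty > idle_cost:
        # Start warming just early enough to hide the cold start,
        # but never earlier than the predicted arrival itself.
        return min(predicted_arrival_s, cold_start_s)
    # Cheaper to absorb the cold start than to keep the instance warm.
    return 0.0
```

For example, with a 2 s cold start, cheap keep-alive, and a high SLA penalty, the policy warms 2 s early; if keep-alive is expensive relative to the penalty, it skips pre-warming entirely.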

Publication
In International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2024
Chengzhi Lu
2025 - Current

My research interests include resource management and task scheduling in large-scale clusters.

Huanle Xu
2021.01 - Current

I am currently an assistant professor in the Department of Computer and Information Science, University of Macau.

Wenyan Chen
2021 - Current

My research interests include resource management and task scheduling in GPU clusters.