SMIless: Serving DAG-based Inference with Dynamic Invocations under Serverless Computing

Abstract

The deployment of ML serving applications, featuring multiple inference functions on serverless platforms, has gained substantial popularity, prompting the development of many new systems. However, these systems often optimize resource provisioning and cold-start management separately, ultimately resulting in higher monetary costs. This paper introduces SMIless, a highly efficient serverless system tailored for serving DAG-based ML inference in heterogeneous environments. SMIless co-optimizes resource configuration and cold-start management in the context of dynamic invocations by seamlessly integrating adaptive pre-warming windows, striking an effective balance between performance and cost. We have implemented SMIless on top of OpenFaaS and conducted extensive evaluations using real-world ML serving applications. The experimental results demonstrate that SMIless achieves up to a 5.73× reduction in overall cost while meeting the SLA requirements of all user requests, surpassing state-of-the-art solutions.
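To make the pre-warming idea above concrete, here is a minimal sketch of how an adaptive pre-warming window could be computed. This is not SMIless's actual algorithm; every name and the cost comparison below are illustrative assumptions only.

```python
# Hedged sketch (not the paper's algorithm): one way an adaptive
# pre-warming window might be chosen. The function names, parameters,
# and cost model are illustrative assumptions, not taken from SMIless.

def prewarm_window(predicted_arrival_s: float,
                   cold_start_s: float,
                   keep_alive_cost_per_s: float,
                   cold_start_penalty: float) -> float:
    """Return how many seconds before a predicted invocation to start
    warming a function instance.

    Warming earlier than necessary wastes keep-alive cost; warming too
    late risks an SLA-violating cold start. This toy policy warms
    exactly one cold-start latency early when the penalty of a cold
    start outweighs the idle cost, and skips pre-warming otherwise.
    """
    idle_cost = keep_alive_cost_per_s * cold_start_s
    if cold_start_penalty > idle_cost:
        # Start warming just early enough to hide the cold start,
        # but never earlier than the predicted arrival itself.
        return min(predicted_arrival_s, cold_start_s)
    # Cheaper to absorb the cold start than to keep the instance warm.
    return 0.0
```

For example, with a 2 s cold start, cheap keep-alive, and a high SLA penalty, the policy warms 2 s early; if keep-alive is expensive relative to the penalty, it skips pre-warming entirely.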

Publication
In International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2024
Chengzhi Lu
2025 - Current

My research interests include resource management and task scheduling in large-scale clusters.

Huanle Xu
2021.01 - Current

I am currently an assistant professor in the Department of Computer and Information Science, University of Macau.

Wenyan Chen
2021 - Current

My research interests include resource management and task scheduling in GPU clusters.