Cloud Configuration Optimization for Recurring Batch-Processing Applications

Abstract

Recognizing the diversity of Big Data analytic jobs, cloud providers offer a wide range of VM instance types or even clusters to cater to different use cases. The choice of cloud configuration can have a significant impact on the response time and running cost of batch-processing applications, which may need to be re-run regularly with cloud-scale resources. However, identifying the best cloud configuration with a low search cost is quite challenging due to i) the large and high-dimensional configuration space, ii) the time-varying cloud service cost (e.g., AWS Spot instances), and iii) the variation in job response time even under the same configuration. To tackle these challenges, we design and implement Accordia, a system that enables Adaptive Cloud Configuration Optimization for Recurring Data-Intensive Applications. By leveraging recent algorithmic advances in Gaussian Process UCB techniques, Accordia can find the cost-optimal configuration subject to a deadline constraint (i.e., a maximum tolerated running time) under the time-varying cloud service cost. More importantly, Accordia achieves a theoretical performance guarantee: a sub-linearly increasing dynamic regret of the job completion cost. Using extensive trace-driven simulations and empirical measurements of our Kubernetes-based implementation, we demonstrate that Accordia can identify a near-cost-optimal configuration (i.e., within 10% of the optimum) after fewer than 20 runs from over 7000 candidate choices, which translates to a 2X speedup and up to 17.9% cost savings compared with the state-of-the-art approach, CherryPick.
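
To make the Gaussian Process UCB idea in the abstract concrete, the following is a minimal sketch of one selection step over a discrete set of candidate configurations. It is not Accordia's algorithm: it ignores the time-varying cost and the dynamic-regret analysis, and it assumes a zero-mean-centered GP with an RBF kernel, unit prior variance, and costs and running times normalized to roughly unit scale. Because cost is minimized, the optimistic bound is a lower confidence bound. All function names and parameters (rbf_kernel, gp_posterior, next_config, beta, noise, length_scale) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_all, noise=1e-2):
    # GP regression posterior mean and standard deviation at every candidate.
    # Assumption: observations are centered on their sample mean and the
    # prior variance is 1 (i.e., y is on a normalized scale).
    y_mean = y_obs.mean()
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_ao = rbf_kernel(x_all, x_obs)
    mean = y_mean + k_ao @ np.linalg.solve(k_oo, y_obs - y_mean)
    var = 1.0 - np.einsum("ij,ji->i", k_ao, np.linalg.solve(k_oo, k_ao.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def next_config(x_obs, cost_obs, time_obs, x_all, deadline, beta=2.0):
    # Fit one GP to observed cost and one to observed running time.
    cost_mean, cost_std = gp_posterior(x_obs, cost_obs, x_all)
    time_mean, time_std = gp_posterior(x_obs, time_obs, x_all)
    # Optimistic (lower) confidence bounds on cost and running time.
    cost_lcb = cost_mean - beta * cost_std
    time_lcb = time_mean - beta * time_std
    # Keep candidates that might still meet the deadline; if none do,
    # fall back to considering every candidate.
    feasible = time_lcb <= deadline
    if not feasible.any():
        feasible[:] = True
    # Pick the feasible candidate with the lowest optimistic cost.
    return int(np.argmin(np.where(feasible, cost_lcb, np.inf)))
```

In a search loop, one would run the job on the returned configuration, append the measured cost and running time to the observations, and call next_config again. A larger beta favors exploring untested configurations over exploiting the current best estimate.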

Publication
IEEE Transactions on Parallel and Distributed Systems (TPDS) 34(5)
Huanle Xu
2021.01 - Current

I am currently an assistant professor in the Department of Computer and Information Science, University of Macau.