Combinatorial Multi-Armed Bandits with Fairness Constraints: An Online Convex Optimization Perspective

Abstract

The problem of multi-armed bandit (MAB) with fairness constraints has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed number of pull rounds, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing various online selection solutions for MAB, however, when incorporating such fairness constraints, they fail to achieve a sublinear regret bound. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we design a new selection algorithm that solves MAB problems from an online convex optimization perspective. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound of $\mathcal{O}(\sqrt{T \ln{T}})$ with high probability guarantees in $T$ selection rounds. We also extend our framework to include more general knapsack constraints. Finally, we assess the performance of our algorithm through extensive simulations and real dataset applications, demonstrating its significant advantages over baseline schemes.

Publication
Journal of Artificial Intelligence Research, 2025
Xiaosong Chen
Xiaosong Chen
2022 - Current

My research interest includes online algorithm and its application to cloud-edge systems.

Hanqin Zhuang
Hanqin Zhuang
2023 - Current
Huanle Xu
Huanle Xu
2021.01 - Current

I am currently an assistant professor from the Department of Computer and Information Scicence, Univeristy of Macau.