1 CentraleSupélec (3, rue Joliot Curie, Plateau de Moulon, 91192 GIF-SUR-YVETTE Cedex, France)
2 Orange Gardens (44 Avenue de la République, 92320 Châtillon, France)
Abstract: We consider combinatorial semi-bandits over a set of arms $X \subset \{0,1\}^d$ where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound $R(T) = O\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$, but it has computational complexity $O(|X|)$, which is typically exponential in $d$, so it cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with regret $R(T) = O\left(\frac{d (\ln m)^2 (\ln T)}{\Delta_{\min}}\right)$ and computational complexity $O(T \, \mathrm{poly}(d))$. Our approach involves carefully designing an approximate version of ESCB with the same regret guarantees, showing that this approximate algorithm can be implemented in time $O(T \, \mathrm{poly}(d))$ by repeatedly maximizing a linear function over $X$ subject to a linear budget constraint, and showing how to solve these maximization problems efficiently.
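To make the $O(|X|)$ cost concrete, the following is a minimal sketch of an exhaustive index maximization, not the authors' implementation. It assumes that $X$ is the set of all subsets of exactly m items out of d, and that the index is an ESCB-style optimistic score (empirical mean plus a square-root confidence bonus). The function names, the exact form of the bonus, and the exploration rate `f_t` are all illustrative assumptions.

```python
from itertools import combinations
from math import log, sqrt


def escb_style_index(x, theta_hat, counts, f_t):
    """Optimistic score of decision x (a tuple of item indices).

    Assumed form: sum of empirical means plus sqrt(f_t / 2 * sum_i 1 / n_i),
    where n_i is the number of times item i has been observed.
    """
    mean = sum(theta_hat[i] for i in x)
    bonus = sqrt(0.5 * f_t * sum(1.0 / counts[i] for i in x))
    return mean + bonus


def brute_force_decision(d, m, theta_hat, counts, f_t):
    """Return the m-subset of range(d) with the largest index.

    This enumerates all C(d, m) decisions, which is the exponential
    cost that a polynomial-time method has to avoid.
    """
    return max(
        combinations(range(d), m),
        key=lambda x: escb_style_index(x, theta_hat, counts, f_t),
    )


if __name__ == "__main__":
    theta_hat = [0.9, 0.8, 0.1, 0.1]   # hypothetical empirical means
    counts = [100, 100, 100, 100]      # hypothetical observation counts
    best = brute_force_decision(4, 2, theta_hat, counts, f_t=log(1000))
    print(best)  # (0, 1)
```

With d = 4 and m = 2 the enumeration covers only 6 decisions, but the number of candidates grows as the binomial coefficient C(d, m), which is why exhaustive search is impractical in large dimensions.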
https://hal.archives-ouvertes.fr/hal-03162127
Contributor: Thibaut Cuvelier
Submitted on: Monday, March 8, 2021 - 12:05:10 PM
Last modification on: Thursday, April 15, 2021 - 3:06:31 AM
Long-term archiving on: Wednesday, June 9, 2021 - 6:54:12 PM
Thibaut Cuvelier, Richard Combes, Eric Gourdin. Statistically Efficient, Polynomial Time Algorithms for Combinatorial Semi Bandits. Proceedings of the ACM on Measurement and Analysis of Computing Systems, ACM, 2021, 5 (9), pp.1-31. ⟨10.1145/3447387⟩. ⟨hal-03162127⟩