Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

Abstract : We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
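For readers unfamiliar with KL-UCB, the following is a minimal sketch of the standard single-play KL-UCB index for Bernoulli rewards. It is not the budgeted multiple-play variant analysed in the paper, which this record does not detail. The function names `bernoulli_kl` and `kl_ucb_index`, the plain `log(t)` exploration rate, and the bisection tolerance are illustrative assumptions.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q).

    Both arguments are clipped away from 0 and 1 to keep the logs finite.
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Standard KL-UCB index (assumed exploration rate: log(t)).

    Returns, up to `tol`, the largest q in [mean, 1] such that
    pulls * KL(mean, q) <= log(t), found by bisection.
    """
    threshold = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid  # mid is too optimistic
    return lo
```

The index is an upper confidence bound on an arm's mean: it never falls below the empirical mean, and for fixed `t` it shrinks toward that mean as the arm is pulled more often.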
Document type : Preprints, Working Papers, ...

Cited literature : 36 references
Contributor : Emilie Kaufmann
Submitted on : Sunday, November 5, 2017 - 6:40:56 PM
Last modification on : Thursday, April 11, 2019 - 4:02:09 PM
Long-term archiving on : Tuesday, February 6, 2018 - 12:45:26 PM


  • HAL Id : hal-01338733, version 2
  • ARXIV : 1606.09388


Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz. Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits. 2017. ⟨hal-01338733v2⟩


