Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

Abstract: We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
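To make the setting concrete, here is a minimal illustrative sketch of Thompson sampling for a budgeted multiple-play Bernoulli bandit: each round, the agent samples a Beta posterior draw per arm and greedily selects arms by sampled-reward-to-cost ratio until the per-round budget is exhausted. This is only a toy sketch under assumed Beta(1, 1) priors and a greedy knapsack relaxation, not the authors' exact algorithm or its optimality analysis; all names (`thompson_budgeted`, etc.) are hypothetical.

```python
import random


def thompson_budgeted(means, costs, budget, horizon, seed=0):
    """Toy Thompson sampling for a budgeted multiple-play Bernoulli bandit.

    Sketch only: the paper's algorithm and analysis are more refined.
    means  -- true Bernoulli success probabilities (used to simulate rewards)
    costs  -- cost of pulling each arm
    budget -- per-round spending budget
    """
    rng = random.Random(seed)
    k = len(means)
    succ = [1] * k  # Beta(1, 1) prior: one pseudo-success per arm
    fail = [1] * k  # and one pseudo-failure per arm
    total_reward = 0.0
    for _ in range(horizon):
        # Posterior sample for each arm's mean reward.
        theta = [rng.betavariate(succ[i], fail[i]) for i in range(k)]
        # Greedy knapsack relaxation: rank arms by sampled efficiency.
        order = sorted(range(k), key=lambda i: theta[i] / costs[i], reverse=True)
        spent = 0.0
        for i in order:
            if spent + costs[i] > budget:
                continue  # this arm does not fit in the remaining budget
            spent += costs[i]
            r = 1 if rng.random() < means[i] else 0  # simulated Bernoulli pull
            total_reward += r
            succ[i] += r
            fail[i] += 1 - r
    return total_reward, succ, fail
```

For instance, with two unit-cost arms of means 0.9 and 0.1 and a per-round budget of 1, the posterior quickly concentrates and the better arm receives the vast majority of the pulls.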
Document type: Preprint / working paper

Cited literature: 37 references
Contributor: Emilie Kaufmann
Submitted on: Sunday, 5 November 2017 - 18:40:56
Last modified on: Friday, 23 February 2018 - 17:12:11
Long-term archiving on: Tuesday, 6 February 2018 - 12:45:26



  • HAL Id: hal-01338733, version 2
  • arXiv: 1606.09388


Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz. Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits. 2017. 〈hal-01338733v2〉


