Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

Abstract: We study a generalization of the multi-armed bandit problem with multiple plays in which pulling each arm incurs a cost and, at each time, the agent has a budget that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and in the leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
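To give a concrete feel for the setting, here is a minimal, hypothetical sketch of Thompson sampling with Bernoulli rewards, multiple plays, per-arm costs, and a per-round budget. All names, the greedy budget rule, and the toy instance below are illustrative assumptions, not the algorithm or analysis from the paper.

```python
import numpy as np

def choose_arms(successes, failures, costs, budget, rng):
    """Sample a Beta posterior mean per arm (uniform Beta(1,1) prior),
    then greedily add arms in decreasing order of the sampled mean
    as long as the cumulative cost stays within the budget.
    Hypothetical sketch, not the paper's exact selection rule."""
    theta = rng.beta(successes + 1, failures + 1)
    chosen, spend = [], 0.0
    for k in np.argsort(-theta):
        if spend + costs[k] <= budget:
            chosen.append(k)
            spend += costs[k]
    return chosen

def simulate(means, costs, budget, horizon, seed=0):
    """Run the sketch on a toy Bernoulli instance and count arm pulls."""
    rng = np.random.default_rng(seed)
    K = len(means)
    s, f = np.zeros(K), np.zeros(K)
    pulls = np.zeros(K, dtype=int)
    for _ in range(horizon):
        for k in choose_arms(s, f, costs, budget, rng):
            r = rng.random() < means[k]  # Bernoulli reward draw
            s[k] += r
            f[k] += 1 - r
            pulls[k] += 1
    return pulls

# Toy instance: three unit-cost arms, budget allows two plays per round.
pulls = simulate(means=[0.9, 0.5, 0.1], costs=[1.0, 1.0, 1.0],
                 budget=2.0, horizon=2000)
```

With equal costs and a budget of two unit-cost plays, the sketch concentrates its pulls on the two arms with the highest means and rarely plays the worst arm.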
Document type: Preprint / working paper (2017)

Cited literature: [37 references]

https://hal.archives-ouvertes.fr/hal-01338733
Contributor: Emilie Kaufmann
Submitted on: Sunday, November 5, 2017 - 18:40:56
Last modified on: Friday, November 17, 2017 - 08:50:20

Identifiers

  • HAL Id : hal-01338733, version 2
  • ARXIV : 1606.09388

Citation

Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz. Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits. 2017. 〈hal-01338733v2〉

Metrics

  • Record views: 40
  • File downloads: 23