Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback

Abstract: We study a variant of the multi-armed bandit problem with multiple plays in which the user wishes to sample the m out of k arms with the highest expected rewards, but at any given time can only sample l ≤ m arms. When l = m, Thompson sampling was recently shown to be asymptotically efficient. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our new setting, where l may be less than m. We then establish the asymptotic optimality of Thompson sampling for Bernoulli rewards, using a proof technique that differs from earlier methods even when l = m. We also prove the asymptotic optimality of an algorithm based on upper confidence bounds, KL-CUCB, for single-parameter exponential families and bounded, finitely supported rewards, a result that is new for all values of l.
Document type:
Preprint, working paper
2016

https://hal.archives-ouvertes.fr/hal-01338733
Contributor: Emilie Kaufmann
Submitted on: Wednesday, June 29, 2016 - 16:20:07
Last modified on: Tuesday, October 10, 2017 - 13:47:41
Document(s) archived on: Friday, September 30, 2016 - 11:41:42

Files

combinatorial_feedback.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01338733, version 1
  • arXiv: 1606.09388

Citation

Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz. Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback. 2016. 〈hal-01338733〉

Metrics

Record views: 169

Document downloads: 70