Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits - Archive ouverte HAL
Preprint, working paper. Year: 2017

Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits

Abstract

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an asymptotic regret lower bound for any uniformly efficient algorithm in our setting. We then study a variant of Thompson sampling for Bernoulli rewards and a variant of KL-UCB for both single-parameter exponential families and bounded, finitely supported rewards. We show these algorithms are asymptotically optimal, both in rate and leading problem-dependent constants, including in the thick margin setting where multiple arms fall on the decision boundary.
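To make the setting concrete, here is a minimal sketch of one round of a budgeted Thompson-sampling-style rule for Bernoulli rewards. All names, and the greedy selection by sampled reward-to-cost ratio, are illustrative assumptions for exposition, not the paper's exact algorithm: each arm gets a draw from its Beta posterior, and arms are added greedily while their total cost stays within the round's budget.

```python
import random

def budgeted_thompson_round(successes, failures, costs, budget, rng=None):
    """One round: sample a Beta posterior mean per Bernoulli arm, then
    greedily select arms under the per-round cost budget.

    Illustrative sketch only; the selection rule is a greedy knapsack
    heuristic, an assumption rather than the paper's decision rule.
    """
    rng = rng or random.Random()
    n_arms = len(costs)
    # Beta(S+1, F+1) posterior under a uniform prior on each arm's mean.
    samples = [
        rng.betavariate(successes[k] + 1, failures[k] + 1)
        for k in range(n_arms)
    ]
    # Rank arms by sampled reward per unit cost, then fill the budget.
    order = sorted(range(n_arms), key=lambda k: samples[k] / costs[k], reverse=True)
    chosen, spent = [], 0.0
    for k in order:
        if spent + costs[k] <= budget:
            chosen.append(k)
            spent += costs[k]
    return chosen

def update_posterior(successes, failures, arm, reward):
    """Update the Beta posterior counts after observing a 0/1 reward."""
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

A round with three arms and a budget of 2 cost units selects a feasible subset; replaying many rounds and updating the counts concentrates the posteriors on the best arms per unit cost.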
Main file: LKC17arxiv.pdf (518 KB)

Dates and versions

hal-01338733 , version 1 (29-06-2016)
hal-01338733 , version 2 (05-11-2017)
hal-01338733 , version 3 (03-09-2019)

Identifiers

Cite

Alexander R. Luedtke, Emilie Kaufmann, Antoine Chambaz. Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits. 2017. ⟨hal-01338733v2⟩

Collections

MODALX
471 views
368 downloads
