Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier; Pierre Ménard; Gilles Stoltz

Preprint, Working Paper. Year: 2016

Aurélien Garivier

Role: Author
PersonId : 4986
IdHAL : aurelien-garivier
ORCID : 0000-0002-4906-9573
IdRef : 111902495

Institut de Mathématiques de Toulouse UMR5219

Pierre Ménard

Role: Author
PersonId : 1022182

Institut de Mathématiques de Toulouse UMR5219

Gilles Stoltz

Role: Author
PersonId : 738739
IdHAL : gilles-stoltz
ORCID : 0000-0003-1240-1007
IdRef : 091575419

Groupement de Recherche et d'Etudes en Gestion à HEC

Abstract

We revisit lower bounds on the regret in multi-armed bandit problems. We obtain non-asymptotic bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques get to the essence of the arguments used and are free of unnecessary complications.

Keywords

information-theoretic proof techniques; multi-armed bandits; cumulative regret; non-asymptotic lower bounds

Domains

Statistics [math.ST]; Machine Learning [cs.LG]

Main file

Bandit-lower-bounds--HAL.pdf (483.9 KB)

Origin: Files produced by the author(s)

Contributor: Gilles Stoltz

https://hal.science/hal-01276324

Submitted on: Friday, February 19, 2016, 11:09:09

Last modified on: Monday, November 20, 2023, 11:44:19

Long-term archiving on: Friday, May 20, 2016, 10:31:43

Dates and versions

hal-01276324 , version 1 (19-02-2016)

hal-01276324 , version 2 (16-06-2016)

hal-01276324 , version 3 (08-10-2018)

Identifiers

HAL Id : hal-01276324 , version 1
ARXIV : 1602.07182

Cite

Aurélien Garivier, Pierre Ménard, Gilles Stoltz. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. 2016. ⟨hal-01276324v1⟩

794 views

1141 downloads
