Preprint, Working Paper. Year: 2016

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Abstract

We revisit lower bounds on the regret in multi-armed bandit problems. We obtain non-asymptotic bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show that the regret grows almost linearly in an initial phase, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques get to the essence of the arguments used and are stripped of all unnecessary complications.
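
As a rough illustration of the "well-known logarithmic growth" mentioned in the abstract, the sketch below computes the constant in front of ln(T) in the classical asymptotic lower bound of Lai and Robbins, specialized to Bernoulli arms. It is not the non-asymptotic bound obtained in the paper. The function names and the arm means are hypothetical, chosen only for this example, and the best arm is assumed to have a mean strictly below 1.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence KL(Ber(p) || Ber(q)) in nats.

    Assumes 0 <= p <= 1 and 0 < q < 1 so that every logarithm is finite.
    """
    def term(x: float, y: float) -> float:
        # Convention: 0 * log(0 / y) = 0.
        return 0.0 if x == 0.0 else x * math.log(x / y)

    return term(p, q) + term(1.0 - p, 1.0 - q)


def log_regret_constant(means: list[float]) -> float:
    """Sum over suboptimal arms of gap / KL(arm, best arm).

    This is the coefficient of ln(T) in the classical asymptotic
    lower bound for Bernoulli bandits (an assumption of this sketch,
    not a formula quoted from the paper).
    """
    best = max(means)
    return sum(
        (best - m) / bernoulli_kl(m, best) for m in means if m < best
    )


# Hypothetical two-armed instance, for illustration only.
example_means = [0.5, 0.4]
constant = log_regret_constant(example_means)

for horizon in (100, 10_000, 1_000_000):
    # Asymptotic benchmark only; it says nothing about small horizons,
    # which is the regime the abstract describes as almost linear.
    print(horizon, constant * math.log(horizon))
```

When the gap between arms is small the divergence in the denominator is small too, so the constant is large and the logarithmic regime can only become informative at large horizons, which is consistent with the two-phase picture the abstract describes.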
Main file
Bandit-lower-bounds--HAL.pdf (483.9 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01276324, version 1 (19-02-2016)
hal-01276324, version 2 (16-06-2016)
hal-01276324, version 3 (08-10-2018)

Cite

Aurélien Garivier, Pierre Ménard, Gilles Stoltz. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. 2016. ⟨hal-01276324v1⟩