Journal articles

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Abstract: We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-known properties of Kullback-Leibler divergences. These bounds show in particular that in an initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they are deprived of all unnecessary complications.
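The two regimes the abstract describes can be observed numerically. The sketch below is an illustration only, not the paper's analysis: it defines the Bernoulli Kullback-Leibler divergence (the quantity the paper's bounds are expressed in) and runs the standard UCB1 index policy on a hypothetical two-armed Bernoulli instance (means 0.2 and 0.8 are arbitrary choices), so that the pseudo-regret can be compared at a short and a long horizon.

```python
# Illustration (not from the paper): Bernoulli KL divergence and a UCB1 run
# showing near-linear regret at short horizons versus much slower growth later.
import math
import random

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return the pseudo-regret after `horizon` rounds."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= k:  # pull each arm once to initialize the indices
            arm = t - 1
        else:       # pick the arm with the largest UCB1 index
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return sum((best - means[a]) * counts[a] for a in range(k))

means = [0.2, 0.8]  # hypothetical instance, gap = 0.6
early = ucb1_pseudo_regret(means, 20)     # initial phase: regret per round is large
late = ucb1_pseudo_regret(means, 5000)    # final phase: regret far below the linear envelope 0.6 * T
```

Here `early` is a sizeable fraction of the worst-case linear regret 0.6 * 20, while `late` stays orders of magnitude below 0.6 * 5000, consistent with the logarithmic final phase.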

https://hal.archives-ouvertes.fr/hal-01276324
Contributor: Gilles Stoltz
Submitted on: Monday, October 8, 2018 - 10:02:31 PM
Last modification on: Thursday, March 5, 2020 - 6:50:19 PM
Long-term archiving on: Wednesday, January 9, 2019 - 4:01:18 PM

Files

Bandit-lower-bounds-MOR-v3.pdf
Files produced by the author(s)

Citation

Aurélien Garivier, Pierre Ménard, Gilles Stoltz. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. Mathematics of Operations Research, INFORMS, 2019, 44 (2), pp.377-399. ⟨10.1287/moor.2017.0928⟩. ⟨hal-01276324v3⟩
