
Minimax policies for adversarial and stochastic bandits

Jean-Yves Audibert 1, 2 Sébastien Bubeck 3
2 imagine [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
3 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem. Concretely, we remove an extraneous logarithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
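For orientation, the baseline policy named in the abstract, UCB1 (Auer et al., 2002), can be sketched as follows. This is a minimal illustration of the standard UCB1 index (empirical mean plus an exploration bonus of sqrt(2 ln t / n_i)), not the modified policy proposed in the paper, whose details are not given in this record. The function names `ucb1` and `bernoulli`, the Bernoulli reward model, and the arm means 0.2 and 0.8 are assumptions chosen for the example; rewards are assumed to lie in [0, 1].

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run standard UCB1 for `horizon` rounds and return pull counts per arm.

    `reward_fns` is a list of callables taking a random.Random instance and
    returning a reward in [0, 1] (an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k    # number of times each arm has been played
    sums = [0.0] * k    # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play each arm once.
            arm = t - 1
        else:
            # Play the arm with the largest index:
            # empirical mean + sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def bernoulli(p):
    """Hypothetical helper: an arm paying 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Two illustrative arms; the second has the higher mean reward.
counts = ucb1([bernoulli(0.2), bernoulli(0.8)], horizon=2000)
```

In this sketch the suboptimal arm is played only a number of times that grows logarithmically with the horizon, which is the distribution-dependent behaviour the abstract refers to.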
Document type : Conference papers
Contributor : Pascal Monasse
Submitted on : Monday, June 17, 2013 - 3:19:50 PM
Last modification on : Tuesday, October 19, 2021 - 11:26:18 AM
Long-term archiving on : Wednesday, September 18, 2013 - 4:15:37 AM




  • HAL Id : hal-00834882, version 1


Jean-Yves Audibert, Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. COLT, Jun 2009, Montreal, Canada. pp.217-226. ⟨hal-00834882⟩


