Minimax policies for adversarial and stochastic bandits

Jean-Yves Audibert 1, 2, Sébastien Bubeck 3
2 IMAGINE [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
3 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We fill in a long open gap in the characterization of the minimax rate for the multi-armed bandit problem. Concretely, we remove an extraneous logarithmic factor in the previously known upper bound and propose a new family of randomized algorithms based on an implicit normalization, as well as a new analysis. We also consider the stochastic case, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
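For orientation, the baseline UCB1 policy of Auer et al. (2002) mentioned in the abstract can be sketched as follows. This is a minimal Python sketch of the standard index rule (empirical mean plus sqrt(2 ln t / n_i)), not the modified policy proposed in the paper. The names `ucb1` and `pull` are hypothetical, chosen for illustration, and rewards are assumed to lie in [0, 1].

```python
import math


def ucb1(pull, n_arms, horizon):
    """Run the standard UCB1 index policy for `horizon` rounds.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns the per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # running empirical mean reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: play each arm once so every count is positive.
            arm = t - 1
        else:
            # Play the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

A call such as `ucb1(lambda arm: [0.2, 0.8][arm], 2, 1000)` runs the sketch on two arms with fixed rewards; the better arm ends up with most of the plays.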
Document type :
Conference papers

https://hal-enpc.archives-ouvertes.fr/hal-00834882
Contributor : Pascal Monasse
Submitted on : Monday, June 17, 2013 - 3:19:50 PM
Last modification on : Thursday, September 12, 2019 - 4:08:56 PM
Long-term archiving on : Wednesday, September 18, 2013 - 4:15:37 AM

File

COLT09a.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00834882, version 1

Citation

Jean-Yves Audibert, Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. COLT, Jun 2009, Montreal, Canada. pp. 217-226. ⟨hal-00834882⟩
