Infinitely many-armed bandits

Yizao Wang 1 Jean-Yves Audibert 2, 3, 4 Rémi Munos 5
2 IMAGINE [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
4 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, ENS Paris - École normale supérieure - Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
5 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean reward of a newly selected arm, which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper confidence bounds applied to a restricted set of randomly selected arms and provide upper bounds on the resulting expected regret. We also derive a lower bound which matches the upper bound (up to a logarithmic factor) in some cases.
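Illustration : the idea in the abstract of running an upper-confidence-bound rule on a restricted, randomly selected set of arms can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: it uses the standard UCB1 index, Bernoulli rewards, and a caller-supplied number of arms, whereas the paper's own index and its choice of the number of arms are not given in this record. The names `ucb_on_random_subset`, `num_arms`, and `draw_mean` are hypothetical.

```python
import math
import random


def ucb_on_random_subset(horizon, num_arms, draw_mean, rng):
    """Run UCB1 on `num_arms` arms drawn at random from an infinite reservoir.

    Assumptions (not from the source): Bernoulli rewards, the standard UCB1
    exploration bonus, and a fixed `num_arms` chosen by the caller.
    Returns (total_reward, pull_counts, arm_means).
    """
    # Restricted set: the mean reward of each newly selected arm is random.
    means = [draw_mean(rng) for _ in range(num_arms)]
    counts = [0] * num_arms   # number of pulls per selected arm
    sums = [0.0] * num_arms   # cumulative reward per selected arm
    total = 0.0

    def index(k, t):
        # Empirical mean plus the UCB1 confidence width.
        return sums[k] / counts[k] + math.sqrt(2.0 * math.log(t) / counts[k])

    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # initialization: pull each selected arm once
        else:
            arm = max(range(num_arms), key=lambda k: index(k, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed reservoir: mean rewards uniform on [0, 1].
    total, counts, means = ucb_on_random_subset(2000, 20, lambda r: r.random(), rng)
    regret = 2000 * max(means) - total  # realized regret against the best selected arm
    print(f"reward={total:.0f}, regret vs best selected arm={regret:.1f}")
```

The trade-off this sketch exposes is the one the abstract points to: selecting too few arms risks missing a near-optimal one, while selecting too many spends the experiment budget on exploration.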
Document type :
Conference papers

Cited literature [9 references]
Contributor : Rémi Munos
Submitted on : Tuesday, June 4, 2013 - 3:19:36 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM
Document(s) archived on : Thursday, September 5, 2013 - 4:23:06 AM


Files produced by the author(s)


  • HAL Id : hal-00830178, version 1


Yizao Wang, Jean-Yves Audibert, Rémi Munos. Infinitely many-armed bandits. Advances in Neural Information Processing Systems, 2008, Canada. ⟨hal-00830178⟩


