Pure Exploration for Multi-Armed Bandit Problems - HAL open archive
Preprint / Working paper. Year: 2009

Pure Exploration for Multi-Armed Bandit Problems

Abstract

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that sequentially explore the arms. The strategies are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is constrained only by the number of available rounds (not necessarily known in advance), in contrast to the case of cumulative regret, where exploitation must be performed at the same time. We believe that this performance criterion is suited to situations in which the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between simple and cumulative regrets. The main result is that the required exploration--exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret. We then refine this statement.
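To make the notion of simple regret concrete, here is a minimal sketch (not taken from the paper) of a pure-exploration strategy: pull the arms uniformly during the available rounds, then recommend the arm with the highest empirical mean. The Bernoulli arm means, the budget of rounds, and the uniform allocation are illustrative assumptions; the sketch also reports the cumulative regret of the same exploration phase, to contrast the two criteria mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Bernoulli bandit: these means and the budget are assumptions,
# not parameters from the paper.
true_means = np.array([0.5, 0.6, 0.7])
K = len(true_means)
n_rounds = 300  # exploration budget (number of available rounds)

# Uniform exploration: pull the arms in round-robin fashion.
counts = np.zeros(K)
sums = np.zeros(K)
for t in range(n_rounds):
    arm = t % K
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

# Once the budget is exhausted, recommend the empirically best arm.
empirical_means = sums / counts
recommended = int(np.argmax(empirical_means))

# Simple regret: gap between the best true mean and the recommended arm's mean.
simple_regret = true_means.max() - true_means[recommended]

# Cumulative regret of the exploration phase: regret accumulated by every pull.
cumulative_regret = float(np.sum(counts * (true_means.max() - true_means)))

print(f"recommended arm: {recommended}")
print(f"simple regret: {simple_regret:.3f}")
print(f"cumulative regret: {cumulative_regret:.1f}")
```

Under this uniform allocation the simple regret vanishes as the budget grows (the recommended arm is eventually the true best one with high probability), while the cumulative regret keeps growing linearly, which illustrates why the two criteria call for different trade-offs.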
Main file
PureExplo.pdf (265.14 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00257454 , version 1 (19-02-2008)
hal-00257454 , version 2 (12-06-2008)
hal-00257454 , version 3 (16-06-2008)
hal-00257454 , version 4 (19-02-2009)
hal-00257454 , version 5 (26-01-2010)
hal-00257454 , version 6 (08-06-2010)

Identifiers

HAL Id: hal-00257454

Cite

Sébastien Bubeck, Rémi Munos, Gilles Stoltz. Pure Exploration for Multi-Armed Bandit Problems. 2009. ⟨hal-00257454v4⟩
850 views
961 downloads
