Best Arm Identification in Multi-Armed Bandits

Jean-Yves Audibert; Sébastien Bubeck

Conference paper, Year: 2010


Jean-Yves Audibert

Role: Author
PersonId: 931557

imagine [Marne-la-Vallée]

Models of visual object recognition and scene understanding

Laboratoire d'Informatique Gaspard-Monge

Sébastien Bubeck

Role: Author
PersonId: 844095

Sequential Learning

Abstract

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is defined here as the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regret decreases exponentially at a rate which is, up to a logarithmic factor, the best possible. However, while the UCB policy requires tuning a parameter that depends on the unobservable hardness of the task, the successive rejects policy benefits from being parameter-free and independent of the scaling of the rewards. As a by-product of our analysis, we show that identifying the best arm (when it is unique) requires a number of samples of order (up to a $\log(K)$ factor) $\sum_i 1/\Delta_i^2$, where the sum is over the suboptimal arms and $\Delta_i$ is the difference between the mean reward of the best arm and that of arm $i$. This generalizes the well-known fact that one needs on the order of $1/\Delta^2$ samples to differentiate the means of two distributions with gap $\Delta$.
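The two quantities the abstract relies on, the hardness sum $\sum_i 1/\Delta_i^2$ and a rejection-based policy, can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract does not spell out the procedure, so the phase lengths below follow the schedule usually stated for Successive Rejects (with $K$ arms and budget $n$, each surviving arm reaches $\lceil (n-K) / (\overline{\log}(K)\,(K+1-k)) \rceil$ pulls by the end of phase $k$, where $\overline{\log}(K) = 1/2 + \sum_{i=2}^{K} 1/i$) and should be checked against the full paper. The names `hardness`, `successive_rejects`, and the `pull` callback are hypothetical and chosen only for illustration.

```python
import math
import random


def hardness(means):
    """Sum of 1/gap^2 over the suboptimal arms (assumes a unique best arm)."""
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m < best)


def successive_rejects(pull, num_arms, budget):
    """Return the arm left after num_arms - 1 rejection phases.

    `pull(i)` is a hypothetical callback returning one reward from arm i.
    No tuning parameter is needed: only the budget and the number of arms.
    """
    if budget <= num_arms:
        raise ValueError("budget must exceed the number of arms")
    # log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i, used to split the budget.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    for phase in range(1, num_arms):
        # Cumulative pulls each surviving arm must reach in this phase.
        target = math.ceil(
            (budget - num_arms) / (log_bar * (num_arms + 1 - phase))
        )
        for arm in active:
            while counts[arm] < target:
                totals[arm] += pull(arm)
                counts[arm] += 1
        # Reject the arm with the lowest empirical mean
        # (ties go to the lowest index).
        worst = min(active, key=lambda a: totals[a] / counts[a])
        active.remove(worst)
    return active[0]


if __name__ == "__main__":
    # Illustrative Bernoulli arms; these means are made up for the demo.
    means = [0.5, 0.45, 0.3, 0.6]
    rng = random.Random(0)
    chosen = successive_rejects(
        lambda i: 1.0 if rng.random() < means[i] else 0.0,
        num_arms=len(means),
        budget=2000,
    )
    print("hardness:", hardness(means), "chosen arm:", chosen)
```

For two arms with gap $\Delta$, `hardness` reduces to $1/\Delta^2$, which matches the two-distribution case mentioned at the end of the abstract. Because arms are compared only through their empirical means, rescaling all rewards by a positive constant does not change which arm is rejected.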

Domains

Other [stat.ML]; Machine Learning [cs.LG]

Main file

COLT10.pdf (166.83 KB)

Origin: Files produced by the author(s)

Contributor: Jean-Yves Audibert

https://enpc.hal.science/hal-00654404

Submitted on: Wednesday, December 21, 2011, 18:30:59

Last modified on: Friday, April 19, 2024, 16:18:57

Long-term archiving on: Thursday, March 22, 2012, 02:30:55

Dates and versions

hal-00654404, version 1 (21-12-2011)

Identifiers

HAL Id: hal-00654404, version 1

Cite

Jean-Yves Audibert, Sébastien Bubeck. Best Arm Identification in Multi-Armed Bandits. COLT - 23rd Conference on Learning Theory - 2010, Jun 2010, Haifa, Israel. 13 p. ⟨hal-00654404⟩


Collections

ENS-PARIS ENPC UNIV-LILLE3 CNRS INRIA UNIV-MLV LIGM_A3SI PARISTECH LAGIS LIGM IMAGINE INRIA2 PSL ESIEE-PARIS UNIV-EIFFEL JSE2024

2918 views

3532 downloads
