On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

Emilie Kaufmann; Olivier Cappé; Aurélien Garivier

Journal article. Journal of Machine Learning Research. Year: 2016


Emilie Kaufmann

Role: Author
PersonId : 10422
IdHAL : emilie-kaufmann
ORCID : 0000-0002-5496-824X
IdRef : 197040810

Sequential Learning

Laboratoire Traitement et Communication de l'Information

Olivier Cappé

Role: Author
PersonId : 1534
IdHAL : olivier-cappe
ORCID : 0000-0001-7415-8669
IdRef : 057106878

Laboratoire Traitement et Communication de l'Information

Aurélien Garivier

Role: Author
PersonId : 4986
IdHAL : aurelien-garivier
ORCID : 0000-0002-4906-9573
IdRef : 111902495

Institut de Mathématiques de Toulouse UMR5219

Abstract

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is to contribute to a better understanding of the performance in terms of identifying the m best arms. We introduce generic notions of complexity for the two dominant frameworks considered in the literature: the fixed-budget and fixed-confidence settings. In the fixed-confidence setting, we provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m is larger than 1 under general assumptions. In the specific case of two-armed bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matching algorithms for Gaussian and Bernoulli bandit models. These results show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observed when testing fully specified alternatives. In addition, we provide improved sequential stopping rules that have guaranteed error probabilities and shorter average running times. The proofs rely on two technical results that are of independent interest: a deviation lemma for self-normalized sums (Lemma 19) and a novel change of measure inequality for bandit models (Lemma 1).
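The abstract refers to information-theoretic quantities for Bernoulli and Gaussian bandit models. A common quantity of this kind is the Kullback-Leibler divergence between the reward distributions of two arms. The following is a minimal sketch under that assumption; the function names `kl_bernoulli` and `kl_gaussian` are hypothetical and are not taken from the paper, and the code does not reproduce any bound or algorithm from it.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) in nats, for 0 <= p <= 1 and 0 < q < 1."""
    def term(x, y):
        # Convention: 0 * log(0 / y) = 0.
        return 0.0 if x == 0.0 else x * math.log(x / y)

    return term(p, 1.0 * q) + term(1.0 - p, 1.0 - q)


def kl_gaussian(mu1, mu2, sigma):
    """KL(N(mu1, sigma^2) || N(mu2, sigma^2)) in nats, common known variance (assumption)."""
    return (mu1 - mu2) ** 2 / (2.0 * sigma ** 2)


if __name__ == "__main__":
    # Illustrative arm means only; these values do not come from the paper.
    print(kl_bernoulli(0.5, 0.25))
    print(kl_gaussian(0.0, 1.0, 1.0))
```

Intuitively, the smaller the divergence between two arms, the harder they are to tell apart from samples, which is why such quantities are natural ingredients of a complexity measure.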

Keywords

multi-armed bandit; best arm identification; pure exploration; information-theoretic divergences; sequential testing

Domains

Machine Learning [stat.ML]

Main file

kaufman15a.pdf (630.88 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01024894

Submitted on: Thursday, November 10, 2016, 20:06:13

Last modified on: Wednesday, January 24, 2024, 09:54:23

Long-term archiving on: Tuesday, March 21, 2017, 08:25:45

Dates and versions

hal-01024894 , version 1 (16-07-2014)

hal-01024894 , version 2 (10-11-2016)

Identifiers

HAL Id : hal-01024894 , version 2
ARXIV : 1407.4443

Cite

Emilie Kaufmann, Olivier Cappé, Aurélien Garivier. On the Complexity of Best Arm Identification in Multi-Armed Bandit Models. Journal of Machine Learning Research, 2016, 17, pp.1-42. ⟨hal-01024894v2⟩

Collections

INSTITUT-TELECOM UNIV-TLSE2 CNRS INRIA INSA-TOULOUSE PARISTECH IMT UT1-CAPITOLE CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE INSA-GROUPE LTCI INSA-TOULOUSE-GEI ANR UNIV-UT3 UT3-TOULOUSEINP

