Learning the distribution with largest mean: two bandit frameworks

Emilie Kaufmann; Aurélien Garivier

doi:10.1051/proc/201760114

Journal article in ESAIM: Proceedings and Surveys. Year: 2017


Emilie Kaufmann

Role: Author
PersonId: 10422
IdHAL: emilie-kaufmann
ORCID: 0000-0002-5496-824X
IdRef: 197040810

Sequential Learning

Aurélien Garivier

Role: Author
PersonId: 4986
IdHAL: aurelien-garivier
ORCID: 0000-0002-4906-9573
IdRef: 111902495

Institut de Mathématiques de Toulouse UMR5219

Abstract

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications such as online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; both can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, under some constraints on the learning process. For both of them (regret minimization and best arm identification), we present recent, asymptotically optimal algorithms. We compare the behaviors of the sampling rules of these algorithms, as well as the complexity terms associated with each problem.
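
The two tasks summarized in the abstract can be contrasted with a small simulation. The sketch below is a minimal illustration under stated assumptions: Bernoulli arms, the classical UCB1 index for regret minimization, and naive round-robin sampling with a fixed budget for best arm identification. These are simple baselines chosen for brevity, not the asymptotically optimal algorithms presented in the paper, and the names `ucb1` and `uniform_best_arm` as well as the arm means are hypothetical.

```python
import math
import random

# Illustrative baselines only (UCB1 and round-robin sampling on Bernoulli arms);
# these are NOT the asymptotically optimal algorithms studied in the paper.


def ucb1(means, horizon, seed=0):
    """Regret minimization with a UCB1 index policy on Bernoulli arms.

    Returns the number of pulls of each arm and the pseudo-regret,
    i.e. the sum over t of (mu_star - mean of the arm pulled at time t).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    mu_star = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            def index(a):
                # empirical mean plus an exploration bonus that shrinks with counts[a]
                return sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])

            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += mu_star - means[arm]
    return counts, regret


def uniform_best_arm(means, budget, seed=0):
    """Fixed-budget best arm identification with round-robin sampling.

    Only the final recommendation matters: the arm with the highest
    empirical mean after `budget` samples.
    """
    k = len(means)
    if budget < k:
        raise ValueError("budget must allow at least one sample per arm")
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k  # uniform allocation that ignores past observations
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return max(range(k), key=lambda a: sums[a] / counts[a])


if __name__ == "__main__":
    arms = [0.1, 0.5, 0.9]  # hypothetical Bernoulli means
    counts, regret = ucb1(arms, horizon=5000)
    print("UCB1 pulls per arm:", counts, "pseudo-regret:", round(regret, 1))
    print("recommended arm:", uniform_best_arm(arms, budget=3000))
```

With these settings, UCB1 should concentrate most of its pulls on the arm with mean 0.9, since its goal is to keep cumulative regret low, whereas the round-robin strategy spends equal effort on every arm and is judged only on its final recommendation.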


Keywords

bandit; UCB; regret minimization; best arm identification

Domains

Statistics [math.ST]

Main file

ESAIM17KG.pdf (443.77 KB)

Origin: files produced by the author(s)


https://hal.science/hal-01449822

Submitted on: Monday, 6 November 2017, 17:02:22

Last modified on: Tuesday, 16 April 2024, 11:15:13

Dates and versions

hal-01449822, version 1 (30-01-2017)

hal-01449822, version 2 (23-03-2017)

hal-01449822, version 3 (06-11-2017)

Identifiers

HAL Id: hal-01449822, version 3
arXiv: 1702.00001
DOI: 10.1051/proc/201760114

Cite

Emilie Kaufmann, Aurélien Garivier. Learning the distribution with largest mean: two bandit frameworks. ESAIM: Proceedings and Surveys, 2017, 60, pp. 114-131. ⟨10.1051/proc/201760114⟩. ⟨hal-01449822v3⟩


Collections

UNIV-TLSE2 CNRS INRIA INSA-TOULOUSE IMT UT1-CAPITOLE CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE INSA-GROUPE INSA-TOULOUSE-GEI ANR UNIV-UT3 UT3-TOULOUSEINP
