Multi-Bandit Best Arm Identification

Victor Gabillon; Mohammad Ghavamzadeh; Alessandro Lazaric; Sébastien Bubeck

Report, year: 2011


Victor Gabillon

Role: Author
PersonId: 900485

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Sébastien Bubeck

Role: Author
PersonId: 901688

Department of Operations Research and Financial Engineering

Abstract

We study the problem of identifying the best arm in each of the bandits in a multi-bandit multi-armed setting. We first propose an algorithm called Gap-based Exploration (GapE) that focuses on the arms whose mean is close to the mean of the best arm in the same bandit (i.e., arms with a small gap). We then introduce an algorithm, called GapE-V, which takes into account the variance of the arms in addition to their gap. We prove an upper bound on the probability of error for both algorithms. Since GapE and GapE-V need to tune an exploration parameter that depends on the complexity of the problem, which is often unknown in advance, we also introduce variations of these algorithms that estimate this complexity online. Finally, we evaluate the performance of these algorithms and compare them to other allocation strategies on a number of synthetic problems.
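
The allocation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the index GapE uses, so the form below (negative empirical gap plus an exploration bonus that shrinks with the number of pulls) is an assumption, as are the definition of the gap as the distance to the best other arm in the same bandit, the Bernoulli reward model, and every name (`gap_based_exploration`, `empirical_gaps`, `exploration`, `budget`).

```python
import math
import random


def empirical_gaps(means):
    """Gap of each arm: distance from its mean to the best *other* arm's mean.

    Assumed definition; requires at least two arms in the bandit.
    """
    gaps = []
    for k, mu in enumerate(means):
        best_other = max(m for j, m in enumerate(means) if j != k)
        gaps.append(abs(best_other - mu))
    return gaps


def gap_based_exploration(bandits, budget, exploration=1.0, seed=0):
    """Spend `budget` pulls across all bandits, favouring small-gap arms.

    `bandits` is a list of bandits; each bandit is a list of Bernoulli
    success probabilities (hypothetical reward model for illustration).
    Returns (recommended arm per bandit, pull counts per arm).
    """
    n_arms = sum(len(b) for b in bandits)
    if budget < n_arms:
        raise ValueError("budget must cover one pull of every arm")

    rng = random.Random(seed)
    counts = [[0] * len(b) for b in bandits]
    sums = [[0.0] * len(b) for b in bandits]

    def pull(m, k):
        reward = 1.0 if rng.random() < bandits[m][k] else 0.0
        counts[m][k] += 1
        sums[m][k] += reward

    # Initialisation: pull every arm of every bandit once.
    for m, b in enumerate(bandits):
        for k in range(len(b)):
            pull(m, k)

    # Main loop: pull the (bandit, arm) pair with the largest index.
    for _ in range(budget - n_arms):
        best_index, best_pair = -math.inf, None
        for m in range(len(bandits)):
            means = [s / c for s, c in zip(sums[m], counts[m])]
            gaps = empirical_gaps(means)
            for k, gap in enumerate(gaps):
                # Assumed index: small gap and few pulls both raise it.
                index = -gap + math.sqrt(exploration / counts[m][k])
                if index > best_index:
                    best_index, best_pair = index, (m, k)
        pull(*best_pair)

    # Recommend the arm with the highest empirical mean in each bandit.
    best_arms = [
        max(range(len(b)), key=lambda k, m=m: sums[m][k] / counts[m][k])
        for m, b in enumerate(bandits)
    ]
    return best_arms, counts
```

For example, `gap_based_exploration([[0.9, 0.1], [0.5, 0.45, 0.1]], budget=2000)` returns one recommended arm per bandit together with the pull counts. In this sketch the `exploration` argument plays the role of the exploration parameter that, according to the abstract, depends on the problem complexity and has to be tuned or estimated online.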

Domains

Computer Science

Main file

multi-bandit_techreport.pdf (279.09 KB)

Origin: Files produced by the author(s)

Contributor: Victor Gabillon

https://hal.science/hal-00632523

Submitted on: Saturday, November 19, 2011, 15:11:40

Last modified on: Friday, March 24, 2023, 14:52:55

Long-term archiving on: Monday, February 20, 2012, 02:20:57

Dates and versions

hal-00632523, version 1 (14-10-2011)

hal-00632523, version 2 (25-10-2011)

hal-00632523, version 3 (19-11-2011)

Identifiers

HAL Id: hal-00632523, version 3

Cite

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, Sébastien Bubeck. Multi-Bandit Best Arm Identification. 2011. ⟨hal-00632523v3⟩


Collections

UNIV-LILLE3 CNRS INRIA LAGIS GRID5000 INRIA2 LARA SILECS

297 views

163 downloads
