
Learning the distribution with largest mean: two bandit frameworks

Emilie Kaufmann (1), Aurélien Garivier (2)
(1) SEQUEL - Sequential Learning, Inria Lille - Nord Europe; CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189
Abstract: Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature; both can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process. For both tasks (regret minimization and best arm identification), we present recent, asymptotically optimal algorithms. We compare the behavior of each algorithm's sampling rule, as well as the complexity terms associated with each problem.
Document type: Journal articles

https://hal.archives-ouvertes.fr/hal-01449822
Contributor: Emilie Kaufmann
Submitted on: Monday, November 6, 2017 - 5:02:22 PM
Last modification on: Wednesday, October 27, 2021 - 1:11:01 PM

Files

ESAIM17KG.pdf
Files produced by the author(s)

Citation

Emilie Kaufmann, Aurélien Garivier. Learning the distribution with largest mean: two bandit frameworks. ESAIM: Proceedings and Surveys, EDP Sciences, 2017, 60, pp. 114-131. ⟨10.1051/proc/201760114⟩. ⟨hal-01449822v3⟩
