On Explore-Then-Commit Strategies

Abstract: We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by exploitation are necessarily suboptimal. The results hold regardless of whether or not the difference in means between the two arms is known. Besides the main message, we also refine existing deviation inequalities, which allow us to design fully sequential strategies with finite-time regret guarantees that are (a) asymptotically optimal as the horizon grows and (b) order-optimal in the minimax sense. Furthermore, we provide empirical evidence that the theory also holds in practice and discuss extensions to the non-Gaussian and multi-armed cases.
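For intuition about the class of strategies studied here, the following Python sketch implements a fixed-design explore-then-commit strategy on a two-armed, unit-variance Gaussian bandit. It is an illustrative sketch only, not one of the paper's constructions: the function name, the unit-variance assumption, and the exploration budget m per arm are hypothetical choices made for the example.

```python
import numpy as np

def explore_then_commit(mu, horizon, m, rng=None):
    """Fixed-design explore-then-commit on a two-armed Gaussian bandit.

    mu      : the two (unknown) arm means, used here only to simulate rewards
    horizon : total number of plays n
    m       : exploration budget per arm (illustrative design parameter)
    Returns the pseudo-regret of the resulting allocation of pulls.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = [[], []]

    # Exploration phase: pull each arm m times (unit-variance Gaussian rewards).
    for _ in range(m):
        for arm in (0, 1):
            rewards[arm].append(rng.normal(mu[arm], 1.0))

    # Commit phase: play the empirically better arm for the remaining rounds.
    best_empirical = int(np.mean(rewards[1]) > np.mean(rewards[0]))
    pulls = [m, m]
    pulls[best_empirical] += horizon - 2 * m

    # Pseudo-regret = gap between the means times pulls of the suboptimal arm.
    gap = abs(mu[0] - mu[1])
    suboptimal = int(np.argmin(mu))
    return gap * pulls[suboptimal]


# Example run: gap 0.2, horizon 10000, explore each arm 100 times.
print(explore_then_commit((0.5, 0.7), horizon=10_000, m=100))
```

Choosing m well requires knowledge of the gap and the horizon, and the abstract's message is that even the best such two-phase strategies, including those that replace the fixed budget m by a data-dependent stopping time, remain suboptimal compared with fully sequential ones.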
Metadata

https://hal.archives-ouvertes.fr/hal-01322906
Contributor: Emilie Kaufmann
Submitted on: Monday, November 14, 2016 - 1:19:04 PM
Last modification on: Monday, April 29, 2019 - 4:47:06 PM
Long-term archiving on: Tuesday, March 21, 2017 - 12:39:18 AM

Files

nips_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01322906, version 2
  • arXiv: 1605.08988

Citation

Aurélien Garivier, Emilie Kaufmann, Tor Lattimore. On Explore-Then-Commit Strategies. NIPS, Dec 2016, Barcelona, Spain. ⟨hal-01322906v2⟩
