Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling

Abstract: Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs. a high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling (MS). Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
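The sampling rule described in the abstract can be illustrated with a minimal sketch: Murphy Sampling draws arm means from the posterior *conditioned on the "unfavorable" event that the minimum mean is below the threshold* (hence the name, after Murphy's law), then pulls the arm achieving the sampled minimum. The sketch below assumes Gaussian arms with known unit variance and implements the conditioning by simple rejection sampling; all variable names, the rejection cap, and the fallback rule are illustrative choices, not the paper's exact implementation.

```python
import random

def murphy_sampling_round(counts, sums, gamma, prior_sd=1.0, noise_var=1.0,
                          max_tries=1000):
    """One round of Murphy Sampling (sketch): sample each arm's mean from its
    Gaussian posterior, conditioned by rejection on min_a theta_a < gamma,
    then return the arm with the lowest sampled mean."""
    K = len(counts)
    for _ in range(max_tries):
        theta = []
        for a in range(K):
            n = counts[a]
            if n == 0:
                # Unpulled arm: sample from a (hypothetical) N(0, 1) prior.
                theta.append(random.gauss(0.0, prior_sd))
            else:
                # Flat-prior Gaussian posterior: N(empirical mean, noise_var/n).
                theta.append(random.gauss(sums[a] / n, (noise_var / n) ** 0.5))
        if min(theta) < gamma:  # accept only draws satisfying the Murphy event
            return min(range(K), key=lambda a: theta[a])
    # Fallback (illustrative): pull the empirically lowest arm.
    return min(range(K), key=lambda a: sums[a] / max(counts[a], 1))

# Small simulation with hypothetical Gaussian arms; gamma is the threshold.
random.seed(0)
true_means, gamma = [0.3, 0.7, 0.9], 0.6
K = len(true_means)
counts, sums = [0] * K, [0.0] * K
for t in range(500):
    a = murphy_sampling_round(counts, sums, gamma)
    counts[a] += 1
    sums[a] += random.gauss(true_means[a], 1.0)
print(counts)  # the lowest-mean arm should dominate the pull counts
```

Because the true minimum (0.3) lies below the threshold, the conditioning event has high posterior probability and rejection sampling accepts quickly; most pulls concentrate on the lowest-mean arm, which is the sampling behavior the abstract says optimality mandates when the true minimum is low.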
Document type :
Conference papers
Contributor: Emilie Kaufmann
Submitted on: Thursday, May 31, 2018 - 6:42:17 PM
Last modification on: Saturday, September 11, 2021 - 3:19:06 AM
Long-term archiving on: Saturday, September 1, 2018 - 3:01:48 PM



  • HAL Id: hal-01804581, version 1
  • arXiv: 1806.00973


Emilie Kaufmann, Wouter Koolen, Aurélien Garivier. Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling. Advances in Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. ⟨hal-01804581⟩


