Conference paper, 2016

Maximin Action Identification: A New Bandit Framework for Games

Abstract

We study an original pure-exploration problem in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We also sketch a lower-bound analysis and possible connections to an optimal algorithm.
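To make the setting concrete, the following is a minimal illustrative sketch of a LUCB-style strategy for maximin action identification, not the authors' exact Maximin-LUCB algorithm: the confidence radius, stopping rule, and sampling rule below are plausible Hoeffding-based choices, and the payoff matrix in the usage example is hypothetical. The player picks a row action i whose value is min_j mu[i][j]; the goal is to identify argmax_i min_j mu[i][j] with confidence roughly 1 - delta.

```python
import math
import random

def maximin_lucb(sample, n_rows, n_cols, delta=0.1, max_samples=200_000):
    """Illustrative LUCB-style sketch for maximin action identification.

    sample(i, j) returns a noisy payoff in [0, 1] for the action pair (i, j).
    The value of row action i is min_j mu[i][j]; we aim to identify
    argmax_i min_j mu[i][j] with confidence about 1 - delta.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    sums = [[0.0] * n_cols for _ in range(n_rows)]

    def pull(i, j):
        sums[i][j] += sample(i, j)
        counts[i][j] += 1

    # Initialise with one sample per pair so every mean is defined.
    for i in range(n_rows):
        for j in range(n_cols):
            pull(i, j)
    t = n_rows * n_cols

    while t < max_samples:
        def mean(i, j):
            return sums[i][j] / counts[i][j]

        def radius(i, j):
            # Hoeffding-style radius with a crude union bound (illustrative only).
            n = counts[i][j]
            return math.sqrt(math.log(4 * n_rows * n_cols * t * t / delta) / (2 * n))

        # The value of action i is a minimum over columns, so its confidence
        # interval is [min_j LCB(i,j), min_j UCB(i,j)].
        lcb = [min(mean(i, j) - radius(i, j) for j in range(n_cols)) for i in range(n_rows)]
        ucb = [min(mean(i, j) + radius(i, j) for j in range(n_cols)) for i in range(n_rows)]
        emp = [min(mean(i, j) for j in range(n_cols)) for i in range(n_rows)]

        leader = max(range(n_rows), key=lambda i: emp[i])
        challenger = max((i for i in range(n_rows) if i != leader), key=lambda i: ucb[i])

        # Stop when the leader's pessimistic value dominates the best challenger's
        # optimistic value.
        if lcb[leader] >= ucb[challenger]:
            return leader

        # Sample the column realising each candidate's binding bound.
        j_lead = min(range(n_cols), key=lambda j: mean(leader, j) - radius(leader, j))
        j_chal = min(range(n_cols), key=lambda j: mean(challenger, j) + radius(challenger, j))
        pull(leader, j_lead)
        pull(challenger, j_chal)
        t += 2

    # Budget exhausted: fall back to the empirical maximin action.
    return max(range(n_rows),
               key=lambda i: min(sums[i][j] / counts[i][j] for j in range(n_cols)))
```

Usage on a hypothetical 3x3 Bernoulli game whose true maximin action is row 0 (row values 0.8, 0.5, 0.3):

```python
random.seed(0)
mu = [[0.9, 0.8, 0.85], [0.5, 0.95, 0.6], [0.3, 0.4, 0.35]]
best = maximin_lucb(lambda i, j: 1.0 if random.random() < mu[i][j] else 0.0,
                    3, 3, delta=0.1)
```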
Main file: garivier16b.pdf (253.94 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01273842 , version 1 (14-02-2016)
hal-01273842 , version 2 (21-11-2016)

Identifiers

Cite

Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen. Maximin Action Identification: A New Bandit Framework for Games. 29th Annual Conference on Learning Theory (COLT), Jun 2016, New York, United States. ⟨hal-01273842v2⟩
1653 views
180 downloads

