Conference paper, 2016

Maximin Action Identification: A New Bandit Framework for Games

Abstract

We study an original pure-exploration problem in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We also sketch a lower-bound analysis and possible connections to an optimal algorithm.
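To make the setting concrete, the following is a minimal illustrative sketch of a LUCB-style strategy for maximin action identification, not the authors' exact Maximin-LUCB algorithm: the confidence radius, stopping rule, and sampling rule below are plausible Hoeffding-based choices, and the payoff matrix in the usage example is hypothetical. The player picks a row action i whose value is min_j mu[i][j]; the goal is to identify argmax_i min_j mu[i][j] with confidence roughly 1 - delta.

```python
import math
import random

def maximin_lucb(sample, n_rows, n_cols, delta=0.1, max_samples=200_000):
    """Illustrative LUCB-style sketch for maximin action identification.

    sample(i, j) returns a noisy payoff in [0, 1] for the action pair (i, j).
    The value of row action i is min_j mu[i][j]; we aim to identify
    argmax_i min_j mu[i][j] with confidence about 1 - delta.
    """
    counts = [[0] * n_cols for _ in range(n_rows)]
    sums = [[0.0] * n_cols for _ in range(n_rows)]

    def pull(i, j):
        sums[i][j] += sample(i, j)
        counts[i][j] += 1

    # Initialise with one sample per pair so every mean is defined.
    for i in range(n_rows):
        for j in range(n_cols):
            pull(i, j)
    t = n_rows * n_cols

    while t < max_samples:
        def mean(i, j):
            return sums[i][j] / counts[i][j]

        def radius(i, j):
            # Hoeffding-style radius with a crude union bound (illustrative only).
            n = counts[i][j]
            return math.sqrt(math.log(4 * n_rows * n_cols * t * t / delta) / (2 * n))

        # The value of action i is a minimum over columns, so its confidence
        # interval is [min_j LCB(i,j), min_j UCB(i,j)].
        lcb = [min(mean(i, j) - radius(i, j) for j in range(n_cols)) for i in range(n_rows)]
        ucb = [min(mean(i, j) + radius(i, j) for j in range(n_cols)) for i in range(n_rows)]
        emp = [min(mean(i, j) for j in range(n_cols)) for i in range(n_rows)]

        leader = max(range(n_rows), key=lambda i: emp[i])
        challenger = max((i for i in range(n_rows) if i != leader), key=lambda i: ucb[i])

        # Stop when the leader's pessimistic value dominates the best challenger's
        # optimistic value.
        if lcb[leader] >= ucb[challenger]:
            return leader

        # Sample the column realising each candidate's binding bound.
        j_lead = min(range(n_cols), key=lambda j: mean(leader, j) - radius(leader, j))
        j_chal = min(range(n_cols), key=lambda j: mean(challenger, j) + radius(challenger, j))
        pull(leader, j_lead)
        pull(challenger, j_chal)
        t += 2

    # Budget exhausted: fall back to the empirical maximin action.
    return max(range(n_rows),
               key=lambda i: min(sums[i][j] / counts[i][j] for j in range(n_cols)))
```

Usage on a hypothetical 3x3 Bernoulli game whose true maximin action is row 0 (row values 0.8, 0.5, 0.3):

```python
random.seed(0)
mu = [[0.9, 0.8, 0.85], [0.5, 0.95, 0.6], [0.3, 0.4, 0.35]]
best = maximin_lucb(lambda i, j: 1.0 if random.random() < mu[i][j] else 0.0,
                    3, 3, delta=0.1)
```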
Main file: garivier16b.pdf (253.94 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01273842 , version 1 (14-02-2016)
hal-01273842 , version 2 (21-11-2016)

Identifiers

Cite

Aurélien Garivier, Emilie Kaufmann, Wouter M. Koolen. Maximin Action Identification: A New Bandit Framework for Games. 29th Annual Conference on Learning Theory (COLT), Jun 2016, New York, United States. ⟨hal-01273842v2⟩
1653 views
180 downloads

