Maximin Action Identification: A New Bandit Framework for Games

Abstract: We study an original pure-exploration problem in a strategic bandit model motivated by Monte Carlo Tree Search. It consists of identifying the best action in a game when the player may sample random outcomes of sequentially chosen pairs of actions. We propose two strategies for the fixed-confidence setting: Maximin-LUCB, based on lower- and upper-confidence bounds, and Maximin-Racing, which operates by successively eliminating sub-optimal actions. We discuss the sample complexity of both methods and compare their performance empirically. We sketch a lower-bound analysis and possible connections to an optimal algorithm.
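The abstract describes a fixed-confidence LUCB-style approach: maintain confidence bounds on each action's maximin value (the worst case over the opponent's replies) and stop once the best action's lower bound separates from every rival's upper bound. The following is a minimal sketch of that idea using generic Hoeffding bounds on Bernoulli outcomes; the function name, the sampling rule, and the exact confidence radius are illustrative assumptions, not the paper's Maximin-LUCB algorithm.

```python
import math
import random

def maximin_identify(means, delta=0.1, budget=20000, seed=0):
    """Illustrative LUCB-style maximin identification (not the paper's rule).

    means[i][j] is the Bernoulli mean of the outcome when the player picks
    action i and the opponent replies with action j; the player seeks
    argmax_i min_j means[i][j].
    """
    rng = random.Random(seed)
    K, M = len(means), len(means[0])
    n = [[0] * M for _ in range(K)]      # sample counts per (action, reply)
    s = [[0.0] * M for _ in range(K)]    # sums of observed outcomes

    def mean(i, j):
        return s[i][j] / n[i][j] if n[i][j] else 0.5

    def radius(i, j, t):
        # Hoeffding-style radius with a crude union bound (an assumption).
        if n[i][j] == 0:
            return 1.0
        return math.sqrt(math.log(4 * K * M * t * t / delta) / (2 * n[i][j]))

    best = 0
    for t in range(1, budget + 1):
        # Bounds on each action's maximin value: min over opponent replies.
        lcb = [min(mean(i, j) - radius(i, j, t) for j in range(M)) for i in range(K)]
        ucb = [min(mean(i, j) + radius(i, j, t) for j in range(M)) for i in range(K)]
        best = max(range(K), key=lambda i: lcb[i])
        challenger = max((i for i in range(K) if i != best), key=lambda i: ucb[i])
        if lcb[best] >= ucb[challenger]:  # stop: best action is separated
            return best
        # Sample the least-explored opponent reply against both candidates.
        for i in (best, challenger):
            j = min(range(M), key=lambda jj: n[i][jj])
            n[i][j] += 1
            s[i][j] += 1.0 if rng.random() < means[i][j] else 0.0
    return best  # budget exhausted: return current empirical leader
```

On a toy 2x2 game with maximin values 0.45 and 0.55, this sketch identifies the second action; the Racing variant mentioned in the abstract would instead keep a shrinking set of candidate actions and discard any action whose upper bound falls below another action's lower bound.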
https://hal.archives-ouvertes.fr/hal-01273842
Contributor: Emilie Kaufmann
Submitted on: Monday, November 21, 2016

Files

garivier16b.pdf

Identifiers

  • HAL Id: hal-01273842, version 2
  • arXiv: 1602.04676

Citation

Aurélien Garivier, Emilie Kaufmann, Wouter Koolen. Maximin Action Identification: A New Bandit Framework for Games. 29th Annual Conference on Learning Theory (COLT), Jun 2016, New York, United States. ⟨hal-01273842v2⟩
