Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

Emilie Kaufmann ¹, Nathaniel Korda ², Rémi Munos ²
² SEQUEL (Sequential Learning), Inria Lille - Nord Europe; LIFL - Laboratoire d'Informatique Fondamentale de Lille; LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that had been lacking in the literature for the Bernoulli case until now.
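For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch (not taken from the paper) of Thompson Sampling for Bernoulli rewards with uniform Beta(1, 1) priors; the function name and the two-armed example instance are illustrative assumptions.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, rng=None):
    """Run Thompson Sampling with Beta(1, 1) priors on a Bernoulli bandit.

    Returns the cumulative (pseudo-)regret after `horizon` pulls.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(true_means)
    successes = np.zeros(n_arms)   # observed 1-rewards per arm
    failures = np.zeros(n_arms)    # observed 0-rewards per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Sample one value from each arm's Beta posterior and pull the argmax.
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best_mean - true_means[arm]

    return regret

# Illustrative two-armed instance: regret should grow only logarithmically
# with the horizon, the rate matched by the paper's finite-time analysis.
print(thompson_sampling_bernoulli([0.5, 0.6], horizon=10_000))
```

The logarithmic growth of the expected regret of this posterior-sampling scheme is the asymptotic rate that the paper shows matches the Lai and Robbins lower bound.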

https://hal.archives-ouvertes.fr/hal-00830033
Citation

Emilie Kaufmann, Nathaniel Korda, Rémi Munos. Thompson Sampling: An Asymptotically Optimal Finite Time Analysis. ALT 2012 - International Conference on Algorithmic Learning Theory, Oct 2012, Lyon, France. pp.199-213, ⟨10.1007/978-3-642-34106-9_18⟩. ⟨hal-00830033⟩
