Conference papers

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

Emilie Kaufmann 1, Nathaniel Korda 2, Rémi Munos 2
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies; such experiments had been lacking in the literature for the Bernoulli case until now.
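To make the two objects in the abstract concrete, here is a minimal sketch. It assumes the usual Beta-Bernoulli form of Thompson Sampling with a uniform Beta(1, 1) prior on each arm's mean: sample once from every arm's posterior and play the arm with the largest sample. It also computes the constant in the Lai and Robbins bound, under which any uniformly efficient policy has regret at least (sum over suboptimal arms a of (mu* - mu_a) / KL(mu_a, mu*) + o(1)) log T, where KL is the Bernoulli Kullback-Leibler divergence. The function names (`bernoulli_kl`, `lai_robbins_constant`, `thompson_sampling`) are hypothetical illustrations, not code from the paper, and the simulation tracks pseudo-regret (the sum of gaps of the arms played).

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means: list[float]) -> float:
    """Constant C with regret(T) >= (C + o(1)) * log(T) for uniformly efficient policies."""
    best = max(means)
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in means if mu < best)


def thompson_sampling(means: list[float], horizon: int, seed: int = 0) -> float:
    """Run Beta-Bernoulli Thompson Sampling (uniform prior); return pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # One draw from each arm's Beta(S + 1, F + 1) posterior; play the argmax.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Pseudo-regret: accumulate the gap of the arm actually played.
        regret += best - means[arm]
    return regret
```

For example, `thompson_sampling([0.1, 0.05, 0.02, 0.01], 10_000) / math.log(10_000)` can be set beside `lai_robbins_constant([0.1, 0.05, 0.02, 0.01])`. The result concerns the asymptotic rate, so at a single finite horizon the ratio need not be close to the constant; only the growth of regret in log T is matched.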

https://hal.archives-ouvertes.fr/hal-00830033
Contributor: Rémi Munos
Submitted on: Tuesday, June 4, 2013 - 12:01:22 PM
Last modification on: Tuesday, December 8, 2020 - 10:06:01 AM

Citation

Emilie Kaufmann, Nathaniel Korda, Rémi Munos. Thompson Sampling: An Asymptotically Optimal Finite Time Analysis. ALT 2012 - International Conference on Algorithmic Learning Theory, Oct 2012, Lyon, France. pp.199-213, ⟨10.1007/978-3-642-34106-9_18⟩. ⟨hal-00830033⟩