Journal articles

Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation

Olivier Cappé 1 Aurélien Garivier 2 Odalric-Ambrym Maillard 3 Rémi Munos 4 Gilles Stoltz 5, 6 
4 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
5 CLASSIC - Computational Learning, Aggregation, Supervised Statistical, Inference, and Classification
DMA - Département de Mathématiques et Applications - ENS Paris, ENS-PSL - École normale supérieure - Paris, Inria Paris-Rocquencourt
Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families, and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state of the art.
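To make the idea of a Kullback-Leibler upper confidence bound concrete, here is a minimal sketch for the Bernoulli case, which is one instance of a one-parameter exponential family. It is an illustration written for this record, not the authors' code: the function names (`bernoulli_kl`, `kl_ucb_index`, `choose_arm`), the bisection tolerance, and the exploration budget `log(t) + c * log(log(t))` with default `c = 0` are all assumptions made here. The index of an arm is taken to be the largest mean `q` whose divergence from the empirical mean, scaled by the number of pulls, stays within the budget.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, budget, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= budget, found by bisection.

    kl(mean, q) is increasing in q on [mean, 1], so bisection applies.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def choose_arm(sums, pulls, t, c=0.0):
    """Pick the arm with the highest index at round t (hypothetical helper).

    sums[a] is the total reward of arm a, pulls[a] its number of draws.
    The budget log(t) + c * log(log(t)) is an assumption of this sketch;
    log(t) is floored at 1 so the inner logarithm is never negative.
    """
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm  # draw every arm once before using indices
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    indices = [kl_ucb_index(s / n, n, budget) for s, n in zip(sums, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)
```

For example, `choose_arm([9.0, 1.0], [10, 10], t=21)` compares two arms with empirical means 0.9 and 0.1 after ten pulls each. Bisection is used only because the bound has no closed form in the Bernoulli case; any one-dimensional root finder would serve equally well.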
Contributor: Gilles Stoltz
Submitted on: Thursday, March 21, 2013 - 1:17:03 PM
Last modification on: Tuesday, October 25, 2022 - 11:58:11 AM
Long-term archiving on: Saturday, June 22, 2013 - 4:50:09 AM


Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation. Annals of Statistics, 2013, 41 (3), pp.1516-1541. ⟨10.1214/13-AOS1119⟩. ⟨hal-00738209v2⟩


