Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation

Abstract: We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families, and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state of the art.
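To make the idea of a Kullback-Leibler upper confidence bound concrete, here is a minimal sketch for the Bernoulli case, which is one example of a one-parameter exponential family. It computes the largest mean q whose divergence from the empirical mean, scaled by the number of pulls, stays below an exploration level of the form log(t) + c log(log(t)). The function names, the default c = 0, the clamping constant, and the bisection tolerance are assumptions chosen for this illustration and are not taken from this record.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    # Clamp both arguments away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-9):
    """Upper confidence bound for an arm with the given empirical mean.

    Returns (up to tol) the largest q in [mean, 1] such that
    pulls * kl(mean, q) <= log(t) + c * log(log(t)).
    """
    log_t = math.log(t)
    level = log_t
    # Assumption of this sketch: skip the log(log(t)) term while it is negative.
    if c > 0.0 and log_t > 1.0:
        level += c * math.log(log_t)

    # kl(mean, q) is increasing in q on [mean, 1], so bisection applies.
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo


def choose_arm(means, pulls, t):
    """Index policy: play the arm with the largest upper confidence bound."""
    indices = [kl_ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)
```

In this sketch an arm that has been pulled rarely receives a wider bound than a well-explored arm with the same empirical mean, which is what drives exploration in an index policy of this kind.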
Document type:
Journal article
Annals of Statistics, Institute of Mathematical Statistics, 2013, 41 (3), pp.1516-1541

https://hal.archives-ouvertes.fr/hal-00738209
Contributor: Gilles Stoltz
Submitted on: Thursday, March 21, 2013 - 13:17:03
Last modified on: Friday, February 17, 2017 - 14:29:43
Document(s) archived on: Saturday, June 22, 2013 - 04:50:09

Files

klucb.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00738209, version 2
  • arXiv: 1210.1136

Citation

Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation. Annals of Statistics, Institute of Mathematical Statistics, 2013, 41 (3), pp.1516-1541. 〈hal-00738209v2〉
