Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem

Antoine Salomon 1, 2 Jean-Yves Audibert 1, 2 Issam El Alaoui 1, 2
1 IMAGINE [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
Abstract: This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound for all consistent policies. We relax the notion of consistency and exhibit a generalisation of the bound. We also study the existence of logarithmic bounds in general and in the case of Hannan consistency. Moreover, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment. To get these results, we study variants of popular Upper Confidence Bounds (UCB) policies. © 2013 Antoine Salomon, Jean-Yves Audibert and Issam El Alaoui.
Document type:
Journal article
Journal of Machine Learning Research, 2013, 14 (1), pp. 187-207

https://hal.archives-ouvertes.fr/hal-00947718
Contributor: Frédérique Bordignon
Submitted on: Monday, 17 February 2014 - 12:16:12
Last modified on: Thursday, 5 July 2018 - 14:29:13

Identifiers

  • HAL Id: hal-00947718, version 1

Citation

Antoine Salomon, Jean-Yves Audibert, Issam El Alaoui. Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem. Journal of Machine Learning Research, 2013, 14 (1), pp. 187-207. 〈hal-00947718〉
