Preprint, Working Paper. Year: 2011

Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

Abstract

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound on the regret of all consistent policies. We relax the notion of consistency and exhibit a generalisation of this logarithmic bound. We also show that no logarithmic bound exists in the general case of Hannan consistency. To obtain these results, we study variants of popular Upper Confidence Bounds (UCB) policies. As a by-product, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment.
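
For readers unfamiliar with the UCB policies mentioned above, the following is a minimal sketch of a standard UCB index policy in the style of UCB1. It is not the extended variant studied in the paper. The Bernoulli reward model, the function name `ucb1`, and the exploration parameter `rho` are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, rho=2.0, seed=0):
    """Run a UCB-style index policy on Bernoulli arms (illustrative assumption).

    Returns the number of pulls per arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of times each arm was pulled
    sums = [0.0] * k    # cumulative reward collected from each arm
    best = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus sqrt(rho * log(t) / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(rho * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.1], horizon=2000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", regret)
```

In this sketch the suboptimal arm is pulled only occasionally as the horizon grows, which is the behaviour that logarithmic regret bounds describe for consistent policies.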

Domains

Other [stat.ML]
Main file
consistence.pdf (206.36 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00652865, version 1 (16-12-2011)

Cite

Antoine Salomon, Jean-Yves Audibert, Issam El Alaoui. Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem. 2011. ⟨hal-00652865⟩