KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Abstract: In the context of K-armed stochastic bandits with distributions only assumed to be supported on [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it simultaneously enjoys a distribution-free regret bound of optimal order √(KT) and a distribution-dependent regret bound of optimal order as well, that is, matching the κ ln T lower bound by Lai and Robbins (1985) and Burnetas and Katehakis (1996).
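The abstract does not spell out the index used by KL-UCB-switch, so the following is only an illustrative sketch of the "switch" idea suggested by the name: use a KL-based upper confidence bound while an arm has been pulled few times, then switch to a MOSS-style bound. Everything specific here is an assumption made for the example and is not taken from the abstract: the Bernoulli divergence (standing in for whatever divergence the paper uses for general distributions on [0, 1]), the exploration budget ln₊(T/(K·n)), the switching threshold ⌊(T/K)^(1/5)⌋, and all function names.

```python
import math


def log_plus(x):
    """max(ln x, 0), a common truncation in bandit exploration terms."""
    return max(math.log(x), 0.0)


def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence kl(p, q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_index(mean, pulls, budget, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= budget (bisection)."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def moss_index(mean, pulls, horizon, n_arms):
    """MOSS-style index: empirical mean plus a truncated-log bonus."""
    bonus = math.sqrt(log_plus(horizon / (n_arms * pulls)) / (2 * pulls))
    return mean + bonus


def switch_index(mean, pulls, horizon, n_arms):
    """KL-based index for rarely pulled arms, MOSS-style index otherwise."""
    # Assumed switching threshold; hypothetical choice for this sketch.
    threshold = math.floor((horizon / n_arms) ** 0.2)
    if pulls <= threshold:
        budget = log_plus(horizon / (n_arms * pulls))
        return kl_index(mean, pulls, budget)
    return moss_index(mean, pulls, horizon, n_arms)
```

At each round, an index policy of this kind would pull the arm with the largest `switch_index` value. By Pinsker's inequality the KL-based index with this budget never exceeds the MOSS-style one, which is one plausible reason to combine the two.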
Document type:
Preprint, working paper
2018

https://hal.archives-ouvertes.fr/hal-01785705
Contributor: Hedi Hadiji
Submitted on: Friday, May 4, 2018 - 15:09:11
Last modified on: Friday, September 14, 2018 - 09:16:06

Identifiers

  • HAL Id : hal-01785705, version 1

Citation

Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. 2018. 〈hal-01785705〉

Metrics

Record views: 91

File downloads: 30