KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Abstract: In the context of K-armed stochastic bandits with distributions only assumed to be supported by [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it simultaneously enjoys a distribution-free regret bound of optimal order √(KT) and a distribution-dependent regret bound of optimal order as well, that is, one matching the κ ln T lower bound by Lai and Robbins (1985) and Burnetas and Katehakis (1996).
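The abstract does not spell out how the algorithm works. As a rough illustration only, here is a minimal Python sketch of an index policy that switches between a KL-UCB-type index (computed by bisection on the Bernoulli divergence kl) and a MOSS-type index. The exploration term ln_+(T/(K n)), the switching threshold ⌊(T/K)^(1/5)⌋, the use of Bernoulli rewards, and every function name below are assumptions made for this sketch, not definitions taken from this record; the paper itself should be consulted for the exact algorithm.

```python
import math
import random


def kl_bernoulli(p, q, eps=1e-15):
    """Bernoulli Kullback-Leibler divergence kl(p, q), with p and q clipped into (0, 1)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def log_plus(x):
    """ln_+(x) = max(ln x, 0), for x > 0."""
    return max(math.log(x), 0.0)


def kl_ucb_index(mean, n, T, K, iters=50):
    """Largest q in [mean, 1] with kl(mean, q) <= ln_+(T / (K n)) / n, found by bisection.

    Bisection is valid because q -> kl(mean, q) is increasing on [mean, 1].
    The exploration term ln_+(T / (K n)) is an assumption of this sketch.
    """
    budget = log_plus(T / (K * n)) / n
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def moss_index(mean, n, T, K):
    """MOSS-type index: mean + sqrt(ln_+(T / (K n)) / (2 n))."""
    return mean + math.sqrt(log_plus(T / (K * n)) / (2 * n))


def switch_index(mean, n, T, K):
    """KL-type index while n <= floor((T/K)^(1/5)), MOSS-type index afterwards.

    The threshold floor((T/K)^(1/5)) is an assumption of this sketch.
    """
    if n <= math.floor((T / K) ** 0.2):
        return kl_ucb_index(mean, n, T, K)
    return moss_index(mean, n, T, K)


def run(means, T, seed=0):
    """Play Bernoulli arms with the given means for T >= len(means) rounds; return pull counts."""
    rng = random.Random(seed)
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    for t in range(T):
        if t < K:
            arm = t  # initialization: pull every arm once
        else:
            # pull the arm with the largest (hypothetical) switch index
            arm = max(range(K),
                      key=lambda a: switch_index(sums[a] / counts[a], counts[a], T, K))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
    return counts


if __name__ == "__main__":
    print(run([0.3, 0.5, 0.7], T=2000))
```

Running the script prints the number of pulls of each arm; with these means, most pulls should go to the arm with mean 0.7. The design idea being illustrated is that a tight KL-based index is used while an arm has few observations, and a cheaper MOSS-type index takes over once it has been pulled often.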

Cited literature: 13 references

https://hal.archives-ouvertes.fr/hal-01785705
Contributor: Hedi Hadiji
Submitted on: Friday, May 4, 2018 - 3:09:11 PM
Last modification on: Friday, April 12, 2019 - 4:22:51 PM
Document(s) archived on: Monday, September 24, 2018 - 7:39:59 PM

Identifiers

  • HAL Id: hal-01785705, version 1

Citation

Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. 2018. ⟨hal-01785705⟩
