Preprints, Working Papers, ...

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

Aurélien Garivier 1, 2 Hédi Hadiji 3 Pierre Ménard 4 Gilles Stoltz 3, 5, 6
4 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
6 CELESTE - Statistique mathématique et apprentissage
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
Abstract: In the context of K-armed stochastic bandits with distributions only assumed to be supported by [0,1], we introduce the first algorithm, called KL-UCB-switch, that enjoys simultaneously a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret bound of optimal order as well, that is, matching the $\kappa\ln T$ lower bound by Lai & Robbins (1985) and Burnetas & Katehakis (1996). This self-contained contribution simultaneously presents state-of-the-art techniques for regret minimization in bandit models, and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.
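For orientation, the two guarantees named in the abstract can be written out roughly as follows. This is a sketch under assumptions: the symbols $R_T$, $\Delta_a$, $N_a(T)$, $\mathcal{K}_{\inf}$ and the constant $C$ follow common usage in the bandit literature and do not appear on this page, and reading $\kappa$ as the Burnetas & Katehakis constant is an interpretation of the abstract, not a quotation from the paper.

```latex
% Regret after T rounds. Assumed notation: Y_t is the reward at round t,
% \mu^\star the largest mean, \Delta_a = \mu^\star - \mu_a the gap of arm a,
% N_a(T) the number of pulls of arm a up to round T.
R_T = T\mu^\star - \mathbb{E}\Bigl[\sum_{t=1}^{T} Y_t\Bigr]
    = \sum_{a=1}^{K} \Delta_a \, \mathbb{E}\bigl[N_a(T)\bigr]

% Distribution-free guarantee of order \sqrt{KT}:
% C is an unspecified numerical constant (assumption), and T \ge K.
\sup_{\nu} R_T \le C\sqrt{KT}

% Distribution-dependent guarantee matching the \kappa \ln T lower bound:
\limsup_{T\to\infty} \frac{R_T}{\ln T} \le \kappa
  = \sum_{a \,:\, \Delta_a > 0} \frac{\Delta_a}{\mathcal{K}_{\inf}(\nu_a, \mu^\star)},
\qquad
\mathcal{K}_{\inf}(\nu, \mu)
  = \inf\bigl\{ \mathrm{KL}(\nu, \nu') : \nu' \text{ supported on } [0,1],\ \mathrm{E}(\nu') > \mu \bigr\}
```

The first bound holds uniformly over all bandit problems with rewards in [0,1], while the second depends on the specific arm distributions $\nu_a$ through $\kappa$.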

https://hal.archives-ouvertes.fr/hal-01785705
Contributor : Gilles Stoltz
Submitted on : Tuesday, November 5, 2019 - 4:07:57 PM
Last modification on : Tuesday, September 29, 2020 - 12:24:09 PM
Long-term archiving on : Friday, February 7, 2020 - 5:41:22 AM

Identifiers

  • HAL Id : hal-01785705, version 2

Citation

Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. 2019. ⟨hal-01785705v2⟩
