# KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
CELESTE - Statistique mathématique et apprentissage
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
Abstract: In the context of K-armed stochastic bandits with distributions only assumed to be supported on [0,1], we introduce the first algorithm, called KL-UCB-switch, that simultaneously enjoys a distribution-free regret bound of optimal order $\sqrt{KT}$ and a distribution-dependent regret bound of optimal order, that is, one matching the $\kappa\ln T$ lower bound of Lai & Robbins (1985) and Burnetas & Katehakis (1996). This self-contained contribution presents both state-of-the-art techniques for regret minimization in bandit models and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.
Document type: Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01785705
Contributor: Gilles Stoltz
Submitted on: Tuesday, November 5, 2019 - 4:07:57 PM
Last modification on: Tuesday, September 29, 2020 - 12:24:09 PM
Long-term archiving on: Friday, February 7, 2020 - 5:41:22 AM

### Files

KL-UCB-GHMS.pdf
Files produced by the author(s)

### Identifiers

• HAL Id: hal-01785705, version 2

### Citation

Aurélien Garivier, Hédi Hadiji, Pierre Ménard, Gilles Stoltz. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints. 2019. ⟨hal-01785705v2⟩
