# Adaptation to the Range in $K$-Armed Bandits

CELESTE - Statistique mathématique et apprentissage
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay
Abstract : We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which makes it impossible to simultaneously achieve the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the regret rates indicated by the new trade-off.
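
As a reading aid, the two kinds of bounds named in the abstract can be written in standard bandit notation. The symbols $\nu$, $\mu_a$, $\mu^\star$, $Y_t$ and $R_T$ below are the usual textbook conventions and are assumptions of this note; they do not appear in the record itself.

$$
R_T(\nu) \;=\; T\,\mu^\star \;-\; \mathbb{E}\Bigl[\,\sum_{t=1}^{T} Y_t\Bigr],
\qquad
\mu^\star \;=\; \max_{a \in \{1,\dots,K\}} \mu_a ,
$$

where $\nu = (\nu_1,\dots,\nu_K)$ denotes the bandit problem, $\mu_a$ is the mean of the distribution $\nu_a$ of arm $a$, and $Y_t$ is the reward obtained at round $t$. A distribution-free bound controls $\sup_{\nu} R_T(\nu)$ over all problems supported on $[m,M]$ (typically of order $\sqrt{T}$), whereas a distribution-dependent bound controls $R_T(\nu)$ for each fixed $\nu$ as $T$ grows (typically of order $\ln T$). Read in these terms, the example in the abstract says that a strategy guaranteeing $\sup_{\nu} R_T(\nu) = O(\sqrt{T})$ without knowing $[m,M]$ cannot also guarantee $\ln T$ rates for fixed $\nu$: its distribution-dependent bounds must be at least of order $\sqrt{T}$.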
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-02794382
Contributor : Gilles Stoltz
Submitted on : Tuesday, November 10, 2020 - 11:17:23 AM
Last modification on : Saturday, November 14, 2020 - 3:33:35 AM

### Files

Files produced by the author(s)

### Identifiers

• HAL Id : hal-02794382, version 2
• ARXIV : 2006.03378

### Citation

Hédi Hadiji, Gilles Stoltz. Adaptation to the Range in $K$-Armed Bandits. 2020. ⟨hal-02794382v2⟩
