Conference paper. Year: 2018

Bandits with Side Observations: Bounded vs. Logarithmic Regret

Abstract

We consider the classical stochastic multi-armed bandit problem in which, from time to time and roughly with frequency $\epsilon$, the agent gathers an extra observation for free. We prove that, no matter how small $\epsilon$ is, the agent can ensure a regret that is uniformly bounded in time. More precisely, we construct an algorithm whose regret is smaller than $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$, up to a multiplicative constant and $\log\log$ terms. We also prove a matching lower bound, stating that no reasonable algorithm can outperform this quantity.
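
To give a feel for the orders of magnitude involved, the short Python sketch below evaluates the quantity $\sum_i \frac{\log(1/\epsilon)}{\Delta_i}$ from the abstract next to the familiar logarithmic rate $\sum_i \frac{\log(T)}{\Delta_i}$ of the standard bandit problem without side observations. It is only an illustration and not the algorithm of the paper. It assumes that $\Delta_i$ denotes the suboptimality gap of arm $i$ (the usual convention, not spelled out in the abstract), it drops the multiplicative constant and the $\log\log$ terms, and the function names and the numerical values of the gaps and of $\epsilon$ are hypothetical.

```python
import math


def side_observation_rate(gaps, eps):
    """Order of the time-uniform bound: sum_i log(1/eps) / Delta_i.

    Constants and log log terms are dropped; only positive gaps
    (suboptimal arms) contribute to the sum.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return sum(math.log(1.0 / eps) / gap for gap in gaps if gap > 0)


def classical_log_rate(gaps, horizon):
    """Order of the classical rate without side observations: sum_i log(T) / Delta_i."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return sum(math.log(horizon) / gap for gap in gaps if gap > 0)


# Hypothetical instance: three suboptimal arms and one free observation
# roughly every 100 rounds.
gaps = [0.1, 0.2, 0.5]
eps = 0.01

for horizon in (10**2, 10**4, 10**6):
    print(
        f"T={horizon:>8}  "
        f"log(T) rate={classical_log_rate(gaps, horizon):8.1f}  "
        f"log(1/eps) rate={side_observation_rate(gaps, eps):8.1f}"
    )
```

In this simplified comparison the two quantities coincide at $T = 1/\epsilon$. Beyond that horizon the logarithmic rate keeps growing with $T$, while the quantity from the abstract stays fixed.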

Dates and versions

hal-03089542, version 1 (28-12-2020)

Cite

Rémy Degenne, Evrard Garcelon, Vianney Perchet. Bandits with Side Observations: Bounded vs. Logarithmic Regret. Conference on Uncertainty in Artificial Intelligence, Aug 2018, Monterey, United States. ⟨hal-03089542⟩