Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem - HAL open archive
Preprint, Working Paper. Year: 2016

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

Abstract

We consider a non-stationary formulation of the stochastic multi-armed bandit in which the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of SUCCESSIVE ELIMINATION based on random shuffling of the K arms. We prove that, under a novel and mild assumption on the mean gap ∆, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as the original version, but for a much wider class of problems, since it is no longer restricted to stationary distributions. We also show that the original SUCCESSIVE ELIMINATION fails to have controlled regret in this more general scenario, which demonstrates the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of sample complexity to that case and prove that, against an optimal policy with N − 1 switches of the optimal arm, this new algorithm achieves an expected sample complexity of O(∆^{−2} sqrt(N K δ^{−1} log(K/δ))), where δ is the probability of failure of the algorithm, and an expected cumulative regret of O(∆^{−1} sqrt(N T K log(T K))) after T time steps.
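The abstract does not spell out the algorithm, so the following is only a minimal sketch of the idea it describes: a successive-elimination loop in which the remaining arms are played in a freshly shuffled order each round. The `pull` callback, the function name, and the confidence radius (a textbook Hoeffding-style choice with an arbitrary constant) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def successive_elimination_shuffled(pull, n_arms, delta=0.05,
                                    max_rounds=10_000, rng=None):
    """Illustrative successive elimination with a random play order per round.

    pull(arm, step) -> reward in [0, 1]. Hypothetical callback: `step` is the
    global time step, so the reward may depend on when the arm is played.
    Returns the sorted indices of the arms that were not eliminated.
    """
    rng = rng or random.Random()
    active = list(range(n_arms))
    sums = [0.0] * n_arms
    step = 0
    for t in range(1, max_rounds + 1):
        if len(active) == 1:
            break
        # Shuffle so that no arm is systematically played at the same
        # position within a round.
        rng.shuffle(active)
        for arm in active:
            step += 1
            sums[arm] += pull(arm, step)
        # Every active arm has now been played exactly t times.
        # Assumed radius: Hoeffding bound with a union bound over arms and
        # rounds; the constant 4 is an arbitrary illustrative choice.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best = max(sums[a] / t for a in active)
        # Keep only arms whose empirical mean is within 2 * radius of the best.
        active = [a for a in active if sums[a] / t + 2 * radius >= best]
    return sorted(active)
```

As a usage example, calling `successive_elimination_shuffled(lambda arm, step: means[arm], 3)` with fixed `means = [0.9, 0.5, 0.1]` eliminates the two weaker arms once the radius drops below half of their gap to the best arm. A non-stationary experiment would instead make `pull` depend on `step`.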

Main file
1609.02139v1.pdf (506.38 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01400320, version 1 (21-11-2016)

Cite

Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard. Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem. 2016. ⟨hal-01400320⟩
176 views
158 downloads
