Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

Robin Allesiardo; Raphaël Féraud; Odalric-Ambrym Maillard

Preprint, Working Paper. Year: 2016

Robin Allesiardo

Role: Author
PersonId : 4381
IdHAL : robin-allesiardo
IdRef : 197869483

Machine Learning and Optimisation

Orange Labs [Lannion]

Raphaël Féraud

Role: Author

Orange Labs [Lannion]

Odalric-Ambrym Maillard

Role: Author
PersonId : 5563
IdHAL : odalric-ambrym-maillard
ORCID : 0000-0001-7935-7026
IdRef : 158055594

Laboratoire d'Informatique Fondamentale de Lille

Machine Learning and Optimisation

Abstract

We consider a non-stationary formulation of the stochastic multi-armed bandit in which the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of SUCCESSIVE ELIMINATION based on random shuffling of the K arms. We prove that, under a novel and mild assumption on the mean gap ∆, this simple but powerful modification achieves the same guarantees in terms of sample complexity and cumulative regret as the original version, but over a much wider class of problems, since it is no longer constrained to stationary distributions. We also show that the original SUCCESSIVE ELIMINATION fails to have controlled regret in this more general scenario, which demonstrates the benefit of shuffling. We then remove our mild assumption and adapt the algorithm to the best-arm identification task with switching arms. We adapt the definition of sample complexity to that case and prove that, against an optimal policy with N − 1 switches of the optimal arm, this new algorithm achieves an expected sample complexity of O(∆^{−2} sqrt(N K δ^{−1} log(K/δ))), where δ is the probability of failure of the algorithm, and an expected cumulative regret of O(∆^{−1} sqrt(N T K log(T K))) after T time steps.
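The abstract describes the shuffled variant only at a high level, so the following is a minimal Python sketch of the general idea, not the authors' exact algorithm. It assumes rewards in [0, 1], plays every remaining arm once per round in a freshly shuffled order, and drops arms whose empirical mean falls too far below the best one. The function name `successive_elimination_shuffled`, the `pull` callback, the `max_rounds` cap, and the Hoeffding-style confidence radius (including its constants) are illustrative assumptions; the paper's elimination rule may differ.

```python
import math
import random


def successive_elimination_shuffled(pull, n_arms, delta=0.05,
                                    max_rounds=10_000, rng=None):
    """Sketch of successive elimination with per-round random shuffling.

    pull(arm) is a hypothetical callback returning a reward in [0, 1].
    Returns the index of the last surviving arm, or the empirically best
    remaining arm if max_rounds is reached first.
    """
    rng = rng or random.Random()
    active = list(range(n_arms))
    sums = [0.0] * n_arms

    for t in range(1, max_rounds + 1):
        # Shuffle the play order so that a reward sequence that changes
        # over time cannot systematically favour a fixed arm position.
        rng.shuffle(active)
        for arm in active:
            sums[arm] += pull(arm)

        # Every active arm has been played exactly t times at this point.
        # Assumed Hoeffding-style radius; constants are illustrative.
        radius = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        means = {arm: sums[arm] / t for arm in active}
        best = max(means.values())

        # Keep only arms whose mean is within 2 * radius of the best mean.
        active = [arm for arm in active if best - means[arm] <= 2 * radius]
        if len(active) == 1:
            return active[0]

    return max(active, key=lambda arm: sums[arm])


if __name__ == "__main__":
    # Usage sketch: three Bernoulli arms with assumed means.
    arm_means = [0.2, 0.5, 0.9]
    sampler = random.Random(1)
    chosen = successive_elimination_shuffled(
        lambda a: 1.0 if sampler.random() < arm_means[a] else 0.0,
        n_arms=3,
    )
    print("selected arm:", chosen)
```

Shuffling costs one permutation per round and leaves the number of plays per arm unchanged, which is consistent with the abstract's claim that the modification is simple while extending the guarantees beyond stationary distributions.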

Keywords

Multi-armed bandits

Domains

Artificial Intelligence [cs.AI]

Main file

1609.02139v1.pdf (506.38 KB)

Origin: Files produced by the author(s)

Contributor: Odalric-Ambrym Maillard

https://hal.science/hal-01400320

Submitted on: Monday, 21 November 2016, 17:32:36

Last modified on: Tuesday, 13 February 2024, 03:25:11

Long-term archiving on: Tuesday, 21 March 2017, 00:23:20

Dates and versions

hal-01400320 , version 1 (21-11-2016)

Identifiers

HAL Id: hal-01400320, version 1
arXiv: 1609.02139

Cite

Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard. Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem. 2016. ⟨hal-01400320⟩

Collections

UNIV-LILLE3 CNRS INRIA UMR8623 CENTRALESUPELEC INRIA2 LRI-AO UNIV-PARIS-SACLAY GS-COMPUTER-SCIENCE

176 views

158 downloads
