The Non-stationary Stochastic Multi-armed Bandit Problem

Robin Allesiardo 1 Raphaël Féraud 2 Odalric-Ambrym Maillard 3
3 TAO - Machine Learning and Optimisation
LRI - Laboratoire de Recherche en Informatique, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary stochastic process. We first study the unique best arm setting, in which there is a single best arm. Second, we study the general switching best arm setting, in which the best arm switches at some unknown steps. For both settings, we target problem-dependent bounds, instead of the more conservative problem-free bounds. We consider two classical problems: (1) Identify a best arm with high probability (best arm identification), for which performance is measured by the sample complexity (the number of samples needed before finding a near-optimal arm). To this end, we naturally extend the definition of sample complexity so that it makes sense in the switching best arm setting, which may be of independent interest. (2) Achieve the smallest cumulative regret (regret minimization), where the regret is measured with respect to the strategy pulling an arm with the best instantaneous mean at each step.
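As an illustration of the regret notion in (2), the following is a minimal sketch, not the authors' code. It assumes the per-step mean rewards are known for evaluation purposes; the function name `dynamic_regret` and the two-arm instance are hypothetical and chosen only for this example.

```python
from typing import Sequence


def dynamic_regret(means: Sequence[Sequence[float]], pulls: Sequence[int]) -> float:
    """Cumulative regret against the best instantaneous mean at each step.

    means[t][k] is the (assumed known) mean reward of arm k at step t,
    and pulls[t] is the index of the arm the player pulled at step t.
    """
    if len(means) != len(pulls):
        raise ValueError("means and pulls must cover the same number of steps")
    # At each step, compare the pulled arm with the arm that is best at that step.
    return sum(max(step_means) - step_means[arm]
               for step_means, arm in zip(means, pulls))


# Hypothetical instance: the best arm switches from arm 0 to arm 1 at step 2.
means = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
pulls = [0, 0, 0, 1]  # the player notices the switch one step late
regret = dynamic_regret(means, pulls)
```

In this sketch only the third step contributes to the regret, since it is the only step where the pulled arm is not the one with the best instantaneous mean.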
Document type:
Journal article
International Journal of Data Science and Analytics, Springer Verlag, 2017, 3 (4), pp.267-283. 〈10.1007/s41060-017-0050-5〉

https://hal.archives-ouvertes.fr/hal-01575000
Contributor: Odalric-Ambrym Maillard
Submitted on: Monday, 23 October 2017 - 08:46:52
Last modified on: Thursday, 7 February 2019 - 17:05:34
Document(s) archived on: Wednesday, 24 January 2018 - 12:11:00

Citation

Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard. The Non-stationary Stochastic Multi-armed Bandit Problem. International Journal of Data Science and Analytics, Springer Verlag, 2017, 3 (4), pp.267-283. 〈10.1007/s41060-017-0050-5〉. 〈hal-01575000〉