Gap-free Bounds for Stochastic Multi-Armed Bandit

Abstract: We consider the stochastic multi-armed bandit problem with an unknown horizon. We present a randomized decision strategy that updates a probability distribution through a stochastic mirror descent / exponentiated gradient type algorithm. We treat two assumptions separately: nonnegative losses, and arbitrary losses satisfying an exponential moment condition. We prove gap-free bounds, optimal up to logarithmic factors, on the excess risk of the time-averaged instantaneous losses induced by the choice of a specific action.
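The kind of strategy the abstract describes can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic Exp3-style exponentiated gradient update with importance-weighted loss estimates, and the step size `eta`, the helper `bernoulli_loss`, and all function and parameter names are assumptions made for this illustration only. The step size decays with the round index, so the horizon does not need to be known in advance.

```python
import math
import random


def bernoulli_loss(means):
    """Hypothetical loss model: arm i yields loss 1 with probability means[i], else 0."""
    def sample(arm, rng):
        return 1.0 if rng.random() < means[arm] else 0.0
    return sample


def exponentiated_gradient_bandit(sample_loss, n_arms, n_rounds, seed=0):
    """Exp3-style sketch (an assumption, not the paper's exact method).

    Returns the final sampling distribution and the time-averaged loss incurred.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms          # cumulative importance-weighted loss estimates
    probs = [1.0 / n_arms] * n_arms
    total_loss = 0.0
    for t in range(1, n_rounds + 1):
        # Illustrative decaying step size; no knowledge of n_rounds is used here.
        eta = math.sqrt(math.log(n_arms) / (n_arms * t))
        # Entropic mirror descent step: softmax of the negated cumulative estimates.
        # Subtracting the minimum keeps the exponentials numerically stable.
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]
        # Randomized decision: draw one arm from the current distribution.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = sample_loss(arm, rng)
        total_loss += loss
        # Only the played arm's loss is observed; dividing by its probability
        # gives an unbiased estimate of the full loss vector.
        cum_est[arm] += loss / probs[arm]
    return probs, total_loss / n_rounds
```

For example, `exponentiated_gradient_bandit(bernoulli_loss([0.9, 0.5, 0.1]), n_arms=3, n_rounds=5000)` returns a distribution over the three arms together with the average loss over the run. The update uses no estimate of the gaps between arm means, which is the sense in which such schemes fit a gap-free analysis.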
Document type:
Conference paper
Chung, Myung Jin and Misra, Pradeep. 17th World IFAC Congress, Jul 2008, Seoul, South Korea. International Federation of Automatic Control, pp.11560-11563, 2008, 〈10.3182/20080706-5-KR-1001.01959〉

https://hal.archives-ouvertes.fr/hal-00317655
Contributor: Anatoli Juditsky
Submitted on: Wednesday, September 3, 2008 - 16:02:16
Last modified on: Monday, May 29, 2017 - 14:24:36

Citation

Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis. Gap-free Bounds for Stochastic Multi-Armed Bandit. Chung, Myung Jin and Misra, Pradeep. 17th World IFAC Congress, Jul 2008, Seoul, South Korea. International Federation of Automatic Control, pp.11560-11563, 2008, 〈10.3182/20080706-5-KR-1001.01959〉. 〈hal-00317655〉
