Gap-free Bounds for Stochastic Multi-Armed Bandit

Anatoli B. Juditsky; Alexander Nazin; Alexandre Tsybakov; Nicolas Vayatis

doi:10.3182/20080706-5-KR-1001.01959

Conference paper, Year: 2008



Anatoli B. Juditsky

Role: Author
PersonId : 749890
IdHAL : anatoli-juditsky
ORCID : 0000-0001-5231-363X
IdRef : 110459563

Statistique et Modélisation Stochastique

Alexander Nazin

Role: Author
PersonId : 830066

Institute of Control Sciences [Moscow]

Alexandre Tsybakov

Role: Author
PersonId : 847724

Laboratoire de Probabilités et Modèles Aléatoires

Nicolas Vayatis

Role: Author
PersonId : 848026

Centre de Mathématiques et de Leurs Applications

Abstract

We consider the stochastic multi-armed bandit problem with an unknown horizon. We present a randomized decision strategy based on updating a probability distribution through a stochastic mirror descent/exponentiated-gradient-type algorithm. We consider two assumptions separately: nonnegative losses, or arbitrary losses satisfying an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bounds on the excess risk of the average over time of the instantaneous losses induced by the choice of a specific action.
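The abstract describes the strategy only at a high level. As a rough illustration, and not the authors' algorithm, the sketch below runs a generic exponentiated-gradient (exponential-weights) update on importance-weighted loss estimates, covering only the nonnegative-loss case. The step-size schedule eta_t = sqrt(ln K / (K t)), which avoids any dependence on the horizon, the function name `exp_weights_bandit`, and the `sample_loss` callback are hypothetical choices made for illustration; the exponential-moment case and the paper's averaging and tuning are not reproduced here.

```python
import math
import random
from typing import Callable, List, Tuple


def exp_weights_bandit(
    sample_loss: Callable[[int, random.Random], float],
    n_arms: int,
    n_rounds: int,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical sketch of an exponentiated-gradient bandit strategy.

    Assumes nonnegative losses; only the loss of the chosen arm is observed.
    The step size eta_t = sqrt(ln K / (K t)) is an assumed anytime schedule,
    so n_rounds only sets how long the simulation runs, not the update.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms  # cumulative importance-weighted loss estimates
    chosen: List[int] = []
    total_loss = 0.0
    for t in range(1, n_rounds + 1):
        eta = math.sqrt(math.log(n_arms) / (n_arms * t))
        # p_i proportional to exp(-eta * cum_est[i]); shifting by the minimum
        # keeps every exponent <= 0 and avoids overflow.
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(weights)
        p = [w / z for w in weights]
        # Draw an arm from the current distribution and observe its loss.
        a = rng.choices(range(n_arms), weights=p)[0]
        loss = sample_loss(a, rng)
        # Importance weighting: dividing by p[a] makes the estimated loss
        # vector (zero for unplayed arms) unbiased for the true loss vector.
        cum_est[a] += loss / p[a]
        chosen.append(a)
        total_loss += loss
    # Average observed loss over the run (not the paper's averaged estimator).
    return chosen, total_loss / n_rounds


if __name__ == "__main__":
    means = [0.7, 0.4, 0.5]  # illustrative Bernoulli loss means; arm 1 is best

    def bernoulli_loss(arm: int, rng: random.Random) -> float:
        return 1.0 if rng.random() < means[arm] else 0.0

    chosen, avg_loss = exp_weights_bandit(bernoulli_loss, n_arms=3, n_rounds=5000)
    last = chosen[-1000:]
    print("average observed loss:", round(avg_loss, 3))
    print("plays of each arm in the last 1000 rounds:", [last.count(i) for i in range(3)])
```

In this toy run the distribution should concentrate on the arm with the smallest mean loss as the cumulative estimates separate. Maintaining cumulative estimates and re-normalizing exponential weights is one common way to realize an exponentiated-gradient update; it is chosen here for brevity, not because the paper specifies it.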

Keywords

Learning theory; Randomized methods; Stochastic control

Domains

Statistics [math.ST]; Theory [stat.TH]


https://hal.science/hal-00317655

Submitted on: Wednesday 3 September 2008, 16:02:16

Last modified on: Monday 8 April 2024, 12:24:02

Dates and versions

hal-00317655 , version 1 (03-09-2008)

Identifiers

HAL Id : hal-00317655 , version 1
DOI : 10.3182/20080706-5-KR-1001.01959

Cite

Anatoli B. Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis. Gap-free Bounds for Stochastic Multi-Armed Bandit. 17th World IFAC Congress, Jul 2008, Seoul, South Korea. pp.11560-11563, ⟨10.3182/20080706-5-KR-1001.01959⟩. ⟨hal-00317655⟩


Collections

UNIV-PARIS7 UPMC UGA PMA CNRS ENS-CACHAN LJK LJK_PS LJK_PS_SMS LPSM SORBONNE-UNIVERSITE SU-SCIENCES ENS-PARIS-SACLAY

