Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

Antoine Salomon; Jean-Yves Audibert; Issam El Alaoui

Preprint, Working Paper. Year: 2011

Antoine Salomon

Role: Corresponding author
PersonId : 916255

imagine [Marne-la-Vallée]

Jean-Yves Audibert

Role: Author
PersonId : 931557

imagine [Marne-la-Vallée]

Statistical Machine Learning and Parsimony

Laboratoire d'Informatique Gaspard-Monge

Issam El Alaoui

Role: Author
PersonId : 916256

imagine [Marne-la-Vallée]

Abstract

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound on the regret of all consistent policies. We relax the notion of consistency and exhibit a generalisation of this logarithmic bound. We also show that no logarithmic bound exists in the general case of Hannan consistency. To obtain these results, we study variants of popular Upper Confidence Bound (UCB) policies. As a by-product, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment.
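
For readers unfamiliar with the UCB policies mentioned in the abstract, the following is a minimal sketch of a standard UCB index policy on Bernoulli arms. It is an illustration only and is not taken from the paper: the function name `ucb_policy`, the exploration parameter `rho`, the Bernoulli reward model, and the arm means used below are all assumptions made for this example. With `rho = 2` the index matches the classical UCB1 form; the variants studied in the paper may differ.

```python
import math
import random


def ucb_policy(arm_means, horizon, rho=2.0, seed=0):
    """Run a UCB index policy on Bernoulli arms (illustrative sketch).

    Returns the number of pulls per arm and the pseudo-regret, i.e. the
    sum over arms of (best mean - arm mean) * number of pulls.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n_arms = len(arm_means)
    pulls = [0] * n_arms       # number of times each arm was drawn
    sums = [0.0] * n_arms      # cumulative reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: draw each arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus sqrt(rho * log t / pulls).
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / pulls[k]
                + math.sqrt(rho * math.log(t) / pulls[k]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward

    best = max(arm_means)
    regret = sum((best - m) * n for m, n in zip(arm_means, pulls))
    return pulls, regret


if __name__ == "__main__":
    pulls, regret = ucb_policy([0.2, 0.8], horizon=2000)
    print("pulls per arm:", pulls)
    print("pseudo-regret:", regret)
```

In this sketch the suboptimal arm is drawn only a small fraction of the time, which is the behaviour that logarithmic regret bounds describe; the size of `rho` trades exploration against exploitation.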

Keywords

stochastic bandits; regret bounds; selectivity; UCB policies

Domains

Other [stat.ML]

Main file

consistence.pdf (206.36 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00652865

Submitted on: Friday, 16 December 2011, 14:48:03

Last modified on: Friday, 19 April 2024, 16:18:57

Long-term archiving on: Saturday, 17 March 2012, 02:38:02

Dates and versions

hal-00652865, version 1 (16-12-2011)

Identifiers

HAL Id: hal-00652865, version 1
arXiv: 1112.3827

Cite

Antoine Salomon, Jean-Yves Audibert, Issam El Alaoui. Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem. 2011. ⟨hal-00652865⟩

Collections

ENS-PARIS ENPC CNRS INRIA UNIV-MLV LIGM_A3SI PARISTECH LIGM IMAGINE INRIA2 PSL ESIEE-PARIS UNIV-EIFFEL JSE2024

594 views

1310 downloads
