Robustness of Anytime Bandit Policies

Antoine Salomon 1, 2 Jean-Yves Audibert 1, 2
1 IMAGINE [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
Abstract: This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that, with probability at least 1-1/n, the regret of the policy is of order log(n). They have also shown that such a property is not shared by the popular UCB1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
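The UCB1 policy mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the Bernoulli arm means, horizon, and function name are hypothetical, and the exploration bonus sqrt(2 log t / pulls) is the standard form from Auer et al. (2002).

```python
import math
import random

def ucb1(arm_means, n_rounds, seed=0):
    """Simulate the UCB1 policy on Bernoulli arms and return the pseudo-regret.

    At each round, play the arm maximizing the empirical mean plus the
    exploration bonus sqrt(2 * log(t) / pulls).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k        # number of times each arm was played
    sums = [0.0] * k       # sum of observed rewards per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            arm = t - 1    # play each arm once to initialize
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best_mean - arm_means[arm]  # pseudo-regret of this pull

    return regret
```

In expectation the cumulative regret of UCB1 grows like log(n); the point of the paper is that its *deviations* are much heavier than those of the horizon-aware policy of Audibert et al. (2009), and that no anytime policy can avoid this in general.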
Document type: Preprints, Working Papers
Submitted on: Monday, July 25, 2011
  • HAL Id: hal-00579607, version 3
  • arXiv: 1107.4506


Antoine Salomon, Jean-Yves Audibert. Robustness of Anytime Bandit Policies. 2011. ⟨hal-00579607v3⟩
