Tuning bandit algorithms in stochastic environments

Jean-Yves Audibert; Rémi Munos; Csaba Szepesvari

Conference paper, Year: 2007

Jean-Yves Audibert

Role: Author
PersonId : 931557

Centre d'Enseignement et de Recherche en Mathématiques et Calcul Scientifique

Rémi Munos

Role: Author
PersonId : 836863

Sequential Learning

Csaba Szepesvari

Role: Author

Computer and Automation Research Institute [Budapest]

Abstract

Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity because they are easy to implement, efficient, and effective. In this paper we consider a variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental work, such algorithms were found to outperform competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and theoretical guidelines for tuning the parameters of these algorithms. To this end, we analyze the expected regret and, for the first time, the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that, except for some very special bandit problems, the regret of algorithms based on upper confidence bounds with standard bias sequences concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account.
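
To make the idea of a variance-aware upper confidence bound concrete, here is a minimal Python sketch. The abstract does not state the index formula, so the form used below (empirical mean, plus a term scaled by the empirical variance, plus a range-dependent correction) is an assumption in the empirical-Bernstein style. The function names, the parameters `zeta`, `c`, and `b`, and their default values are illustrative and are not taken from this record.

```python
import math
import random


def ucbv_index(mean, var, pulls, t, b=1.0, zeta=1.2, c=1.0):
    """Variance-aware upper confidence bound for one arm (assumed form)."""
    e = zeta * math.log(t)  # exploration term, grows logarithmically in t
    return mean + math.sqrt(2.0 * var * e / pulls) + c * 3.0 * b * e / pulls


def run_ucbv(arms, horizon, b=1.0, seed=0):
    """Play `horizon` rounds; each arm is a function rng -> reward in [0, b]."""
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    total = [0.0] * k     # running sum of rewards per arm
    total_sq = [0.0] * k  # running sum of squared rewards per arm

    def index(i, t):
        mean = total[i] / pulls[i]
        # Empirical variance, clipped at zero to absorb rounding error.
        var = max(total_sq[i] / pulls[i] - mean * mean, 0.0)
        return ucbv_index(mean, var, pulls[i], t, b=b)

    for t in range(1, horizon + 1):
        if t <= k:
            choice = t - 1  # pull every arm once to initialise the estimates
        else:
            choice = max(range(k), key=lambda i: index(i, t))
        reward = arms[choice](rng)
        pulls[choice] += 1
        total[choice] += reward
        total_sq[choice] += reward * reward
    return pulls
```

For example, `run_ucbv([lambda rng: 0.2, lambda rng: 0.8], horizon=2000)` returns the pull count of each arm. In this toy setting the suboptimal arm has zero variance, so its variance term vanishes and only the correction term, which shrinks like 1/pulls, drives further exploration of it. This is the low-variance situation in which the abstract says variance estimates can be especially advantageous.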

Domains

Machine Learning [cs.LG]

Main file

ucb_alt.pdf (194.36 KB)

Origin: Files produced by the author(s)

Contributor: Rémi Munos

https://inria.hal.science/inria-00203487

Submitted on: Thursday, January 10, 2008, 12:02:12

Last modified on: Friday, March 24, 2023, 14:52:49

Long-term archiving on: Tuesday, April 13, 2010, 16:55:11

Dates and versions

inria-00203487, version 1 (10-01-2008)

Identifiers

HAL Id: inria-00203487, version 1

Cite

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvari. Tuning bandit algorithms in stochastic environments. Algorithmic Learning Theory, 2007, Sendai, Japan. pp.150-165. ⟨inria-00203487⟩

Collections

ENPC UNIV-LILLE3 CNRS INRIA CERMICS PARISTECH LAGIS INRIA2

566 views

1067 downloads