Conference papers

Tuning bandit algorithms in stochastic environments

Jean-Yves Audibert 1, Rémi Munos 2, Csaba Szepesvari 3
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : Algorithms based on upper-confidence bounds for balancing exploration and exploitation are gaining popularity because they are easy to implement, efficient and effective. In this paper we consider a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. The purpose of this paper is to provide a theoretical explanation of these findings and theoretical guidelines for tuning the parameters of these algorithms. To this end, we analyze the expected regret and, for the first time, the concentration of the regret. The analysis of the expected regret shows that variance estimates can be especially advantageous when the payoffs of suboptimal arms have low variance. The risk analysis, rather unexpectedly, reveals that, except for some very special bandit problems, the regret of algorithms based on upper-confidence bounds with standard bias sequences concentrates only at a polynomial rate. Hence, although these algorithms achieve logarithmic expected regret rates, they seem less attractive when the risk of suffering much worse than logarithmic regret is also taken into account.
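As a rough illustration of the idea in the abstract, the sketch below implements an upper-confidence index that adds an empirical-variance term to the empirical mean. The exact form of the index and the constants `zeta` and `c`, as well as the helper names `variance_aware_index`, `run_bandit` and `bernoulli`, are assumptions made for this example and are not taken from this record; the paper itself should be consulted for the authors' definitions and recommended tuning.

```python
import math
import random


def variance_aware_index(mean, var, pulls, t, reward_range=1.0, zeta=1.2, c=1.0):
    """Upper-confidence index with an empirical-variance term.

    The constants zeta and c are illustrative assumptions, not values
    prescribed by this record.
    """
    e = zeta * math.log(t)  # exploration function, grows logarithmically in t
    variance_term = math.sqrt(2.0 * var * e / pulls)
    range_term = c * 3.0 * reward_range * e / pulls
    return mean + variance_term + range_term


def run_bandit(arms, horizon, seed=0):
    """Play `horizon` rounds; each arm is a callable rng -> reward in [0, 1]."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            indices = []
            for i in range(k):
                mean = sums[i] / counts[i]
                # Empirical variance, clipped at zero against rounding error.
                var = max(sq_sums[i] / counts[i] - mean * mean, 0.0)
                indices.append(variance_aware_index(mean, var, counts[i], t))
            arm = max(range(k), key=indices.__getitem__)
        reward = arms[arm](rng)
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
    return counts


def bernoulli(p):
    """Hypothetical test arm: reward 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    pulls_per_arm = run_bandit([bernoulli(0.9), bernoulli(0.5)], horizon=2000)
    print(pulls_per_arm)
```

When an arm's empirical variance is small, the square-root term shrinks and the index tightens faster than a bound based on the reward range alone, which is the intuition behind the abstract's remark about suboptimal arms with low variance.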
Document type :
Conference papers

Cited literature : 8 references
Contributor : Rémi Munos
Submitted on : Thursday, January 10, 2008 - 12:02:12 PM
Last modification on : Thursday, January 20, 2022 - 4:12:31 PM
Long-term archiving on : Tuesday, April 13, 2010 - 4:55:11 PM


Files produced by the author(s)


  • HAL Id : inria-00203487, version 1



Jean-Yves Audibert, Rémi Munos, Csaba Szepesvari. Tuning bandit algorithms in stochastic environments. Algorithmic Learning Theory, 2007, Sendai, Japan. pp.150-165. ⟨inria-00203487⟩


