Journal articles

Regret Bounds and Minimax Policies under Partial Monitoring

Jean-Yves Audibert 1, 2 Sébastien Bubeck 3
1 imagine [Marne-la-Vallée]
LIGM - Laboratoire d'Informatique Gaspard-Monge, CSTB - Centre Scientifique et Technique du Bâtiment, ENPC - École des Ponts ParisTech
2 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique - ENS Paris, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
3 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret, and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function ψ, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(ηx) + γ/K, INF reduces to the classical exponentially weighted average forecaster, and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with ψ(x) = (η/(-x))^q + γ/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
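The "implicit normalization" named in the abstract can be illustrated with a short sketch. It assumes that the forecaster plays arm i with probability ψ(G_i - C), where G_i is the arm's estimated cumulative gain and C is the constant that makes the probabilities sum to 1. The function names, the bisection search, and the final rescaling step are illustrative choices made here, not the authors' implementation.

```python
import math


def exp_psi(eta, gamma, k):
    """psi(x) = exp(eta * x) + gamma / k (exponentially weighted case)."""
    return lambda x: math.exp(eta * x) + gamma / k


def poly_psi(eta, q, gamma, k):
    """psi(x) = (eta / (-x)) ** q + gamma / k, defined for x < 0 only."""
    return lambda x: (eta / (-x)) ** q + gamma / k


def inf_probabilities(gains, psi, iterations=200):
    """Return p_i = psi(gains[i] - c), with c found by bisection so sum(p) = 1.

    Assumes psi is increasing and that its limit at -infinity, summed over
    all arms, is below 1 (here: gamma < 1).
    """
    top = max(gains)

    def total(c):
        return sum(psi(g - c) for g in gains)

    # Grow the bracket until the total mass drops to 1 or less.
    step = 1.0
    while total(top + step) > 1.0:
        step *= 2.0
    lo, hi = top, top + step  # total exceeds 1 just above lo, is <= 1 at hi

    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if mid <= lo or mid >= hi:  # floating-point resolution reached
            break
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid

    probs = [psi(g - hi) for g in gains]
    norm = sum(probs)  # equals 1 up to bisection error; rescale to absorb it
    return [p / norm for p in probs]
```

With `exp_psi`, the constant C has a closed form and the result coincides with a softmax over η·G_i mixed with the uniform distribution, which is the exponentially weighted average forecaster mentioned in the abstract. With `poly_psi`, no closed form is assumed, which is why the sketch searches for C numerically.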
Document type :
Journal articles

Cited literature [15 references]
Contributor : Jean-Yves Audibert
Submitted on : Wednesday, December 21, 2011 - 5:08:02 PM
Last modification on : Thursday, March 17, 2022 - 10:08:39 AM
Long-term archiving on : Thursday, March 22, 2012 - 2:31:29 AM

  • HAL Id : hal-00654356, version 1


Jean-Yves Audibert, Sébastien Bubeck. Regret Bounds and Minimax Policies under Partial Monitoring. Journal of Machine Learning Research, Microtome Publishing, 2010, 11, pp.2785-2836. ⟨hal-00654356⟩
