Improved second-order bounds for prediction with expert advice

Abstract: This work studies external regret in sequential prediction games with arbitrary payoffs (which may be nonnegative or non-positive). External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best action. We focus on two important parameters: $M$, the largest absolute value of any payoff, and $Q^*$, the sum of squared payoffs of the best action. Given these parameters, we first derive a new, simple forecasting strategy whose regret is of order at most $\sqrt{Q^* \ln N} + M \ln N$, where $N$ is the number of actions. We then extend the results to the case where these parameters are unknown and derive similar bounds. Next, we devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. Finally, we apply the proof techniques we develop to the adversarial multi-armed bandit setting, and we prove bounds on the performance of an online algorithm in the case where there is no lower bound on the probability of each action.
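To make the quantities in the abstract concrete, here is a minimal sketch of the standard exponentially weighted (weighted majority) forecaster with a fixed learning rate `eta`. It computes the regret against the best action together with $M$ and $Q^*$. This is an assumption-laden illustration, not the paper's new strategy or its refined tuning, whose update rules are not given in this record. The helper name `exp_weights_regret` and the fixed tuning `eta = sqrt(ln N / T)` (which assumes a known horizon `T`) are hypothetical choices made for this example.

```python
import math
import random


def exp_weights_regret(payoffs, eta):
    """Hypothetical helper: fixed-eta exponentially weighted average forecaster.

    payoffs: list of rounds, each a list of N real payoffs (any sign).
    Returns (forecaster payoff, best action payoff, regret, M, Q*).
    """
    n = len(payoffs[0])
    cum = [0.0] * n  # cumulative payoff of each action
    total = 0.0      # cumulative expected payoff of the forecaster
    for x in payoffs:
        # Weights proportional to exp(eta * cumulative payoff);
        # subtracting the max only improves numerical stability.
        top = max(cum)
        w = [math.exp(eta * (c - top)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        total += sum(pi * xi for pi, xi in zip(p, x))
        cum = [c + xi for c, xi in zip(cum, x)]
    best = max(range(n), key=lambda i: cum[i])           # best action in hindsight
    m = max(abs(xi) for x in payoffs for xi in x)        # M: largest |payoff|
    q_star = sum(x[best] ** 2 for x in payoffs)          # Q*: squared payoffs of best action
    return total, cum[best], cum[best] - total, m, q_star


if __name__ == "__main__":
    random.seed(0)
    T, N = 1000, 5
    seq = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(T)]
    eta = math.sqrt(math.log(N) / T)  # assumed fixed tuning with known horizon T
    _, _, regret, m, q_star = exp_weights_regret(seq, eta)
    ref = math.sqrt(q_star * math.log(N)) + m * math.log(N)
    print(f"regret={regret:.2f}  sqrt(Q* ln N) + M ln N = {ref:.2f}")
```

The printed reference value only shows the scale of the second-order quantity $\sqrt{Q^* \ln N} + M \ln N$. The fixed-`eta` forecaster above is guaranteed only the classical first-order bound $\ln N/\eta + \eta T (b-a)^2/8$ for payoffs in $[a, b]$, not the paper's bound.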
Contributor: Gilles Stoltz
Submitted on: Friday, July 15, 2005, 5:09:16 PM
Last modification on: Tuesday, April 2, 2019, 2:16:16 PM

HAL Id: hal-00007539, version 1

Nicolo Cesa-Bianchi, Yishay Mansour, Gilles Stoltz. Improved second-order bounds for prediction with expert advice. 2005, pp. 217-232. ⟨hal-00007539⟩