Book sections

A review of optimistic planning in Markov decision processes

Lucian Busoniu 1, *, Remi Munos 2, Robert Babuska 3
* Corresponding author
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract : We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type of model-predictive control. The space of planning policies is explored optimistically, focusing on areas with largest upper bounds on the value - or upper confidence bounds, in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees. We describe in detail three recent such algorithms, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example.
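The receding-horizon scheme described in the abstract can be illustrated with a minimal sketch for the deterministic case. This is not the authors' implementation: it assumes a deterministic model, rewards in [0, 1], and a discount factor gamma in (0, 1), so that gamma^d / (1 - gamma) bounds whatever a depth-d action sequence can still collect. The names `optimistic_plan`, `receding_horizon`, and `toy_step`, as well as the five-state chain used for illustration, are hypothetical.

```python
import heapq
import itertools


def optimistic_plan(state, actions, step, gamma=0.9, budget=100):
    """Return the first action of the best action sequence found.

    Assumptions (not from the abstract): `step(state, action)` is a
    deterministic model returning (next_state, reward), rewards lie in
    [0, 1], 0 < gamma < 1, `actions` is non-empty, and budget >= 1.
    """
    tie = itertools.count()  # unique tie-breaker so states are never compared
    # Heap entries: (-upper_bound, tie, state, depth, discounted_return, first_action)
    heap = [(-1.0 / (1.0 - gamma), next(tie), state, 0, 0.0, None)]
    best_return, best_action = float("-inf"), None
    for _ in range(budget):
        # Optimistic choice: expand the leaf with the largest upper bound.
        _, _, s, depth, ret, first = heapq.heappop(heap)
        for a in actions:
            s_next, r = step(s, a)
            ret_next = ret + gamma ** depth * r
            first_next = a if first is None else first
            if ret_next > best_return:
                best_return, best_action = ret_next, first_next
            # Rewards are at most 1, so the unexplored tail of this sequence
            # is worth at most gamma^(depth+1) / (1 - gamma).
            bound = ret_next + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(
                heap, (-bound, next(tie), s_next, depth + 1, ret_next, first_next)
            )
    return best_action


def receding_horizon(state, actions, step, n_steps, gamma=0.9, budget=100):
    """Replan at every time step and apply only the first planned action."""
    applied = []
    for _ in range(n_steps):
        a = optimistic_plan(state, actions, step, gamma=gamma, budget=budget)
        state, _ = step(state, a)
        applied.append(a)
    return state, applied


def toy_step(s, a):
    """Hypothetical chain with states 0..4; reward 1 for landing on state 4."""
    s_next = min(4, max(0, s + a))
    return s_next, (1.0 if s_next == 4 else 0.0)


final_state, applied = receding_horizon(0, (-1, 1), toy_step, n_steps=6)
```

A priority queue keyed on the upper bound keeps each expansion cheap. The stochastic case mentioned in the abstract would need upper confidence bounds in place of the deterministic bound used here, which this sketch does not attempt.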
Contributor : Lucian Busoniu
Submitted on : Friday, November 23, 2012 - 4:16:13 PM
Last modification on : Saturday, October 16, 2021 - 11:14:12 AM


  • HAL Id : hal-00756742, version 1


Lucian Busoniu, Remi Munos, Robert Babuska. A review of optimistic planning in Markov decision processes. In Frank Lewis and Derong Liu (eds.), Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, Wiley-IEEE Press, pp.494-516, 2013, IEEE Press Series on Computational Intelligence, 978-1-1181-0420-0. ⟨hal-00756742⟩


