A review of optimistic planning in Markov decision processes

Lucian Busoniu 1,*, Remi Munos 2, Robert Babuska 3
* Corresponding author
2: SEQUEL (Sequential Learning), Inria Lille - Nord Europe; LIFL (Laboratoire d'Informatique Fondamentale de Lille); LAGIS (Laboratoire d'Automatique, Génie Informatique et Signal)
Abstract: We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. The result is an overall receding-horizon algorithm, which can also be seen as a type of model-predictive control. The space of planning policies is explored optimistically, focusing on the areas with the largest upper bounds on the value, or upper confidence bounds in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, yielding intuitive algorithms with good performance guarantees. We describe three such recent algorithms in detail, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example.
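
To make the optimistic-expansion idea concrete, here is a minimal Python sketch of planning in the deterministic case, in the spirit of the optimistic planning for deterministic systems (OPD) algorithm the chapter reviews. The model interface model(s, a) -> (next_state, reward), the assumption of rewards in [0, 1], and all function and variable names are illustrative choices for this sketch, not the chapter's notation. A leaf's upper bound (b-value) is its discounted return so far plus gamma^d / (1 - gamma), the best return any continuation could still collect from depth d.

```python
import heapq

def optimistic_planning(state, model, actions, gamma, budget):
    """Sketch of optimistic planning for a deterministic MDP.

    Assumptions: model(s, a) returns (next_state, reward) with rewards
    in [0, 1]; budget is the number of leaf expansions allowed.
    Returns the first action of the best plan found.
    """
    # Frontier entries: (negated b-value for the min-heap, tie-breaker,
    # state, discounted return so far, depth, first action of the plan)
    counter = 0
    root_bound = 1.0 / (1.0 - gamma)  # b-value of the empty plan
    frontier = [(-root_bound, counter, state, 0.0, 0, None)]
    best_value, best_action = -1.0, actions[0]

    for _ in range(budget):
        if not frontier:
            break
        # Optimism: always expand the leaf with the largest upper bound
        _, _, s, ret, depth, first = heapq.heappop(frontier)
        for a in actions:
            s2, r = model(s, a)
            ret2 = ret + (gamma ** depth) * r  # discounted return of plan
            first2 = a if first is None else first
            # The discounted return is a lower bound on the plan's value;
            # keep track of the best plan seen so far
            if ret2 > best_value:
                best_value, best_action = ret2, first2
            # Upper bound: return so far plus the best possible future
            b2 = ret2 + (gamma ** (depth + 1)) / (1.0 - gamma)
            counter += 1
            heapq.heappush(frontier, (-b2, counter, s2, ret2, depth + 1, first2))

    return best_action
```

In the receding-horizon loop described in the abstract, this planner would be called at each time step from the newly observed state, only the returned first action would be applied, and planning would then restart from the next state.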

Identifiers

  • HAL Id : hal-00756742, version 1
  • URL : https://hal.archives-ouvertes.fr/hal-00756742

Citation

Lucian Busoniu, Remi Munos, Robert Babuska. A review of optimistic planning in Markov decision processes. In: Frank Lewis and Derong Liu (eds.), Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, IEEE Press Series on Computational Intelligence, Wiley-IEEE Press, 2013, pp. 494-516. ISBN 978-1-1181-0420-0. ⟨hal-00756742⟩
