Book chapter, Year: 2013

A review of optimistic planning in Markov decision processes

Lucian Busoniu
  • Role: Author
Remi Munos
  • Role: Author
  • PersonId: 836863
Robert Babuska
  • Role: Author
  • PersonId: 933137

Abstract

We review a class of online planning algorithms for deterministic and stochastic optimal control problems, modeled as Markov decision processes. At each discrete time step, these algorithms maximize the predicted value of planning policies from the current state, and apply the first action of the best policy found. An overall receding-horizon algorithm results, which can also be seen as a type of model-predictive control. The space of planning policies is explored optimistically, focusing on areas with largest upper bounds on the value - or upper confidence bounds, in the stochastic case. The resulting optimistic planning framework integrates several types of optimism previously used in planning, optimization, and reinforcement learning, in order to obtain several intuitive algorithms with good performance guarantees. We describe in detail three recent such algorithms, outline the theoretical guarantees on their performance, and illustrate their behavior in a numerical example.
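
The planning loop summarized in the abstract can be sketched for the deterministic case as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: rewards are assumed to lie in [0, 1], the action set is finite, and future rewards are discounted by a factor `gamma`. Under those assumptions, a partial action sequence of depth d with discounted return v has the upper bound v + gamma^d / (1 - gamma) on the value of any continuation. The names `optimistic_plan`, `run_receding_horizon`, and `chain_step`, as well as the toy chain problem, are hypothetical and chosen only for this example.

```python
import heapq
from itertools import count


def optimistic_plan(state, actions, step, gamma=0.9, budget=50):
    """Return the first action of the best action sequence found.

    step(state, action) -> (next_state, reward); reward is assumed in [0, 1].
    budget is the number of node expansions and must be at least 1.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    # Heap entry: (-upper_bound, tie, state, depth, discounted_return, first_action)
    frontier = [(-1.0 / (1.0 - gamma), next(tie), state, 0, 0.0, None)]
    best_return, best_action = -1.0, None
    for _ in range(budget):
        # Optimism: expand the leaf with the largest upper bound on the value.
        _, _, s, depth, ret, first = heapq.heappop(frontier)
        for a in actions:
            s2, r = step(s, a)
            ret2 = ret + gamma ** depth * r
            first2 = a if first is None else first
            if ret2 > best_return:
                best_return, best_action = ret2, first2
            # Rewards beyond depth + 1 add at most gamma^(depth+1) / (1 - gamma).
            upper = ret2 + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(frontier, (-upper, next(tie), s2, depth + 1, ret2, first2))
    return best_action


def run_receding_horizon(state, actions, step, n_steps, **plan_kwargs):
    """Replan at every time step and apply only the first planned action."""
    trajectory = [state]
    for _ in range(n_steps):
        action = optimistic_plan(state, actions, step, **plan_kwargs)
        state, _ = step(state, action)
        trajectory.append(state)
    return trajectory


def chain_step(state, action, goal=3):
    """Hypothetical toy model: integer chain 0..goal, reward 1 on landing at goal."""
    next_state = min(max(state + action, 0), goal)
    return next_state, (1.0 if next_state == goal else 0.0)


if __name__ == "__main__":
    print(run_receding_horizon(0, (-1, 1), chain_step, n_steps=4))
```

The priority queue keeps the search focused on the most promising partial sequences, while the returned action comes from the sequence with the largest return actually observed, which is a lower bound on its value. The stochastic case described in the abstract would replace the deterministic bound with upper confidence bounds and is not covered by this sketch.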
No file deposited

Dates and versions

hal-00756742, version 1 (23-11-2012)

Identifiers

  • HAL Id: hal-00756742, version 1

Cite

Lucian Busoniu, Remi Munos, Robert Babuska. A review of optimistic planning in Markov decision processes. Frank Lewis and Derong Liu. Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control, Wiley-IEEE Press, pp.494-516, 2013, IEEE Press Series on Computational Intelligence, 978-1-1181-0420-0. ⟨hal-00756742⟩
