Least-squares methods for policy iteration

Lucian Busoniu (1), Alessandro Lazaric (1), Mohammad Ghavamzadeh (1), Rémi Munos (1), Robert Babuska (2), Bart de Schutter (3)
(1) SEQUEL - Sequential Learning, LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: Approximate reinforcement learning addresses the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core policy evaluation component of policy iteration: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization. We introduce these techniques starting from their general mathematical principles and detail them down to fully specified algorithms. We pay attention to online variants of policy iteration, and provide a numerical example highlighting the behavior of representative offline and online methods. For the policy evaluation component, as well as for the overall resulting approximate policy iteration, we provide guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity. We also provide finite-sample results, which apply when a finite number of samples and iterations are considered. Finally, we outline several extensions and improvements to the techniques and methods reviewed.
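To make the first of the three techniques concrete, the following is a minimal, hypothetical sketch of least-squares temporal difference (LSTD) policy evaluation with linear features. It is not the chapter's implementation: the function name `lstd`, the toy two-state chain, and the tabular features are illustrative choices. LSTD estimates the weight vector w of an approximate value function V(s) ≈ φ(s)ᵀw by solving the linear system Aw = b built from transition samples.

```python
# Hedged sketch of LSTD policy evaluation (illustrative, not the chapter's code).
# From transitions (s, r, s') under a fixed policy, accumulate
#   A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T
#   b = sum_t phi(s_t) * r_t
# and solve A w = b for the linear value-function weights w.
import numpy as np

def lstd(samples, phi, n_features, gamma):
    """Estimate V(s) ~= phi(s) . w from (state, reward, next_state) samples."""
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # rank-one LSTD update
        b += f * r
    return np.linalg.solve(A, b)

# Toy two-state chain under a fixed policy: state 0 -> state 1 (reward 0),
# state 1 -> state 1 (reward 1). With gamma = 0.9 the true values are
# V(0) = 9 and V(1) = 10; tabular indicator features recover them exactly.
phi = lambda s: np.eye(2)[s]
samples = [(0, 0.0, 1), (1, 1.0, 1)]
w = lstd(samples, phi, n_features=2, gamma=0.9)
print(w)  # approximately [9, 10]
```

With richer (non-tabular) features the same system yields the least-squares projection of the value function onto the feature space, which is the setting the chapter analyzes.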
Document type: Book section

Cited literature: 46 references

https://hal.archives-ouvertes.fr/hal-00830122
Contributor: Rémi Munos
Submitted on: Tuesday, June 4, 2013 - 2:38:30 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Thursday, September 5, 2013 - 4:22:21 AM

File: lspi_chapter.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-00830122, version 1

Citation

Lucian Busoniu, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Robert Babuska, et al.. Least-squares methods for policy iteration. Reinforcement Learning: State of the Art, Springer, pp.75-109, 2011. ⟨hal-00830122⟩
