
Error propagation for approximate policy and value iteration

Amir Massoud Farahmand 1, Rémi Munos 2, Csaba Szepesvári 1
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution, rather than on its supremum, as previous results had suggested. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error incurred in the earlier iterations decays exponentially fast.
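The abstract's last claim — that an error incurred at an early iteration contributes exponentially less to the final performance loss — can be illustrated with a small tabular experiment. The sketch below is an illustrative assumption, not the paper's setting: it builds a hypothetical random 5-state, 2-action MDP, injects a one-off constant error into value iteration at iteration j, and measures the final sup-norm distance to V*. Because the Bellman optimality operator is a γ-contraction (and a constant shift c is mapped exactly to γc when transition rows sum to one), an error injected at iteration j of K is attenuated by roughly γ^(K−1−j).

```python
import numpy as np

# Illustrative toy experiment (not from the paper): a random tabular MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)      # row-stochastic transition kernels
R = rng.random((n_actions, n_states))  # rewards r(a, s) in [0, 1]

def bellman(V):
    # (T V)(s) = max_a [ r(a, s) + gamma * sum_{s'} P(s' | s, a) V(s') ]
    return (R + gamma * (P @ V)).max(axis=0)

# Near-exact V*: iterate the gamma-contraction to machine precision.
V_star = np.zeros(n_states)
for _ in range(2000):
    V_star = bellman(V_star)

def final_error(inject_at, K=30, eps=1.0):
    """Start at V*, run K Bellman updates, add a constant perturbation eps
    once, right after update `inject_at`; return ||V_K - V*||_inf."""
    V = V_star.copy()
    for k in range(K):
        V = bellman(V)
        if k == inject_at:
            V = V + eps                # the "approximation error" at step k
    return np.abs(V - V_star).max()

early = final_error(inject_at=2)       # attenuated by gamma**27 ≈ 0.058
late = final_error(inject_at=28)       # attenuated by gamma**1  = 0.9
```

Starting from V* isolates the injected error from the usual convergence transient, so the measured final error is (up to floating-point rounding) exactly the γ^(K−1−j)-attenuated perturbation — the geometric decay of early errors that the abstract describes.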
Document type :
Conference papers

Cited literature: 22 references
Contributor: Rémi Munos
Submitted on: Tuesday, June 4, 2013 - 3:04:21 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Thursday, September 5, 2013 - 4:22:49 AM




  • HAL Id: hal-00830154, version 1



Amir Massoud Farahmand, Rémi Munos, Csaba Szepesvári. Error propagation for approximate policy and value iteration. Advances in Neural Information Processing Systems, 2010, Canada. ⟨hal-00830154⟩


