Error propagation for approximate policy and value iteration

Amir Massoud Farahmand (1), Rémi Munos (2), Csaba Szepesvári (1)
(2) SEQUEL (Sequential Learning), LIFL (Laboratoire d'Informatique Fondamentale de Lille), LAGIS (Laboratoire d'Automatique, Génie Informatique et Signal), Inria Lille - Nord Europe
Abstract: We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution, rather than on its supremum as suggested by previous results. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error term in the earlier iterations decays exponentially fast.
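
The abstract's claim that errors in earlier iterations decay exponentially fast can be illustrated concretely. Below is a minimal sketch (not from the paper) of Approximate Value Iteration on a small random finite MDP: a sup-norm error of fixed size is injected at a single iteration k, and the deviation of the final value function from an error-free run shrinks roughly like gamma^(K-1-k). All names here (P, R, avi, inject_at) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, K = 20, 4, 0.9, 50

# Random MDP: P[a, s, s'] is the transition kernel, R[s, a] the reward.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

def bellman_optimality(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)

def avi(inject_at=None, eps=0.1):
    # Run K iterations of V_{k+1} = T V_k; optionally inject an error of
    # sup-norm eps at iteration `inject_at` to mimic approximation error.
    V = np.zeros(n_states)
    for k in range(K):
        V = bellman_optimality(V)
        if k == inject_at:
            e = rng.standard_normal(n_states)
            V += eps * e / np.abs(e).max()
    return V

V_clean = avi()
for k in (0, 10, 25, 45):
    dev = np.abs(avi(inject_at=k) - V_clean).max()
    # Contraction damps the injected error by gamma per remaining iteration,
    # so dev <= gamma^(K-1-k) * eps: earlier errors matter exponentially less.
    print(f"error at k={k:2d}: final deviation {dev:.2e} "
          f"(bound gamma^(K-1-k)*eps = {gamma**(K - 1 - k) * 0.1:.2e})")
```

The paper's actual bounds are stated in Lp norms weighted by Radon-Nikodym derivatives of certain state distributions; this sup-norm experiment is only a simplified illustration of the same propagation effect.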
Document type: Conference paper

Cited literature: 22 references

https://hal.archives-ouvertes.fr/hal-00830154
Contributor: Rémi Munos
Submitted on: Tuesday, June 4, 2013 - 3:04:21 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Thursday, September 5, 2013 - 4:22:49 AM

File

error_prop_nips2010.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-00830154, version 1

Citation

Amir Massoud Farahmand, Rémi Munos, Csaba Szepesvári. Error propagation for approximate policy and value iteration. Advances in Neural Information Processing Systems, 2010, Canada. ⟨hal-00830154⟩
