Conference paper, 2010

Error propagation for approximate policy and value iteration

Abstract

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than its supremum, as previous results had suggested. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error term committed in earlier iterations decays exponentially fast.
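
To make the last claim concrete, here is a deliberately simplified sketch of the shape that such error-propagation bounds for Approximate Value Iteration typically take; the weights \alpha_k, the constant C_{\rho,\nu}, and the norms shown are illustrative placeholders, not the exact statement proved in the paper (\epsilon_k denotes the approximation/Bellman error at iteration k, \gamma the discount factor, K the total number of iterations).

% Illustrative sketch only; not the exact theorem of the paper.
% \epsilon_k: error at iteration k, \gamma: discount factor, K: number of iterations,
% \rho, \nu: performance-measuring and sampling distributions,
% C_{\rho,\nu}: a concentrability-type constant (placeholder).
\[
  \bigl\| Q^* - Q^{\pi_K} \bigr\|_{p,\rho}
  \;\lesssim\;
  \frac{2\gamma}{(1-\gamma)^2}
  \left[
    C_{\rho,\nu} \sum_{k=0}^{K-1} \alpha_k \,\bigl\| \epsilon_k \bigr\|_{p,\nu}
    \;+\; \gamma^{K} R_{\max}
  \right],
  \qquad
  \alpha_k \;\propto\; \gamma^{\,K-k-1}.
\]
% Because \alpha_k shrinks like \gamma^{K-k-1}, an error committed at an early
% iteration is discounted exponentially in the number of iterations that follow
% it, while errors in the last few iterations enter with nearly full weight.
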
Main file: error_prop_nips2010.pdf (198.85 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00830154, version 1 (04-06-2013)

Identifiers

  • HAL Id: hal-00830154, version 1

Cite

Amir Massoud Farahmand, Rémi Munos, Csaba Szepesvari. Error propagation for approximate policy and value iteration. Advances in Neural Information Processing Systems, 2010, Canada. ⟨hal-00830154⟩
2085 views
358 downloads
