Error propagation for approximate policy and value iteration

Amir Massoud Farahmand (1), Rémi Munos (2), Csaba Szepesvári (1)
(2) SEQUEL (Sequential Learning), LIFL (Laboratoire d'Informatique Fondamentale de Lille), LAGIS (Laboratoire d'Automatique, Génie Informatique et Signal), Inria Lille - Nord Europe
Abstract: We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution, rather than on its supremum as suggested by previous results. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error term in the earlier iterations decays exponentially fast.
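
The abstract's claim that errors in earlier iterations decay exponentially fast can be illustrated concretely. Below is a minimal sketch (not from the paper) of Approximate Value Iteration on a small random finite MDP: a sup-norm error of fixed size is injected at a single iteration k, and the deviation of the final value function from an error-free run shrinks roughly like gamma^(K-1-k). All names here (P, R, avi, inject_at) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, K = 20, 4, 0.9, 50

# Random MDP: P[a, s, s'] is the transition kernel, R[s, a] the reward.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

def bellman_optimality(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') ]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return Q.max(axis=1)

def avi(inject_at=None, eps=0.1):
    # Run K iterations of V_{k+1} = T V_k; optionally inject an error of
    # sup-norm eps at iteration `inject_at` to mimic approximation error.
    V = np.zeros(n_states)
    for k in range(K):
        V = bellman_optimality(V)
        if k == inject_at:
            e = rng.standard_normal(n_states)
            V += eps * e / np.abs(e).max()
    return V

V_clean = avi()
for k in (0, 10, 25, 45):
    dev = np.abs(avi(inject_at=k) - V_clean).max()
    # Contraction damps the injected error by gamma per remaining iteration,
    # so dev <= gamma^(K-1-k) * eps: earlier errors matter exponentially less.
    print(f"error at k={k:2d}: final deviation {dev:.2e} "
          f"(bound gamma^(K-1-k)*eps = {gamma**(K - 1 - k) * 0.1:.2e})")
```

The paper's actual bounds are stated in Lp norms weighted by Radon-Nikodym derivatives of certain state distributions; this sup-norm experiment is only a simplified illustration of the same propagation effect.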
Document type: Conference paper

Cited literature: 22 references

https://hal.archives-ouvertes.fr/hal-00830154
Contributor: Rémi Munos
Submitted on: Tuesday, June 4, 2013 - 3:04:21 PM
Last modification on: Thursday, February 21, 2019 - 10:52:49 AM
Long-term archiving on: Thursday, September 5, 2013 - 4:22:49 AM

File

error_prop_nips2010.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-00830154, version 1

Citation

Amir Massoud Farahmand, Rémi Munos, Csaba Szepesvári. Error propagation for approximate policy and value iteration. Advances in Neural Information Processing Systems, 2010, Canada. ⟨hal-00830154⟩
