Classification-based Policy Iteration with a Critic

Victor Gabillon; Alessandro Lazaric; Mohammad Ghavamzadeh; Bruno Scherrer

Report. Year: 2011

Victor Gabillon

Role: Author
PersonId: 900485

Sequential Learning

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId: 868946

Sequential Learning

Bruno Scherrer

Role: Author
PersonId: 1406
IdHAL: bruno-scherrer
IdRef: 073360708

Autonomous intelligent machine

Abstract

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use the critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates and, as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on LSTD and BRM methods. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI on two benchmark reinforcement learning problems.
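The truncation idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the function name `truncated_rollout_estimate`, the callables `step`, `policy`, and `critic`, and the toy environment are all hypothetical, and the estimate assumes the common form in which the discounted rewards of the first `horizon` steps are summed and the critic's value of the last state, discounted by `gamma ** horizon`, stands in for the rest of the return.

```python
from typing import Callable, Tuple


def truncated_rollout_estimate(
    state,
    action,
    step: Callable[[object, object], Tuple[object, float]],
    policy: Callable[[object], object],
    critic: Callable[[object], float],
    horizon: int,
    gamma: float,
    n_rollouts: int = 1,
) -> float:
    """Estimate the action value of (state, action) under `policy`.

    Assumed form (hypothetical, for illustration): each rollout takes
    `action` first, then follows `policy`, for `horizon` steps in total,
    and the critic's value of the final state replaces the truncated tail.
    """
    total = 0.0
    for _ in range(n_rollouts):
        s, a = state, action
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, reward = step(s, a)      # hypothetical simulator call
            ret += discount * reward
            discount *= gamma
            a = policy(s)               # later actions come from the policy
        ret += discount * critic(s)     # critic stands in for the truncated tail
        total += ret
    return total / n_rollouts


# Toy deterministic chain (an assumption for the example): every step
# yields reward 1, so the true value with gamma = 0.5 is 1 / (1 - 0.5) = 2.
def toy_step(s, a):
    return s + 1, 1.0


def toy_policy(s):
    return 0


def exact_critic(s):
    return 2.0


estimate = truncated_rollout_estimate(
    state=0, action=0, step=toy_step, policy=toy_policy,
    critic=exact_critic, horizon=2, gamma=0.5,
)
```

In this sketch a shorter `horizon` leans more heavily on the critic (so its approximation error matters more), while a longer `horizon` sums more sampled rewards (so rollout variance matters more), which is the trade-off the abstract refers to.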

Domains

Other [stat.ML]

Main file

dpi-critic-techReport.pdf (286.61 KB)

Origin: Files produced by the author(s)

Contributor: Victor Gabillon

https://hal.science/hal-00590972

Submitted on: Thursday, May 5, 2011, 20:00:05

Last modified on: Friday, March 24, 2023, 14:52:54

Long-term archiving on: Saturday, August 6, 2011, 02:56:49

Dates and versions

hal-00590972 , version 1 (05-05-2011)

Identifiers

HAL Id: hal-00590972, version 1

Cite

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. 2011. ⟨hal-00590972⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS GRID5000 UNIV-LORRAINE INRIA2 LORIA LARA SILECS

322 Views

326 Downloads
