Classification-based Policy Iteration with a Critic - HAL open archive
Report, Year: 2011

Classification-based Policy Iteration with a Critic

Victor Gabillon
  • Role: Author
  • PersonId : 900485
Alessandro Lazaric
Mohammad Ghavamzadeh
  • Role: Author
  • PersonId : 868946
Bruno Scherrer

Abstract

In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use the critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on LSTD and BRM methods. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
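
The central idea of the abstract, truncating a rollout and letting a critic stand in for the remaining return, can be illustrated with a minimal Python sketch. This is not the authors' DPI-Critic implementation: the function name `rollout_with_critic`, its parameters, the discounted bootstrapping form, and the toy constant-reward environment are all assumptions made for illustration only.

```python
import random
from typing import Callable, Tuple

State = int
Action = int


def rollout_with_critic(
    step: Callable[[State, Action, random.Random], Tuple[State, float]],
    policy: Callable[[State], Action],
    critic: Callable[[State], float],
    state: State,
    action: Action,
    horizon: int,
    gamma: float,
    n_rollouts: int = 1,
    seed: int = 0,
) -> float:
    """Estimate Q(state, action) from truncated rollouts plus a critic.

    Hypothetical helper: each rollout takes `action` first, follows
    `policy` for the remaining steps up to `horizon`, and then adds the
    discounted critic value of the state where the rollout was truncated.
    Use horizon >= 1 so that the estimate depends on `action`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s, a = state, action
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng)   # sample next state and reward
            ret += discount * r
            discount *= gamma
            a = policy(s)            # subsequent actions follow the policy
        ret += discount * critic(s)  # critic replaces the truncated tail
        total += ret
    return total / n_rollouts


# Toy environment (an assumption): one state, reward 1 at every step.
def constant_reward_step(s: State, a: Action, rng: random.Random):
    return s, 1.0


gamma = 0.9
exact_value = 1.0 / (1.0 - gamma)  # true value of the single state

with_critic = rollout_with_critic(
    constant_reward_step, lambda s: 0, lambda s: exact_value,
    state=0, action=0, horizon=5, gamma=gamma,
)
without_critic = rollout_with_critic(
    constant_reward_step, lambda s: 0, lambda s: 0.0,
    state=0, action=0, horizon=5, gamma=gamma,
)
```

In this toy setting an exact critic recovers the full return from a 5-step rollout, while a zero critic leaves the truncation bias in place. With an imperfect critic, a shorter horizon would lean more heavily on the critic's error while using fewer noisy reward samples, which is one way to read the bias and variance control mentioned above.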

Domains

Other [stat.ML]
Main file
dpi-critic-techReport.pdf (286.61 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00590972 , version 1 (05-05-2011)

Identifiers

  • HAL Id : hal-00590972 , version 1

Cite

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. 2011. ⟨hal-00590972⟩
322 Views
324 Downloads
