Classification-based Policy Iteration with a Critic

Victor Gabillon (1), Alessandro Lazaric (1), Mohammad Ghavamzadeh (1), Bruno Scherrer (2)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
(2) MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use the critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates and, as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on LSTD and BRM methods. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
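
The idea of truncating a rollout and letting the critic stand in for the remaining return can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rollout_with_critic`, its parameters, and the toy environment are all hypothetical, and the estimate is assumed to take the form of the discounted rewards over `horizon` steps plus `gamma ** horizon` times the critic's value at the state where the rollout stops.

```python
from typing import Callable, Tuple

State = int
Action = int


def rollout_with_critic(
    step: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    critic: Callable[[State], float],
    state: State,
    action: Action,
    horizon: int,
    gamma: float,
) -> float:
    """Estimate the action value of (state, action) from one truncated rollout.

    The first action is the one being evaluated; later actions follow `policy`.
    After `horizon` steps the rollout stops and `critic` approximates the
    return that the truncated tail would have contributed.
    """
    total = 0.0
    discount = 1.0
    s, a = state, action
    for _ in range(horizon):
        s, reward = step(s, a)      # hypothetical generative model of the MDP
        total += discount * reward  # accumulate the discounted reward
        discount *= gamma
        a = policy(s)
    # At this point discount == gamma ** horizon.
    return total + discount * critic(s)


# Toy problem (an assumption for illustration): a single state that pays
# reward 1 at every step, so the true value is 1 / (1 - gamma).
def toy_step(s: State, a: Action) -> Tuple[State, float]:
    return s, 1.0


def toy_policy(s: State) -> Action:
    return 0


def exact_critic(s: State) -> float:
    return 2.0  # true value for gamma = 0.5


def zero_critic(s: State) -> float:
    return 0.0  # equivalent to a plain truncated rollout


with_critic = rollout_with_critic(toy_step, toy_policy, exact_critic, 0, 0, 3, 0.5)
without_critic = rollout_with_critic(toy_step, toy_policy, zero_critic, 0, 0, 3, 0.5)
print(with_critic, without_critic)  # prints "2.0 1.75"
```

In this toy case the plain truncated rollout underestimates the true value of 2 because it drops the tail of the return, while an accurate critic removes that truncation bias. A shorter horizon relies more heavily on the critic, which is the trade-off the abstract refers to.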
Document type:
Report
2011

https://hal.archives-ouvertes.fr/hal-00590972
Contributor: Victor Gabillon
Submitted on: Thursday, May 5, 2011 - 20:00:05
Last modified on: Thursday, February 21, 2019 - 10:52:49
Document(s) archived on: Saturday, August 6, 2011 - 02:56:49

File

dpi-critic-techReport.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00590972, version 1

Citation

Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. 2011. 〈hal-00590972〉
