
Classification-based Policy Iteration with a Critic

Victor Gabillon (1), Alessandro Lazaric (1), Mohammad Ghavamzadeh (1), Bruno Scherrer (2)
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
2 MAIA - Autonomous intelligent machine
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: In this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use the critic to approximate the return after truncating the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function, which are strongly related to the length of the rollout trajectories. The introduction of a critic can therefore improve the accuracy of the rollout estimates and, as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD and BRM methods. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI on two benchmark reinforcement learning problems.
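The core idea described in the abstract can be sketched as follows: estimate the action-value Q(s, a) by running a short rollout of length H under the current policy and then bootstrapping with a learned critic V instead of continuing the trajectory, trading rollout variance against critic bias. This is a minimal illustrative sketch, not the authors' implementation; the `step`, `policy`, and `critic` callables are assumed interfaces introduced here for illustration.

```python
def truncated_rollout_q(step, policy, critic, s, a, horizon, gamma, n_rollouts):
    """Estimate Q(s, a) by averaging H-step rollouts completed by a critic.

    step(state, action) -> (next_state, reward)   # assumed environment interface
    policy(state) -> action                       # current policy being evaluated
    critic(state) -> value estimate               # approximates the truncated tail
    """
    total = 0.0
    for _ in range(n_rollouts):
        ret, discount = 0.0, 1.0
        state, action = s, a
        for _ in range(horizon):
            state, reward = step(state, action)  # one environment transition
            ret += discount * reward
            discount *= gamma
            action = policy(state)               # follow the current policy
        ret += discount * critic(state)          # critic replaces the rollout tail
        total += ret
    return total / n_rollouts
```

A shorter `horizon` lowers the variance of the Monte Carlo part but leans more on the critic (more bias), which is the trade-off the paper analyzes.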
Cited literature: 15 references
Contributor: Victor Gabillon
Submitted on: May 5, 2011




HAL Id: hal-00590972, version 1


Victor Gabillon, Alessandro Lazaric, Mohammad Ghavamzadeh, Bruno Scherrer. Classification-based Policy Iteration with a Critic. 2011. ⟨hal-00590972⟩


