Conference papers

Conservative and Greedy Approaches to Classification-based Policy Iteration

Mohammad Ghavamzadeh (1), Alessandro Lazaric (1)
(1) SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: The existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods, which directly assign the output of the classifier (the approximate greedy policy w.r.t. the current policy) to the next policy, and conservative policy iteration (CPI) methods, in which the new policy is a mixture distribution of the current policy and the output of the classifier. The conservative policy update gives CPI a desirable feature, namely the guarantee that the policies generated by this algorithm improve at each iteration. We provide a detailed algorithmic and theoretical comparison of these two classes of CBPI algorithms. Our results reveal that, in order to achieve the same level of accuracy, CPI requires more iterations, and thus more samples, than the DPI algorithm. Furthermore, CPI may converge to suboptimal policies whose performance is not better than DPI's.
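The two policy-update rules contrasted in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes finite state and action spaces, represents a stochastic policy as a per-state list of action probabilities, and treats the classifier's output as a map from each state to a single (approximately) greedy action. The mixing coefficient `alpha` and the helper names `dpi_update` and `cpi_update` are hypothetical; the abstract does not say how CPI chooses its step size.

```python
from typing import Dict, List

# Hypothetical representation: state -> list of action probabilities.
Policy = Dict[str, List[float]]
# Classifier output (approximate greedy policy): state -> chosen action index.
GreedyPolicy = Dict[str, int]


def one_hot(action: int, n_actions: int) -> List[float]:
    """Express a deterministic action choice as a probability vector."""
    return [1.0 if a == action else 0.0 for a in range(n_actions)]


def dpi_update(greedy: GreedyPolicy, n_actions: int) -> Policy:
    """DPI-style update: the classifier output becomes the next policy."""
    return {s: one_hot(a, n_actions) for s, a in greedy.items()}


def cpi_update(current: Policy, greedy: GreedyPolicy, alpha: float) -> Policy:
    """CPI-style update: mix (1 - alpha) * current + alpha * greedy, per state.

    `alpha` is an assumed, externally chosen step size in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    new_policy: Policy = {}
    for s, probs in current.items():
        target = one_hot(greedy[s], len(probs))
        new_policy[s] = [(1.0 - alpha) * p + alpha * t for p, t in zip(probs, target)]
    return new_policy


if __name__ == "__main__":
    pi = {"s0": [0.5, 0.5], "s1": [1.0, 0.0]}
    greedy = {"s0": 1, "s1": 1}
    print(dpi_update(greedy, 2))             # {'s0': [0.0, 1.0], 's1': [0.0, 1.0]}
    print(cpi_update(pi, greedy, alpha=0.25))  # {'s0': [0.375, 0.625], 's1': [0.75, 0.25]}
```

Setting `alpha = 1` in this sketch recovers the DPI update exactly. Small values of `alpha` keep each new policy close to the current one; this is the conservative behavior the abstract credits with CPI's per-iteration improvement guarantee, and it also gives some intuition for why CPI may need more iterations to reach the same accuracy.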
Document type: Conference paper

Cited literature: 9 references
Contributor: Alessandro Lazaric
Submitted on: Thursday, January 10, 2013 - 6:12:36 PM
Last modification on: Thursday, January 20, 2022 - 4:17:18 PM
Long-term archiving on: Thursday, April 11, 2013 - 4:08:46 AM


Files produced by the author(s)


HAL Id: hal-00772610, version 1



Mohammad Ghavamzadeh, Alessandro Lazaric. Conservative and Greedy Approaches to Classification-based Policy Iteration. AAAI - 26th Conference on Artificial Intelligence, Jul 2012, Toronto, Canada. ⟨hal-00772610⟩


