Conference papers

Finite-Sample Analysis of Bellman Residual Minimization

Odalric-Ambrym Maillard 1, Rémi Munos 1, Alessandro Lazaric 1, Mohammad Ghavamzadeh 1
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of n states drawn i.i.d. from a distribution, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual with a rate of order O(1/sqrt(n)). This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples n and a measure of how well the function space is able to approximate the sequence of value functions.
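
As a rough illustration of the setting described in the abstract, the sketch below fits a linear value function for one fixed policy by minimizing an empirical Bellman residual built from states drawn i.i.d. and next states drawn from a generative model. It is not taken from the paper. The two-state chain, the one-hot features, the uniform sampling distribution, and the use of two independent next-state samples per state (a common device for estimating the quadratic residual without bias) are all assumptions made for this example, and every name in the code is hypothetical.

```python
import random

GAMMA = 0.5                      # discount factor (illustrative value)
P = [[0.5, 0.5], [0.2, 0.8]]     # assumed transition probabilities under the fixed policy
R = [0.0, 1.0]                   # assumed reward received in each state


def features(s):
    """One-hot features: a 2-dimensional linear space that contains V exactly."""
    return [1.0 if s == j else 0.0 for j in range(2)]


def next_state(s, rng):
    """One call to the generative model: sample a successor of state s."""
    return 0 if rng.random() < P[s][0] else 1


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    d = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, d):
            f = A[r][c] / A[c][c]
            for k in range(c, d + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * d
    for c in reversed(range(d)):
        x[c] = (A[c][d] - sum(A[c][k] * x[k] for k in range(c + 1, d))) / A[c][c]
    return x


def brm_fit(n, rng):
    """Return weights theta at a stationary point of the empirical residual

        (1/n) * sum_i (a_i . theta - r_i) * (b_i . theta - r_i),

    where a_i = phi(x_i) - GAMMA * phi(y_i) and b_i = phi(x_i) - GAMMA * phi(y'_i)
    use two independent next-state samples y_i, y'_i of the same state x_i.
    """
    d = 2
    M = [[0.0] * d for _ in range(d)]
    v = [0.0] * d
    for _ in range(n):
        x = 0 if rng.random() < 0.5 else 1          # state drawn i.i.d. (uniform here)
        y1, y2 = next_state(x, rng), next_state(x, rng)
        fx, f1, f2 = features(x), features(y1), features(y2)
        a = [fx[j] - GAMMA * f1[j] for j in range(d)]
        b = [fx[j] - GAMMA * f2[j] for j in range(d)]
        # Setting the gradient to zero gives: sum(a b^T + b a^T) theta = sum((a + b) r)
        for j in range(d):
            v[j] += (a[j] + b[j]) * R[x]
            for k in range(d):
                M[j][k] += a[j] * b[k] + b[j] * a[k]
    return solve(M, v)


if __name__ == "__main__":
    theta = brm_fit(20000, random.Random(0))
    print("estimated values:", theta)
    print("exact values:    ", [10 / 17, 30 / 17])
```

For this toy chain the exact value function solves V = R + GAMMA * P V, which gives V = (10/17, 30/17). Because the one-hot features can represent V exactly, the fitted weights should approach these values as n grows. With a restricted feature space, the fit would instead trade off the residual across states according to the sampling distribution.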
Document type :
Conference papers

Cited literature [21 references]

https://hal.archives-ouvertes.fr/hal-00830212
Contributor : Rémi Munos
Submitted on : Tuesday, June 4, 2013 - 3:37:00 PM
Last modification on : Thursday, January 20, 2022 - 4:12:35 PM
Long-term archiving on : Thursday, September 5, 2013 - 4:23:37 AM

File

brm_acml2010.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00830212, version 1

Citation

Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh. Finite-Sample Analysis of Bellman Residual Minimization. Asian Conference on Machine Learning, 2010, Japan. ⟨hal-00830212⟩

Metrics

  • Record views : 270
  • Files downloads : 136