A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830201

M. G. Azar, V. Gómez, and H. J. Kappen, Dynamic Policy Programming with Function Approximation, 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

D. P. Bertsekas, Approximate policy iteration: a survey and some new methods, Journal of Control Theory and Applications, vol.9, issue.3, pp.310-335, 2011.
DOI : 10.1007/s11768-011-1005-3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.8653

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

L. Busoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuska et al., Least-squares methods for Policy Iteration, Reinforcement Learning: State of the Art, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830122

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research (JMLR), vol.6, pp.503-556, 2005.

E. Even-Dar, S. M. Kakade, and Y. Mansour, Planning in POMDPs using multiplicity automata, Uncertainty in Artificial Intelligence (UAI), pp.185-192, 2005.

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized policy iteration, Advances in Neural Information Processing Systems (NIPS), pp.441-448, 2009.

A. M. Farahmand, R. Munos, and C. Szepesvári, Error propagation for approximate policy and value iteration (extended version), Advances in Neural Information Processing Systems (NIPS), 2010.

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-based Policy Iteration with a Critic, International Conference on Machine Learning (ICML), pp.1049-1056, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

G. J. Gordon, Stable Function Approximation in Dynamic Programming, International Conference on Machine Learning (ICML), pp.261-268, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

C. Guestrin, D. Koller, and R. Parr, Max-norm projections for factored MDPs, International Joint Conference on Artificial Intelligence (IJCAI), pp.673-682, 2001.

C. Guestrin, D. Koller, R. Parr, and S. Venkataraman, Efficient Solution Algorithms for Factored MDPs, Journal of Artificial Intelligence Research (JAIR), vol.19, pp.399-468, 2003.

S. M. Kakade, On the Sample Complexity of Reinforcement Learning, PhD thesis, University College London, 2003.

S. M. Kakade and J. Langford, Approximately Optimal Approximate Reinforcement Learning, International Conference on Machine Learning (ICML), pp.267-274, 2002.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-Sample Analysis of Least-Squares Policy Iteration, to appear in Journal of Machine Learning Research (JMLR), 2011.
URL : https://hal.archives-ouvertes.fr/inria-00528596

O. A. Maillard, R. Munos, A. Lazaric, and M. Ghavamzadeh, Finite Sample Analysis of Bellman Residual Minimization, Asian Conference on Machine Learning (ACML), JMLR: Workshop and Conference Proceedings, pp.309-324, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212

R. Munos, Error Bounds for Approximate Policy Iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003.

R. Munos, Performance Bounds in Lp norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

R. Munos and C. Szepesvári, Finite time bounds for sampling based fitted value iteration, Journal of Machine Learning Research (JMLR), vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

M. Petrik and B. Scherrer, Biasing Approximate Dynamic Programming with a Lower Discount Factor, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS), 2008.
URL : https://hal.archives-ouvertes.fr/inria-00337652

J. Pineau, G. J. Gordon, and S. Thrun, Point-based value iteration: An anytime algorithm for POMDPs, International Joint Conference on Artificial Intelligence (IJCAI), pp.1025-1032, 2003.

M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

S. Singh and R. Yee, An upper bound on the loss from approximate optimal-value functions, Machine Learning, vol.16, issue.3, pp.227-233, 1994.
DOI : 10.1007/BF00993308

C. Thiery and B. Scherrer, Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00520841

J. N. Tsitsiklis and B. Van Roy, Feature-Based Methods for Large Scale Dynamic Programming, Machine Learning, vol.22, pp.59-94, 1996.