D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

M. Riedmiller, Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method, 16th European Conference on Machine Learning, pp.317-328, 2005.
DOI: 10.1007/11564096_32

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, Proceedings of the American Control Conference (ACC), pp.725-730, 2009.

R. Munos and C. Szepesvári, Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL: https://hal.archives-ouvertes.fr/inria-00120882

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.

A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI: 10.1007/s10994-007-5038-2
URL: https://hal.archives-ouvertes.fr/hal-00830201

O. Maillard, R. Munos, A. Lazaric, and M. Ghavamzadeh, Finite-sample analysis of Bellman residual minimization, Proceedings of the Second Asian Conference on Machine Learning (ACML), 2010.
URL: https://hal.archives-ouvertes.fr/hal-00830212

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized policy iteration, Advances in Neural Information Processing Systems 21, pp.441-448, 2009.

J. Z. Kolter and A. Y. Ng, Regularization and feature selection in least-squares temporal difference learning, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pp.521-528, 2009.

X. Xu, D. Hu, and X. Lu, Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol.18, issue.4, pp.973-992, 2007.
DOI: 10.1109/TNN.2007.899161

T. Jung and D. Polani, Least squares SVM for least squares TD learning, Proceedings of the 17th European Conference on Artificial Intelligence (ECAI), pp.499-503, 2006.

G. Taylor and R. Parr, Kernelized value function approximation for reinforcement learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.1017-1024, 2009.
DOI: 10.1145/1553374.1553504

S. Mahadevan and M. Maggioni, Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes, Journal of Machine Learning Research, vol.8, pp.2169-2231, 2007.

A. Geramifard, M. Bowling, M. Zinkevich, and R. S. Sutton, iLSTD: Eligibility traces and convergence analysis, Advances in Neural Information Processing Systems 19, pp.441-448, 2007.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), Athena Scientific, 1996.

R. Munos, Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, 2007.
DOI: 10.1137/040614384

R. Munos, Error bounds for approximate policy iteration, ICML 2003: Proceedings of the 20th Annual International Conference on Machine Learning, 2003.

D. P. Bertsekas and S. E. Shreve, Stochastic Optimal Control: The Discrete-Time Case, Academic Press, 1978.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), MIT Press, 1998.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI: 10.2200/S00268ED1V01Y201005AIM009

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, 2002.