A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

A. Barto, R. Sutton, and C. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, vol.13, issue.5, pp.834-846, 1983.
DOI : 10.1109/TSMC.1983.6313077

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, pp.33-57, 1996.

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3

A. Fern, S. Yoon, and R. Givan, Approximate policy iteration with a policy language bias, Proceedings of NIPS 16, 2004.

V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer, Classification-based policy iteration with a critic, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00590972

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of ICML, pp.424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proceedings of the Twenty-Seventh International Conference on Machine Learning, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of least-squares policy iteration
URL : https://hal.archives-ouvertes.fr/inria-00528596

O. Maillard, R. Munos, A. Lazaric, and M. Ghavamzadeh, Finite-sample analysis of Bellman residual minimization, Proceedings of the Second Asian Conference on Machine Learning, pp.299-314, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212

R. Munos, Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

R. Munos and C. Szepesvári, Finite time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.