A. Antos, Cs. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2

URL : https://hal.archives-ouvertes.fr/hal-00830201

L. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings 12th International Conference on Machine Learning (ICML-95), pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.5034

D. Bertsekas, Approximate policy iteration: A survey and some new methods, update of the chapter "Approximate Dynamic Programming" of the book Dynamic Programming and Optimal Control, 2010.

D. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Tech. Report LIDS-P-2349, MIT, 1996.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

D. Bertsekas, V. Borkar, and A. Nedić, Improved temporal difference methods with linear function approximation, Learning and Approximate Dynamic Programming, 2004.

J. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol.49, issue.2-3, pp.233-246, 2002.

S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol.22, issue.1-3, pp.33-57, 1996.
DOI : 10.1007/978-0-585-33656-5_4

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.143.857

L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators, Automation and Control Engineering Series, 2010.

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Using prior knowledge to accelerate online least-squares policy iteration, IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR-10), 2010.

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Approximate dynamic programming with a fuzzy parameterization, Automatica, vol.46, issue.5, pp.804-814, 2010.
DOI : 10.1016/j.automatica.2010.02.006

L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, Online least-squares policy iteration for reinforcement learning control, Proceedings 2010 American Control Conference (ACC-10), pp.486-491, 2010.

C. Dimitrakakis and M. Lagoudakis, Rollout sampling approximate policy iteration, Machine Learning, vol.72, issue.3, pp.157-171, 2008.
DOI : 10.1007/s10994-008-5069-3

URL : http://arxiv.org/abs/0805.2027

Y. Engel, S. Mannor, and R. Meir, Bayes meets Bellman: The Gaussian process approach to temporal difference learning, Proceedings 20th International Conference on Machine Learning (ICML-03), pp.154-161, 2003.

Y. Engel, S. Mannor, and R. Meir, Reinforcement learning with Gaussian processes, Proceedings 22nd International Conference on Machine Learning (ICML-05), pp.201-208, 2005.
DOI : 10.1145/1102351.1102377

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

A. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor, Regularized policy iteration, Advances in Neural Information Processing Systems 21, pp.441-448, 2009.

A. Geramifard, M. Bowling, and R. Sutton, Incremental least-squares temporal difference learning, Proceedings 21st National Conference on Artificial Intelligence and 18th Innovative Applications of Artificial Intelligence Conference (AAAI-06), pp.356-361, 2006.

A. Geramifard, M. Bowling, M. Zinkevich, and R. Sutton, iLSTD: Eligibility traces & convergence analysis, Advances in Neural Information Processing Systems, pp.440-448, 2007.

G. Golub and C. Van Loan, Matrix Computations, 1996.

T. Jung and D. Polani, Kernelizing LSPE(λ), Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL-07), pp.338-345, 2007.

T. Jung and D. Polani, Learning RoboCup-keepaway with kernels, Gaussian Processes in Practice, JMLR Workshop and Conference Proceedings, pp.33-57, 2007.

J. Kolter and A. Ng, Regularization and feature selection in least-squares temporal difference learning, Proceedings 26th International Conference on Machine Learning (ICML-09), pp.521-528, 2009.
DOI : 10.1145/1553374.1553442

V. Konda, Actor-Critic Algorithms, PhD thesis, Massachusetts Institute of Technology, 2002.

M. Lagoudakis, R. Parr, and M. Littman, Least-Squares Methods in Reinforcement Learning for Control, Methods and Applications of Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol.2308, pp.249-260, 2002.
DOI : 10.1007/3-540-46014-4_23

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, Proceedings 20th International Conference on Machine Learning (ICML-03), pp.424-431, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of LSTD, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.615-622, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482189

L. Li, M. Littman, and C. Mansley, Online exploration in least-squares policy iteration, Proceedings 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-09), pp.733-739, 2009.

H. Maei, C. Szepesvári, S. Bhatnagar, and R. Sutton, Toward off-policy learning control with function approximation, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.719-726, 2010.

O. Maillard, R. Munos, A. Lazaric, and M. Ghavamzadeh, Finite-sample analysis of Bellman residual minimization, Proceedings 2nd Asian Conference on Machine Learning (ACML-10), pp.299-314, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212

S. Meyn and R. Tweedie, Markov Chains and Stochastic Stability, 1993.

A. Moore and C. Atkeson, The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, Machine Learning, vol.21, issue.3, pp.199-233, 1995.

R. Munos, Error bounds for approximate policy iteration, Proceedings 20th International Conference on Machine Learning (ICML-03), pp.560-567, 2003.

R. Munos, Approximate Dynamic Programming, Markov Decision Processes in Artificial Intelligence, 2010.
DOI : 10.1002/9781118557426.ch3

URL : https://hal.archives-ouvertes.fr/hal-00943118

R. Munos and Cs. Szepesvári, Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

A. Nedić and D. Bertsekas, Least-squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems: Theory and Applications, vol.13, issue.1-2, pp.79-110, 2003.

C. Rasmussen and M. Kuss, Gaussian processes in reinforcement learning, Advances in Neural Information Processing Systems 16, 2004.

B. Scherrer, Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.959-966, 2010.

P. Schweitzer and A. Seidmann, Generalized polynomial approximations in Markovian decision processes, Journal of Mathematical Analysis and Applications, vol.110, issue.2, pp.568-582, 1985.
DOI : 10.1016/0022-247X(85)90317-8

R. Sutton, H. Maei, D. Precup, S. Bhatnagar, D. Silver et al., Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings 26th International Conference on Machine Learning (ICML-09), pp.993-1000, 2009.
DOI : 10.1145/1553374.1553501

R. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol.3, issue.1, pp.9-44, 1988.
DOI : 10.1007/BF00115009

R. Sutton, Cs. Szepesvári, and H. Maei, A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation, Advances in Neural Information Processing Systems, pp.1609-1616, 2009.

Cs. Szepesvári, Algorithms for Reinforcement Learning, 2010.

G. Taylor and R. Parr, Kernelized value function approximation for reinforcement learning, Proceedings 26th International Conference on Machine Learning (ICML-09), pp.1017-1024, 2009.

C. Thiery and B. Scherrer, Least-squares λ policy iteration: Bias-variance trade-off in control problems, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.1071-1078, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00520841

J. Tsitsiklis, On the convergence of optimistic policy iteration, Journal of Machine Learning Research, vol.3, pp.59-72, 2002.

J. Tsitsiklis and B. Van-roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997.
DOI : 10.1109/9.580874

X. Xu, T. Xie, D. Hu, and X. Lu, Kernel least-squares temporal difference learning, International Journal of Information Technology, vol.11, issue.9, pp.54-63, 2005.

X. Xu, D. Hu, and X. Lu, Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol.18, issue.4, pp.973-992, 2007.
DOI : 10.1109/TNN.2007.899161

H. Yu, Convergence of least squares temporal difference methods under general conditions, Proceedings 27th International Conference on Machine Learning (ICML-10), pp.1207-1214, 2010.

H. Yu and D. Bertsekas, Convergence results for some temporal difference methods based on least squares, IEEE Transactions on Automatic Control, vol.54, issue.7, pp.1515-1531, 2009.

H. Yu and D. Bertsekas, Error Bounds for Approximations from Projected Linear Equations, Mathematics of Operations Research, vol.35, issue.2, pp.306-329, 2010.
DOI : 10.1287/moor.1100.0441