A. Antos, C. Szepesvári, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
DOI : 10.1007/11776420_42
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the generation of Markov decision processes, Journal of the Operational Research Society, pp.354-361, 1995.

L. C. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, International Conference on Machine Learning (ICML), pp.30-37, 1995.

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Natural actor-critic algorithms, Automatica, vol.45, issue.11, pp.2471-2482, 2009.
DOI : 10.1016/j.automatica.2009.07.008
URL : https://hal.archives-ouvertes.fr/hal-00840470

S. J. Bradtke and A. G. Barto, Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.

D. P. de Farias and B. Van Roy, The linear programming approach to approximate dynamic programming, Operations Research, vol.51, issue.6, pp.850-865, 2003.

V. V. Desai, V. F. Farias, and C. C. Moallemi, Approximate Dynamic Programming via a Smoothed Linear Program, Operations Research, vol.60, issue.3, pp.655-674, 2012.
DOI : 10.1287/opre.1120.1044
URL : http://www.moallemi.com/ciamac/papers/salp-2009.pdf

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

J. A. Filar and B. Tolwinski, On the Algorithm of Pollatschek and Avi-Itzhak, Stochastic Games And Related Topics, pp.59-70, 1991.

G. Gordon, Stable Function Approximation in Dynamic Programming, International Conference on Machine Learning (ICML), 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2
URL : http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Gordon-1995.pdf

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

T. P. Lillicrap, J. J. Hunt, A. Pritzel et al., Continuous control with deep reinforcement learning, International Conference on Learning Representations (ICLR), 2016.

H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, Toward off-policy learning control with function approximation, International Conference on Machine Learning (ICML), 2010.

R. Munos, Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, pp.541-561, 2007.
DOI : 10.1137/040614384

J. Pérolat, B. Piot, M. Geist, B. Scherrer, and O. Pietquin, Softened Approximate Policy Iteration for Markov Games, International Conference on Machine Learning (ICML), 2016.

B. Piot, M. Geist, and O. Pietquin, Difference of Convex Functions Programming for Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01104419

B. Scherrer, Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403

B. Scherrer, Approximate Policy Iteration Schemes: A Comparison, International Conference on Machine Learning (ICML), pp.1314-1322, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00989982

B. Scherrer and M. Geist, Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079

J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, Trust region policy optimization, International Conference on Machine Learning (ICML), 2015.

R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Neural Information Processing Systems (NIPS), pp.1057-1063, 1999.