S. Bradtke and A. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, 1996.
DOI : 10.1007/bf00114723
URL : https://link.springer.com/content/pdf/10.1007%2FBF00114723.pdf

T. Degris, P. Pilarski, and R. S. Sutton, Model-Free reinforcement learning with continuous action in practice, 2012 American Control Conference (ACC), pp.2177-2182, 2012.
DOI : 10.1109/ACC.2012.6315022
URL : https://hal.archives-ouvertes.fr/hal-00764281

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, 2005.

J. Fix and M. Geist, Monte-Carlo Swarm Policy Search, Swarm and Evolutionary Computation, pp.75-83, 2012.
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540

H. Flanders, Differentiation Under the Integral Sign, The American Mathematical Monthly, vol.80, issue.6, pp.615-627, 1973.
DOI : 10.2307/2319163

O. Hernández-lerma and J. B. Lasserre, Discrete-time Markov control processes, 1996.
DOI : 10.1007/978-1-4612-0729-0

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, Proc. of ICML, pp.267-274, 2002.

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, 2003.

A. Y. Ng and M. Jordan, Pegasus: A policy search method for large mdps and pomdps, Proc. of UAI, pp.406-415, 2000.

J. Nocedal and S. Wright, Numerical optimization, 2006.
DOI : 10.1007/b98874

J. Peters and S. Schaal, Policy Gradient Methods for Robotics, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2219-2225, 2006.
DOI : 10.1109/IROS.2006.282564
URL : http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf

J. Peters, S. Vijayakumar, and S. Schaal, Natural actor-critic, Proc. of ECML, pp.280-291, 2005.
DOI : 10.1007/11564096_29

M. Riedmiller, Neural Fitted Q Iteration ??? First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32
URL : http://www.ni.uos.de/fileadmin/user_upload/publications/riedmiller.ecml2005.official.pdf

B. Scherrer and M. Geist, Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, Proc. of ECML, pp.35-50, 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079

D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra et al., Deterministic policy gradient algorithms, Proc. of ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00938992

R. S. Sutton, D. A. Mcallester, S. P. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, Proc. of NIPS, pp.1057-1063, 1999.