A. Antos, C. Szepesvari, and R. Munos, Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201

T. Archibald, K. McKinnon, and L. Thomas, On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50

J. Baxter and P. L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001.

D. Bertsekas, Dynamic Programming and Optimal Control, 1995.

D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, 1996.

S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, Incremental natural actor-critic algorithms, Advances in Neural Information Processing Systems (NIPS), 2007.

A. Fern, S. Yoon, and R. Givan, Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research, vol.25, pp.75-118, 2006.

J. Fix and M. Geist, Monte-Carlo Swarm Policy Search, Symposium on Swarm Intelligence and Differential Evolution, 2012.
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540

M. Geist, Soft-max boosting, Machine Learning, pp.305-332, 2015.
DOI : 10.1007/s10994-015-5491-2
URL : https://hal.archives-ouvertes.fr/hal-01258816

M. Ghavamzadeh and A. Lazaric, Conservative and Greedy Approaches to Classification-based Policy Iteration, Conference on Artificial Intelligence (AAAI), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772610

V. Heidrich-Meisner and C. Igel, Evolution Strategies for Direct Policy Search, International Conference on Parallel Problem Solving from Nature, pp.428-437, 2008.
DOI : 10.1007/978-3-540-87700-4_43

S. Kakade, A Natural Policy Gradient, Advances in Neural Information Processing Systems (NIPS), 2001.

S. Kakade and J. Langford, Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002.

J. Kober and J. Peters, Policy Search for Motor Primitives in Robotics, Machine Learning, pp.171-203, 2011.

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

M. Lagoudakis and R. Parr, Reinforcement learning as classification: Leveraging modern classifiers, International Conference on Machine Learning (ICML), 2003.

A. Lazaric, M. Ghavamzadeh, and R. Munos, Analysis of a classification-based policy iteration algorithm, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065

A. Lazaric, M. Ghavamzadeh, and R. Munos, Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596

L. Mason, J. Baxter, P. Bartlett, and M. Frean, Boosting algorithms as gradient descent in function space, Technical report, Australian National University, 1999.

R. Munos, Error bounds for approximate policy iteration, International Conference on Machine Learning (ICML), 2003.

R. Munos, Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685

J. Peters and S. Schaal, Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008.
DOI : 10.1016/j.neucom.2007.11.026

M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887

B. Scherrer, V. Gabillon, M. Ghavamzadeh, and M. Geist, Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882

B. Scherrer and B. Lesner, On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Advances in Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems (NIPS), 1999.