Linear least-squares algorithms for temporal difference learning, Machine Learning, 1996.
DOI : 10.1007/bf00114723
URL : https://link.springer.com/content/pdf/10.1007%2FBF00114723.pdf
Model-Free reinforcement learning with continuous action in practice, 2012 American Control Conference (ACC), pp.2177-2182, 2012.
DOI : 10.1109/ACC.2012.6315022
URL : https://hal.archives-ouvertes.fr/hal-00764281
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, 2005.
Monte-Carlo Swarm Policy Search, Swarm and Evolutionary Computation, pp.75-83, 2012.
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540
Differentiation Under the Integral Sign, The American Mathematical Monthly, vol.80, issue.6, pp.615-627, 1973.
DOI : 10.2307/2319163
Discrete-time Markov control processes, 1996.
DOI : 10.1007/978-1-4612-0729-0
Approximately optimal approximate reinforcement learning, Proc. of ICML, pp.267-274, 2002.
Least-squares policy iteration, Journal of Machine Learning Research, 2003.
PEGASUS: A policy search method for large MDPs and POMDPs, Proc. of UAI, pp.406-415, 2000.
Numerical optimization, 2006.
DOI : 10.1007/b98874
Policy Gradient Methods for Robotics, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2219-2225, 2006.
DOI : 10.1109/IROS.2006.282564
URL : http://www-clmc.usc.edu/publications/P/peters-IROS2006.pdf
Natural actor-critic, Proc. of ECML, pp.280-291, 2005.
DOI : 10.1007/11564096_29
Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method, European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32
URL : http://www.ni.uos.de/fileadmin/user_upload/publications/riedmiller.ecml2005.official.pdf
Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, Proc. of ECML, pp.35-50, 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079
Deterministic policy gradient algorithms, Proc. of ICML, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00938992
Policy gradient methods for reinforcement learning with function approximation, Proc. of NIPS, pp.1057-1063, 1999.