Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008.
DOI : 10.1007/11776420_42
URL : https://hal.archives-ouvertes.fr/hal-00830201
On the generation of Markov decision processes, Journal of the Operational Research Society, pp.354-361, 1995.
Residual Algorithms: Reinforcement Learning with Function Approximation, International Conference on Machine Learning (ICML), pp.30-37, 1995.
Natural actor-critic algorithms, Automatica, vol.45, issue.11, pp.2471-2482, 2009.
DOI : 10.1016/j.automatica.2009.07.008
URL : https://hal.archives-ouvertes.fr/hal-00840470
Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
The linear programming approach to approximate dynamic programming, Operations Research, vol.51, issue.6, pp.850-865, 2003.
Approximate Dynamic Programming via a Smoothed Linear Program, Operations Research, vol.60, issue.3, pp.655-674, 2012.
DOI : 10.1287/opre.1120.1044
URL : http://www.moallemi.com/ciamac/papers/salp-2009.pdf
Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.
On the Algorithm of Pollatschek and Avi-Itzhak, Stochastic Games and Related Topics, pp.59-70, 1991.
Stable Function Approximation in Dynamic Programming, International Conference on Machine Learning (ICML), 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2
URL : http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Gordon-1995.pdf
Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002.
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Analysis of a classification-based policy iteration algorithm, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065
Continuous control with deep reinforcement learning, International Conference on Learning Representations (ICLR), 2016.
Toward off-policy learning control with function approximation, International Conference on Machine Learning (ICML), 2010.
Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, pp.541-561, 2007.
DOI : 10.1137/040614384
Softened Approximate Policy Iteration for Markov Games, International Conference on Machine Learning (ICML), 2016.
Difference of Convex Functions Programming for Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS), 2014.
URL : https://hal.archives-ouvertes.fr/hal-01104419
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, International Conference on Machine Learning (ICML), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403
Approximate Policy Iteration Schemes: A Comparison, International Conference on Machine Learning (ICML), pp.1314-1322, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00989982
Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2014.
DOI : 10.1007/978-3-662-44845-8_3
URL : https://hal.archives-ouvertes.fr/hal-01091079
Trust region policy optimization, International Conference on Machine Learning (ICML), 2015.
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems (NIPS), pp.1057-1063, 1999.