Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008. ,
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995. ,
DOI : 10.1057/jors.1995.50
Infinite-horizon gradient-based policy search, Journal of Artificial Intelligence Research, vol.15, pp.319-350, 2001. ,
Dynamic Programming and Optimal Control, 1995. ,
Neuro-dynamic programming, 1996. ,
Incremental natural actor-critic algorithms, Advances in neural information processing systems (nips), 2007. ,
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes, Journal of Artificial Intelligence Research, vol.25, pp.75-118, 2006. ,
Monte-Carlo Swarm Policy Search, Symposium on Swarm Intelligence and Differential Evolution, 2012. ,
DOI : 10.1007/978-3-642-29353-5_9
URL : https://hal.archives-ouvertes.fr/hal-00695540
Soft-max boosting, Machine Learning, pp.305-332, 2015. ,
DOI : 10.1007/s10994-015-5491-2
URL : https://hal.archives-ouvertes.fr/hal-01258816
Conservative and Greedy Approaches to Classification-based Policy Iteration, Conference on artificial intelligence (aaai), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00772610
Evolution Strategies for Direct Policy Search, International conference on parallel problem solving from nature, pp.428-437, 2008. ,
DOI : 10.1007/978-3-540-87700-4_43
A Natural Policy Gradient, Advances in neural information processing systems (nips), 2001. ,
Approximately optimal approximate reinforcement learning, International conference on machine learning (icml), 2002. ,
Policy Search for Motor Primitives in Robotics, Machine Learning, pp.171-203, 2011. ,
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Reinforcement learning as classification: Leveraging modern classifiers, International conference on machine learning (icml), 2003. ,
Analysis of a classification-based policy iteration algorithm, International conference on machine learning (icml), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00482065
Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012. ,
URL : https://hal.archives-ouvertes.fr/inria-00528596
Boosting algorithms as gradient descent in function space (Technical report), Australian National University, 1999. ,
Error bounds for approximate policy iteration, International conference on machine learning (icml), 2003. ,
Performance bounds in Lp norm for approximate value iteration, SIAM Journal on Control and Optimization, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00124685
Natural Actor-Critic, Neurocomputing, vol.71, issue.7-9, pp.1180-1190, 2008. ,
DOI : 10.1016/j.neucom.2007.11.026
Markov decision processes: Discrete stochastic dynamic programming, 1994. ,
DOI : 10.1002/9780470316887
Approximate Modified Policy Iteration, International Conference on Machine Learning (ICML), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00758882
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, Advances in neural information processing systems (nips), 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00758809
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in neural information processing systems (nips), 1999. ,