Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.89-129, 2008. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Dynamic Policy Programming with Function Approximation, 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2011. ,
Approximate policy iteration: a survey and some new methods, Journal of Control Theory and Applications, vol.27, issue.3, pp.310-335, 2011. ,
DOI : 10.1007/s11768-011-1005-3
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.294.8653
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
DOI : 10.1007/0-306-48332-7_333
Least-squares methods for Policy Iteration, Reinforcement Learning: State of the Art, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00830122
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, 2005. ,
Planning in POMDPs using multiplicity automata, Uncertainty in Artificial Intelligence (UAI), pp.185-192, 2005. ,
Regularized policy iteration, Advances in Neural Information Processing Systems, pp.441-448, 2009. ,
Error propagation for approximate policy and value iteration (extended version), NIPS, 2010. ,
Classification-based Policy Iteration with a Critic, International Conference on Machine Learning (ICML), pp.1049-1056, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00590972
Stable Function Approximation in Dynamic Programming, ICML, pp.261-268, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
Max-norm projections for factored MDPs, International Joint Conference on Artificial Intelligence, pp.673-682, 2001. ,
Efficient Solution Algorithms for Factored MDPs, Journal of Artificial Intelligence Research (JAIR), vol.19, pp.399-468, 2003. ,
On the Sample Complexity of Reinforcement Learning, 2003. ,
Approximately Optimal Approximate Reinforcement Learning, International Conference on Machine Learning (ICML), pp.267-274, 2002. ,
Least-squares policy iteration, Journal of Machine Learning Research (JMLR), vol.4, pp.1107-1149, 2003. ,
Finite-Sample Analysis of Least-Squares Policy Iteration, To appear in Journal of Machine Learning Research, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00528596
Finite Sample Analysis of Bellman Residual Minimization, Asian Conference on Machine Learning. JMLR: Workshop and Conference Proceedings, pp.309-324, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00830212
Error Bounds for Approximate Policy Iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003. ,
Performance Bounds in Lp norm for Approximate Value Iteration, SIAM J. Control and Optimization, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00124685
Finite time bounds for sampling based fitted value iteration, Journal of Machine Learning Research (JMLR), vol.9, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Biasing Approximate Dynamic Programming with a Lower Discount Factor, Twenty-Second Annual Conference on Neural Information Processing Systems (NIPS 2008), 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00337652
Point-based value iteration: An anytime algorithm for POMDPs, International Joint Conference on Artificial Intelligence, pp.1025-1032, 2003. ,
Markov Decision Processes, 1994. ,
DOI : 10.1002/9780470316887
An upper bound on the loss from approximate optimal-value functions, Machine Learning, vol.16, issue.3, pp.227-233, 1994. ,
DOI : 10.1007/BF00993308
Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems, International Conference on Machine Learning, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00520841
Feature-Based Methods for Large Scale Dynamic Programming, Machine Learning, pp.59-94, 1996. ,