A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, vol. 71, issue 1, pp. 89-129, 2008.
DOI: 10.1007/s10994-007-5038-2
URL: https://hal.archives-ouvertes.fr/hal-00830201
L. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37, 1995.
DOI: 10.1016/B978-1-55860-377-6.50013-X
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.5034
D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
DOI: 10.1007/0-306-48332-7_333
D. P. Bertsekas and S. E. Shreve. Stochastic Optimal Control (The Discrete Time Case). 1978.
A. M. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized policy iteration. Proceedings of Advances in Neural Information Processing Systems 21, pp. 441-448, 2008.
A. M. Farahmand, R. Munos, and Cs. Szepesvári. Error propagation for approximate policy and value iteration. Advances in Neural Information Processing Systems, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00830154
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. 2002.
DOI: 10.1007/b97848
M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.
A. Lazaric, M. Ghavamzadeh, and R. Munos. Finite-sample analysis of LSTD. Proceedings of the 27th International Conference on Machine Learning, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00482189
R. Munos. Error bounds for approximate policy iteration. Proceedings of the 20th International Conference on Machine Learning, pp. 560-567, 2003.
R. Munos. Performance bounds in Lp norm for approximate value iteration. SIAM Journal on Control and Optimization, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00124685
R. Munos and Cs. Szepesvári. Finite time bounds for sampling based fitted value iteration. Journal of Machine Learning Research, vol. 9, pp. 815-857, 2008.
URL: https://hal.archives-ouvertes.fr/inria-00120882
W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics). 2007.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1994.
B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. Proceedings of the 27th International Conference on Machine Learning, 2010.
URL: https://hal.archives-ouvertes.fr/inria-00537403
P. J. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, vol. 110, issue 2, pp. 568-582, 1985.
DOI: 10.1016/0022-247X(85)90317-8
J. Si, A. G. Barto, W. B. Powell, and D. Wunsch (eds.). Handbook of Learning and Approximate Dynamic Programming. 2004.
DOI: 10.1109/9780470544785
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. 1998.
DOI: 10.1109/TNN.1998.712192
C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, 1989.
R. J. Williams and L. C. Baird. Tight performance bounds on greedy policies based on imperfect value functions. Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, 1994.