Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 16th European Conference on Machine Learning, pp.317-328, 2005.
DOI : 10.1007/11564096_32
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems, Proceedings of American Control Conference (ACC), pp.725-730, 2009.
Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, vol.71, issue.1, pp.89-129, 2008.
DOI : 10.1007/s10994-007-5038-2
URL : https://hal.archives-ouvertes.fr/hal-00830201
Finite-sample analysis of Bellman residual minimization, Proceedings of the Second Asian Conference on Machine Learning (ACML), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00830212
Regularized policy iteration, Advances in Neural Information Processing Systems 21, pp.441-448, 2009.
Regularization and feature selection in least-squares temporal difference learning, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pp.521-528, 2009.
Kernel-Based Least Squares Policy Iteration for Reinforcement Learning, IEEE Transactions on Neural Networks, vol.18, issue.4, pp.973-992, 2007.
DOI : 10.1109/TNN.2007.899161
Least squares SVM for least squares TD learning, Proc. 17th European Conference on Artificial Intelligence, pp.499-503, 2006.
Kernelized value function approximation for reinforcement learning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.1017-1024, 2009.
DOI : 10.1145/1553374.1553504
Proto-value functions, Proceedings of the 22nd international conference on Machine learning, ICML '05, pp.2169-2231, 2007.
DOI : 10.1145/1102351.1102421
iLSTD: Eligibility traces and convergence analysis, Advances in Neural Information Processing Systems 19, pp.441-448, 2007.
Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3), Athena Scientific, 1996.
Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, 2007.
DOI : 10.1137/040614384
Error bounds for approximate policy iteration, ICML 2003: Proceedings of the 20th Annual International Conference on Machine Learning, 2003.
Stochastic Optimal Control: The Discrete-Time Case, 1978.
Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), 1998.
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009
Adam Krzyżak, and Harro Walk. A Distribution-Free Theory of Nonparametric Regression, 2002.