Fitted Q-iteration in continuous action-space MDPs, Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00185311
Reinforcement learning with a near optimal rate of convergence, 2011.
URL: https://hal.archives-ouvertes.fr/inria-00636615
REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.
Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.
Neuro-Dynamic Programming, Athena Scientific, 1996.
Prediction, Learning, and Games, Cambridge University Press, 2006.
DOI: 10.1017/CBO9780511546921
PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002.
DOI: 10.1007/3-540-45435-7_18
Learning Rates for Q-Learning, Journal of Machine Learning Research, vol.5, pp.1-25, 2003.
DOI: 10.1007/3-540-44581-1_39
An Introduction to Probability Theory and Its Applications, Wiley, 1968.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, Neural Computation, vol.6, issue.6, pp.1185-1201, 1994.
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems 12, pp.996-1002, 1999.
Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL: https://hal.archives-ouvertes.fr/inria-00120882
Incremental multi-step Q-learning, Machine Learning, pp.283-290, 1996.
Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol.10, pp.2413-2444, 2009.
Reinforcement Learning: An Introduction, MIT Press, 1998.
The asymptotic convergence-rate of Q-learning, Advances in Neural Information Processing Systems 10, 1997.
Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, pp.1031-1038, 2010.
Double Q-learning, Advances in Neural Information Processing Systems 23, pp.2613-2621, 2010.
Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, 1989.