A. Antos, R. Munos, and C. Szepesvári, Fitted Q-iteration in continuous action-space MDPs, Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00185311

M. G. Azar, R. Munos, M. Ghavamzadeh, and H. J. Kappen, Reinforcement learning with a near optimal rate of convergence, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00636615

P. L. Bartlett and A. Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

D. P. Bertsekas, Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
DOI : 10.1017/CBO9780511546921

E. Even-Dar, S. Mannor, and Y. Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18

E. Even-Dar and Y. Mansour, Learning Rates for Q-Learning, Journal of Machine Learning Research, vol.5, pp.1-25, 2003.
DOI : 10.1007/3-540-44581-1_39

W. Feller, An Introduction to Probability Theory and Its Applications, 1968.

T. Jaakkola, M. I. Jordan, and S. Singh, On the Convergence of Stochastic Iterative Dynamic Programming Algorithms, Neural Computation, vol.6, issue.6, pp.1185-1201, 1994.
DOI : 10.1162/neco.1994.6.6.1185

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

M. Kearns and S. Singh, Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems 12, pp.996-1002, 1999.

R. Munos and C. Szepesvári, Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

J. Peng and R. J. Williams, Incremental multi-step Q-learning, Machine Learning, pp.283-290, 1996.

A. L. Strehl, L. Li, and M. L. Littman, Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol.10, pp.2413-2444, 2009.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szepesvári, The asymptotic convergence-rate of Q-learning, Advances in Neural Information Processing Systems 10, 1997.

I. Szita and C. Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, pp.1031-1038, 2010.

H. van Hasselt, Double Q-learning, Advances in Neural Information Processing Systems 23, pp.2613-2621, 2010.

C. J. C. H. Watkins, Learning from Delayed Rewards, PhD thesis, King's College, University of Cambridge, 1989.