M. Azar, R. Munos, M. Ghavamzadeh, and H. Kappen, Reinforcement Learning with a Near Optimal Rate of Convergence, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00636615

M. Azar, R. Munos, M. Ghavamzadeh, and H. Kappen, Speedy q-learning, Advances in Neural Information Processing Systems 24, pp.2411-2419, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830140

P. Bartlett and A. Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs Dynamic Programming and Optimal Control Neuro-Dynamic Programming, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence Bertsekas DP Prediction, Learning, and Games, 1996.

E. Even-dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

L. Hagerup and C. Rüb, A guided tour of chernoff bounds, Information Processing Letters, vol.33, issue.6, pp.305-308, 1990.
DOI : 10.1016/0020-0190(90)90214-I

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

S. Kakade, On the sample complexity of reinforcement learning Gatsby Computational Neuroscience Unit Kearns M, Singh S (1999) Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems, pp.996-1002, 2004.

T. Lattimore and M. Hutter, PAC Bounds for Discounted MDPs, p.3890, 2012.
DOI : 10.1007/978-3-642-34106-9_26

S. Mannor and J. Tsitsiklis, The sample complexity of exploration in the multiarmed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

R. Munos and A. Moore, Influence and variance of a Markov chain : Application to adaptive discretizations in optimal control An upper bound on the loss from approximate optimalvalue functions, Proceedings of the 38th IEEE Conference on Decision and Control Singh SP, pp.227-233, 1994.

M. Sobel, The variance of discounted Markov decision processes, Journal of Applied Probability, vol.7, issue.04, pp.794-802, 1982.
DOI : 10.2307/1913656

A. Strehl, L. Li, and M. Littman, Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol.10, pp.2413-2444, 2009.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szepesvári, Algorithms for Reinforcement Learning Szita I, Szepesvári C (2010) Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, Omnipress, pp.1031-1038, 2010.

M. Wiering and M. Van-otterlo, Reinforcement Learning: State-of-the-Art, pp.3-39, 2012.
DOI : 10.1007/978-3-642-27645-3