Speedy Q-learning, Advances in Neural Information Processing Systems 24, pp. 2411-2419, 2011.

URL : https://hal.archives-ouvertes.fr/hal-00830140

Reinforcement learning with a near optimal rate of convergence.

URL : https://hal.archives-ouvertes.fr/inria-00636615

REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

Neuro-Dynamic Programming, Athena Scientific, 1996.

Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.

Prediction, Learning, and Games, 2006.

DOI : 10.1017/CBO9780511546921

Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol. 11, pp. 1563-1600, 2010.

On the Sample Complexity of Reinforcement Learning, 2004.

Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems 12, pp. 996-1002, 1999.

The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol. 5, pp. 623-648, 2004.

Influence and variance of a Markov chain: application to adaptive discretization in optimal control, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No. 99CH36304), 1999.

DOI : 10.1109/CDC.1999.830188

The variance of discounted Markov decision processes, Journal of Applied Probability, vol. 19, no. 4, pp. 794-802, 1982.

DOI : 10.2307/1913656

Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol. 10, pp. 2413-2444, 2009.

Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol. 9, no. 5, 1998.

DOI : 10.1109/TNN.1998.712192

Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 4, no. 1, 2010.

DOI : 10.2200/S00268ED1V01Y201005AIM009

Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, pp. 1031-1038, 2010.