P. L. Bartlett and A. Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), pp. 35–42, 2009.

R. I. Brafman and M. Tennenholtz, R-max – a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol. 3, pp. 213–231, 2003.

M. Hutter, Feature Reinforcement Learning: Part I. Unstructured MDPs, Journal of Artificial General Intelligence, vol. 1, issue 1, pp. 3–24, 2009.
DOI : 10.2478/v10229-011-0002-8

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol. 11, pp. 1563–1600, 2010.

M. Kearns and S. Singh, Near-optimal reinforcement learning in polynomial time, Machine Learning, vol. 49, pp. 209–232, 2002.

O.-A. Maillard, R. Munos, and D. Ryabko, Selecting the state-representation in reinforcement learning, in Advances in Neural Information Processing Systems 24, pp. 2627–2635, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00639483

R. A. McCallum, Reinforcement Learning with Selective Perception and Hidden State, PhD thesis, University of Rochester, 1996.

R. Ortner and D. Ryabko, Online regret bounds for undiscounted continuous reinforcement learning, in Advances in Neural Information Processing Systems 25, pp. 1772–1780, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00765441

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

D. Ryabko and M. Hutter, On the possibility of learning in reactive environments with arbitrary dependence, Theoretical Computer Science, vol. 405, issue 3, pp. 274–284, 2008.
DOI : 10.1016/j.tcs.2008.06.039
URL : https://hal.archives-ouvertes.fr/hal-00639569

S. P. Singh, M. R. James, and M. R. Rudary, Predictive state representations: A new theory for modeling dynamical systems, in Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI '04), pp. 512–518, 2004.

A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman, PAC model-free reinforcement learning, in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 881–888, 2006.
DOI : 10.1145/1143844.1143955

J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver, A Monte-Carlo AIXI approximation, Journal of Artificial Intelligence Research, vol. 40, issue 1, pp. 95–142, 2011.

E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco, Probabilistic finite-state machines – part I, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, issue 7, pp. 1013–1025, 2005.
DOI : 10.1109/TPAMI.2005.147
URL : https://hal.archives-ouvertes.fr/ujm-00326243