A. Antos, V. Grover, and C. Szepesvári, Active learning in heteroscedastic noise, Theoretical Computer Science, vol.411, issue.29-30, pp.2712-2728, 2010.
DOI : 10.1016/j.tcs.2010.04.007

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

X. Cao, Stochastic Learning and Optimization: A Sensitivity-Based Approach, 2007.

N. Cesa-Bianchi, A. Conconi, and C. Gentile, On the Generalization Ability of On-Line Learning Algorithms, IEEE Transactions on Information Theory, vol.50, issue.9, pp.2050-2057, 2004.
DOI : 10.1109/TIT.2004.833339

E. Even-Dar, S. M. Kakade, and Y. Mansour, Experts in a Markov decision process, Advances in Neural Information Processing Systems 17, pp.401-408, 2005.

E. Even-Dar, S. M. Kakade, and Y. Mansour, Online Markov Decision Processes, Mathematics of Operations Research, vol.34, issue.3, pp.726-736, 2009.
DOI : 10.1287/moor.1090.0396

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.8186

A. György, T. Linder, G. Lugosi, and G. Ottucsák, The on-line shortest path problem under partial monitoring, Journal of Machine Learning Research, vol.8, pp.2369-2403, 2007.

I. C. Ipsen and T. M. Selee, Ergodicity Coefficients Defined by Vector Norms, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.1, pp.153-200, 2011.
DOI : 10.1137/090752948

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

S. Kakade, K. Sridharan, and A. Tewari, On the complexity of linear prediction: Risk bounds, margin bounds, and regularization, Advances in Neural Information Processing Systems 22, pp.793-800, 2009.

A. Lazaric and R. Munos, Learning with stochastic inputs and adversarial outputs, Journal of Computer and System Sciences, vol.78, issue.5, 2011.
DOI : 10.1016/j.jcss.2011.12.027

URL : https://hal.archives-ouvertes.fr/hal-00772046

G. Neu, A. György, and C. Szepesvári, The online loop-free stochastic shortest-path problem, Proceedings of the 23rd Annual Conference on Learning Theory, pp.231-243, 2010.

G. Neu, A. György, and C. Szepesvári, The adversarial stochastic shortest path problem with unknown transition probabilities, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.805-813, 2012.

G. Neu, A. György, and C. Szepesvári, The online loop-free stochastic shortest-path problem, 2013.

G. Neu, A. György, C. Szepesvári, and A. Antos, Online Markov decision processes under bandit feedback, Advances in Neural Information Processing Systems 23, pp.1804-1812, 2010.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
DOI : 10.1002/9780470316887

A. Rakhlin, K. Sridharan, and A. Tewari, Online learning: Stochastic and constrained adversaries, Advances in Neural Information Processing Systems 24, 2011.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

J. Y. Yu and S. Mannor, Arbitrarily modulated Markov decision processes, Proceedings of the 48th IEEE Conference on Decision and Control (CDC), held jointly with the 28th Chinese Control Conference, pp.2946-2953, 2009.
DOI : 10.1109/CDC.2009.5400662

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.9140

J. Y. Yu and S. Mannor, Online learning in Markov decision processes with arbitrarily changing rewards and transitions, 2009 International Conference on Game Theory for Networks, pp.314-322, 2009.
DOI : 10.1109/GAMENETS.2009.5137416

J. Y. Yu, S. Mannor, and N. Shimkin, Markov Decision Processes with Arbitrary Reward Processes, Mathematics of Operations Research, vol.34, issue.3, pp.737-757, 2009.
DOI : 10.1287/moor.1090.0397