K. Amin, S. Singh, and M. Wellman, Gradient methods for Stackelberg security games, 2016.

O. Amir, E. Kamar, A. Kolobov, and B. Grosz, Interactive teaching strategies for agent training, 2016.

B. Bošanský, S. Brânzei, K. A. Hansen, P. B. Miltersen, and T. B. Sørensen, Computation of Stackelberg Equilibria of Finite Sequential Games, pp.201-215, 2015.

E. Even-Dar and Y. Mansour, Approximate equivalence of Markov decision processes, in Learning Theory and Kernel Machines, Lecture Notes in Computer Science, pp.581-594, 2003.

Y. Gal and A. Pfeffer, Networks of influence diagrams: A formalism for representing agents' beliefs and decision-making processes, Journal of Artificial Intelligence Research, vol.33, issue.1, pp.109-147, 2008.

D. Hadfield-Menell, A. Dragan, P. Abbeel, and S. Russell, Cooperative inverse reinforcement learning, 2016.

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

J. K. Satia and R. E. Lave, Markovian decision processes with uncertain transition probabilities, Operations Research, vol.21, issue.3, pp.728-740, 1973.

D. V. Lindley, On a Measure of the Information Provided by an Experiment, The Annals of Mathematical Statistics, vol.27, issue.4, pp.986-1005, 1956.
DOI : 10.1214/aoms/1177728069

A. Y. Ng and S. J. Russell, Algorithms for inverse reinforcement learning, ICML, pp.663-670, 2000.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

J. Sorg, S. P. Singh, and R. L. Lewis, Internal rewards mitigate agent boundedness, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.1007-1014, 2010.

H. Zhang and D. C. Parkes, Value-based policy teaching with active indirect elicitation, Proc. 23rd AAAI Conference on Artificial Intelligence (AAAI'08), pp.208-214, 2008.

H. Zhang, D. C. Parkes, and Y. Chen, Policy teaching through reward function learning, Proceedings of the tenth ACM conference on Electronic commerce, EC '09, pp.295-304, 2009.
DOI : 10.1145/1566374.1566417
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.1834

M. Zinkevich, A. Greenwald, and M. Littman, Cyclic equilibria in Markov games, Advances in Neural Information Processing Systems, 2006.