Gradient methods for Stackelberg security games, 2016.
Interactive teaching strategies for agent training, 2016.
Computation of Stackelberg Equilibria of Finite Sequential Games, pp.201-215, 2015.
Approximate equivalence of Markov decision processes. In Learning Theory and Kernel Machines, Lecture Notes in Computer Science, pp.581-594, 2003.
Networks of influence diagrams: A formalism for representing agents' beliefs and decision-making processes, Journal of Artificial Intelligence Research, vol.33, issue.1, pp.109-147, 2008.
Cooperative inverse reinforcement learning, 2016.
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.
Markovian decision processes with uncertain transition probabilities, Operations Research, vol.21, issue.3, pp.728-740, 1973.
On a Measure of the Information Provided by an Experiment, The Annals of Mathematical Statistics, vol.27, issue.4, pp.986-1005, 1956.
DOI : 10.1214/aoms/1177728069
Algorithms for inverse reinforcement learning, ICML, pp.663-670, 2000.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8
Internal rewards mitigate agent boundedness, Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp.1007-1014, 2010.
Value-based policy teaching with active indirect elicitation, Proc. 23rd AAAI Conference on Artificial Intelligence (AAAI'08), pp.208-214, 2008.
Policy teaching through reward function learning, Proceedings of the tenth ACM conference on Electronic Commerce, EC '09, pp.295-304, 2009.
DOI : 10.1145/1566374.1566417
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.1834
Cyclic equilibria in Markov games, Advances in Neural Information Processing Systems, 2006.