APRIL: active preference-learning based reinforcement learning, 1208.
Stochastic Approximations and Differential Inclusions, Part II: Applications, Mathematics of Operations Research, vol.31, issue.4, pp.673-695, 2006.
DOI : 10.1287/moor.1060.0213
Axiomatization of a Preference for Most Probable Winner, Theory and Decision, vol.16, issue.1, pp.17-33, 2006.
DOI : 10.1007/s11238-005-4753-z
Stochastic approximation with two time scales, Systems & Control Letters, vol.29, issue.5, pp.291-294, 1997.
DOI : 10.1016/S0167-6911(97)90015-3
Stochastic approximation, Resonance, vol.8, issue.s.471012, 2008.
DOI : 10.1007/s12045-013-0136-x
Iterative solution of games by fictitious play, Activity Analysis of Production and Allocation, 1951.
Matrix games, Linear programming, chapter 15, pp.228-239, 1983.
SSB Utility theory: an economic perspective, Mathematical Social Sciences, vol.8, issue.1, pp.63-94, 1984.
DOI : 10.1016/0165-4896(84)90061-1
Nontransitive preferences in decision theory, Journal of Risk and Uncertainty, vol.17, issue.2, pp.113-134, 1991.
DOI : 10.1007/BF00056121
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, pp.123-156, 2012.
DOI : 10.1007/s10994-012-5313-8
Solving MDPs with Skew Symmetric Bilinear Utility Functions, IJCAI, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01212802
Reducing the Number of Queries in Interactive Value Iteration, ADT, pp.139-152, 2015.
DOI : 10.1007/978-3-319-23114-3_9
URL : https://hal.archives-ouvertes.fr/hal-01213280
Stability for the best response dynamics, Working Paper, 1995.
Risk attitudes for nonlinear measurable utility, Annals of Operations Research, vol.49, issue.1, pp.311-333, 1989.
DOI : 10.1007/BF02283527
Dynamic programming analysis of the TV game "Who wants to be a millionaire?", European Journal of Operational Research, vol.183, issue.2, pp.805-811, 2007.
DOI : 10.1016/j.ejor.2006.10.041
An optimal single-winner preferential voting system based on game theory, Proceedings Third International Workshop on Computational Social Choice, 2010.
Theory of games and economic behavior, 1947.
Interactive value iteration for Markov decision processes with unknown rewards, International Joint Conference on Artificial Intelligence, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00942290
Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor, ECML/PKDD Workshop Reinforcement Learning with Generalized Feedback, 2013.
A Bayesian approach for policy learning from trajectory preference queries, Advances in Neural Information Processing Systems 25, pp.1133-1141, 2012.
EPMC: every visit preference Monte Carlo for reinforcement learning, Asian Conference on Machine Learning, ACML 2013, pp.483-497, 2013.
Model-free preference-based reinforcement learning, 2015.