APRIL: Active Preference Learning-Based Reinforcement Learning, Proceedings ECMLPKDD 2012, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp.116-131, 2012.
DOI : 10.1007/978-3-642-33486-3_8
URL : https://hal.archives-ouvertes.fr/hal-00722744
Tuning Bandit Algorithms in Stochastic Environments, Proceedings of Algorithmic Learning Theory, pp.150-165, 2007.
DOI : 10.1007/978-3-540-75225-7_15
URL : https://hal.archives-ouvertes.fr/inria-00203487
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352
Evolution strategies - A comprehensive introduction, Natural Computing, vol.1, issue.1, pp.3-52, 2002.
DOI : 10.1023/A:1015059928466
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning, Proceedings ECMLPKDD 2011, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp.414-429, 2011.
DOI : 10.1007/978-3-642-23780-5_30
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.224.8007
Evolutionary algorithms for solving multi-objective problems, 2007.
DOI : 10.1007/978-1-4757-5184-0
PAC Bounds for Multi-armed Bandit and Markov Decision Processes, Proceedings of the 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18
Nontransitive measurable utility, Journal of Mathematical Psychology, vol.26, issue.1, pp.31-67, 1982.
DOI : 10.1016/0022-2496(82)90034-7
Preference-based reinforcement learning: a formal framework and a policy iteration algorithm, Machine Learning, vol.89, issue.1-2, pp.123-156, 2012.
DOI : 10.1007/s10994-012-5313-8
Evaluating the CMA Evolution Strategy on Multimodal Test Functions, Parallel Problem Solving from Nature-PPSN VIII, pp.282-291, 2004.
DOI : 10.1007/978-3-540-30217-9_29
Variable metric reinforcement learning methods applied to the noisy mountain car problem, Recent Advances in Reinforcement Learning, pp.136-150, 2008.
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.401-408, 2009.
DOI : 10.1145/1553374.1553426
Note on Wilcoxon's Two-Sample Test when Ties are Present, The Annals of Mathematical Statistics, vol.23, issue.1, pp.133-135, 1952.
DOI : 10.1214/aoms/1177729491
Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1080/01621459.1963.10500830
PAC subset selection in stochastic multi-armed bandits, Proceedings of the Twenty-ninth International Conference on Machine Learning, pp.655-662, 2012.
Reinforcement learning as classification: Leveraging modern classifiers, Proceedings of the 20th International Conference on Machine Learning, pp.424-431, 2003.
Analysis of a classification-based policy iteration algorithm, Proceedings of the 27th International Conference on Machine Learning, pp.607-614, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065
Hoeffding races: accelerating model selection search for classification and function approximation, Advances in Neural Information Processing Systems, pp.59-66, 1994.
The Racing Algorithm: Model Selection for Lazy Learners, Artificial Intelligence Review, vol.11, issue.1-5, pp.193-225, 1997.
DOI : 10.1007/978-94-017-2053-3_8
Empirical Bernstein stopping, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.672-679, 2008.
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983
Axioms of cooperative decision making, 1988.
DOI : 10.1017/CCOL0521360552
Markov decision processes: discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887
On-line Q-learning using connectionist systems, 1994.
Approximation theorems of mathematical statistics, 1980.
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009
Inequalities for the L1 deviation of the empirical distribution, 2003.
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, vol.8, issue.3, pp.229-256, 1992.
The K-armed dueling bandits problem, Journal of Computer and System Sciences, vol.78, issue.5, pp.1538-1556, 2012.
DOI : 10.1016/j.jcss.2011.12.028
Reinforcement learning design for cancer clinical trials, Statistics in Medicine, vol.28, issue.26, pp.3294-3315, 2009.
DOI : 10.1002/sim.3720