A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, Proc. 17th International Conf. on Machine Learning, pp.663-670, 2000.

S. Schaal, A. Ijspeert, and A. Billard, Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society B: Biological Sciences, vol.358, issue.1431, pp.537-547, 2003.
DOI : 10.1098/rstb.2002.1258
URL : http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1693137

G. Konidaris, S. Kuindersma, A. Barto, and R. Grupen, Constructing skill trees for reinforcement learning agents from demonstration trajectories, Advances in Neural Information Processing Systems, 2010.

A. Wilson, A. Fern, and P. Tadepalli, A Bayesian approach for policy learning from trajectory preference queries, Advances in Neural Information Processing Systems, 2012.

R. Akrour, M. Schoenauer, and M. Sebag, Preference-Based Policy Learning, Eur Conf on Machine Learning, 2011.
DOI : 10.1007/978-3-642-23780-5_11
URL : https://hal.archives-ouvertes.fr/inria-00625001

A. Jain, B. Wojcik, T. Joachims, and A. Saxena, Learning trajectory preferences for manipulators via iterative improvement, Neural Information Processing Systems, 2013.

R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1998.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

P. Abbeel, Apprenticeship Learning and Reinforcement Learning with Application to Robotic Control, 2008.

W. B. Knox, P. Stone, and C. Breazeal, Training a Robot via Human Feedback: A Case Study, Social Robotics, 2013.
DOI : 10.1007/978-3-319-02675-6_46

D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, J. Mach. Learn. Res., vol.6, pp.503-556, 2005.

R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, Advances in Neural Information Processing Systems, 1996.

T. Joachims, Training linear SVMs in linear time, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '06, pp.217-226, 2006.
DOI : 10.1145/1150402.1150429

J. Randløv and P. Alstrøm, Learning to drive a bicycle using reinforcement learning and shaping, 1998.

C. J. Burges, R. Ragno, and Q. V. Le, Learning to rank with non-smooth cost functions, Advances in Neural Information Processing Systems 19, 2007.

B. Kim, A.-M. Farahmand, J. Pineau, and D. Precup, Learning from limited demonstrations, NIPS, pp.2859-2867, 2013.