C. Chao, M. Cakmak, and A. Thomaz, Transparent active learning for robots, Proceedings of the 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp.317-324, 2010.
DOI : 10.1145/1734454.1734562

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.419.2127

S. Chernova and M. Veloso, Interactive policy learning through confidence-based autonomy, J. Artificial Intelligence Research, vol.34, pp.1-25, 2009.

H. R. Chinaei and B. Chaib-draa, Learning Dialogue POMDP Models from Data, Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence (Canadian AI'11), pp.86-91, 2011.

B. Clement, P. Oudeyer, D. Roy, and M. Lopes, Online optimization of teaching sequences with multi-armed bandits, International Conference on Educational Data Mining, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016428

L. Daubigney, M. Gasic, S. Chandramohan, M. Geist, O. Pietquin et al., Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system, Proceedings of Interspeech 2011, pp.1301-1304, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652194

L. Daubigney, M. Geist, S. Chandramohan, and O. Pietquin, A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization, IEEE Journal of Selected Topics in Signal Processing, vol.6, issue.8, pp.891-902, 2012.
DOI : 10.1109/JSTSP.2012.2229257

L. Daubigney, M. Geist, and O. Pietquin, Model-free POMDP optimisation of tutoring systems with echo-state networks, Proceedings of SIGDial 2013, pp.102-106, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00869773

L. Daubigney, M. Geist, and O. Pietquin, Particle Swarm Optimisation of Spoken Dialogue System Strategies, Proceedings of Interspeech 2013, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00916935

R. Dillmann, Teaching and learning of robot tasks via observation of human performance, Robotics and Autonomous Systems, vol.47, issue.2-3, pp.109-116, 2004.
DOI : 10.1016/j.robot.2004.03.005

R. Dillmann, O. Rogalla, M. Ehrenmann, R. Zollner, and M. Bordegoni, Learning Robot Behaviour and Skills Based on Human Demonstration and Advice: The Machine Learning Paradigm, International Symposium on Robotics Research (ISRR), pp.229-238, 2000.
DOI : 10.1007/978-1-4471-1555-7

F. Doshi, J. Pineau, and N. Roy, Reinforcement learning with limited reinforcement, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.256-263, 2008.
DOI : 10.1145/1390156.1390189

L. El Asri, R. Laroche, and O. Pietquin, Reward Shaping for Statistical Optimisation of Dialogue Management, Proceedings of the International Conference on Statistical Language and Speech Processing (SLSP), pp.93-101, 2013.
DOI : 10.1007/978-3-642-39593-2_8

URL : https://hal.archives-ouvertes.fr/hal-00869809

T. Fong, C. Thorpe, and C. Baur, Robot, asker of questions, Robotics and Autonomous Systems, 2003.

M. E. Foster, S. Keizer, Z. Wang, and O. Lemon, Machine learning of social states and skills for multiparty human-robot interaction, Proceedings of the workshop on Machine Learning for Interactive Systems, p.9, 2012.

R. Freedman, Atlas: A plan manager for mixed-initiative, multimodal dialogue, Proceedings of the AAAI-99 Workshop on Mixed-Initiative Intelligence, pp.1-8, 1999.

M. Gasic, F. Jurcicek, S. Keizer, F. Mairesse, B. Thomson et al., Gaussian processes for fast policy optimisation of POMDP-based dialogue managers, Proceedings of SIGDIAL'10, 2010.

M. Gašić, F. Jurčíček, B. Thomson, K. Yu, and S. Young, On-line policy optimisation of spoken dialogue systems via live interaction with human subjects, Proceedings of ASRU, pp.312-317, 2011.

N. Golovin and E. Rahm, Reinforcement learning architecture for Web recommendations, Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2004), pp.398-402, 2004.
DOI : 10.1109/ITCC.2004.1286487

G. Gordon, Stable Function Approximation in Dynamic Programming, Proceedings of ICML'95, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

J. Grizou, I. Iturrate, L. Montesano, P. Oudeyer, and M. Lopes, Calibration-free BCI based control, Proceedings of AAAI'14, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00984068

D. Grollman and O. Jenkins, Dogged Learning for Robots, Proceedings 2007 IEEE International Conference on Robotics and Automation, pp.2483-2488, 2007.
DOI : 10.1109/ROBOT.2007.363692

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.179.3071

J. Henderson, O. Lemon, and K. Georgila, Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets, Computational Linguistics, vol.34, issue.4, 2008.

A. Iglesias, P. Martínez, R. Aler, and F. Fernández, Learning teaching strategies in an Adaptive and Intelligent Educational System through Reinforcement Learning, Applied Intelligence, vol.31, issue.1, pp.89-106, 2009.
DOI : 10.1007/s10489-008-0115-1

F. Jelinek, Statistical Methods for Speech Recognition, Language, Speech, and Communication Series, MIT Press, 1997.

F. Jelinek, L. Bahl, and R. Mercer, Design of a linguistic statistical decoder for the recognition of continuous speech, IEEE Transactions on Information Theory, vol.21, issue.3, pp.250-256, 1975.
DOI : 10.1109/TIT.1975.1055384

K. Judah, A. Fern, and T. Dietterich, Active imitation learning via reduction to iid active learning, UAI, 2012.

K. Judah, S. Roy, A. Fern, and T. Dietterich, Reinforcement learning via practice and critique advice, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-10), 2010.

F. Jurcicek, B. Thomson, S. Keizer, M. Gasic, F. Mairesse et al., Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems, Proceedings of Interspeech'10, Makuhari (Japan), 2010.

E. Klein, M. Geist, B. Piot, and O. Pietquin, Inverse reinforcement learning through structured classification, Proceedings of NIPS 2012, pp.1-9, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00778624

E. Klein, B. Piot, M. Geist, and O. Pietquin, A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning, Proceedings of ECML/PKDD 2013, pp.1-16, 2013.
DOI : 10.1007/978-3-642-40988-2_1

URL : https://hal.archives-ouvertes.fr/hal-00869804

W. Knox and P. Stone, Interactively shaping agents via human reinforcement, Proceedings of the fifth international conference on Knowledge capture, K-CAP '09, pp.9-16, 2009.
DOI : 10.1145/1597735.1597738

W. Knox and P. Stone, Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), pp.5-12, 2010.

P. Korupolu V. N., M. Sivamurugan, and B. Ravindran, Instructing a reinforcement learner, Twenty-Fifth International FLAIRS Conference, 2012.

S. Koyama, S. M. Chase, A. S. Whitford, M. Velliste, A. B. Schwartz et al., Comparison of brain-computer interface decoding algorithms in open-loop and closed-loop control, Journal of Computational Neuroscience, vol.29, issue.1-2, pp.73-87, 2010.
DOI : 10.1007/s10827-009-0196-9

M. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

O. Lemon and O. Pietquin, Machine learning for spoken dialogue systems, Proceedings of Interspeech'07, pp.2685-2688, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00216035

E. Levin, R. Pieraccini, and W. Eckert, Learning dialogue strategies within the Markov decision process framework, Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp.72-79, 1997.
DOI : 10.1109/ASRU.1997.658989

E. Levin, R. Pieraccini, and W. Eckert, Using Markov decision process for learning dialogue strategies, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), pp.201-204, 1998.
DOI : 10.1109/ICASSP.1998.674402

E. Levin, R. Pieraccini, and W. Eckert, A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000.
DOI : 10.1109/89.817450

L. Li, S. Balakrishnan et al., Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection, Proceedings of InterSpeech'09, 2009.

M. L. Littman, Friend-or-foe Q-learning in general-sum games, ICML, pp.322-328, 2001.

A. Lockerd and C. Breazeal, Tutelage and socially guided robot learning, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp.3475-3480, 2004.
DOI : 10.1109/IROS.2004.1389954

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.2165

M. Lopes, T. Cederborg, and P. Oudeyer, Simultaneous acquisition of task and feedback models, 2011 IEEE International Conference on Development and Learning (ICDL), 2011.
DOI : 10.1109/DEVLRN.2011.6037359

URL : https://hal.archives-ouvertes.fr/hal-00636166

M. Lopes, F. S. Melo, and L. Montesano, Active Learning for Reward Estimation in Inverse Reinforcement Learning, Machine Learning and Knowledge Discovery in Databases (ECML/PKDD 2009), 2009.
DOI : 10.1007/978-3-642-04174-7_3

K. Mase, Recognition of facial expression from optical flow, IEICE Transactions, vol.74, issue.10, pp.3473-3483, 1991.

M. Mason and M. Lopes, Robot self-initiative and personalization by learning through repeated interactions, Proceedings of the 6th international conference on Human-robot interaction, HRI '11, 2011.
DOI : 10.1145/1957656.1957814

URL : https://hal.archives-ouvertes.fr/hal-00636164

F. S. Melo and M. Lopes, Learning from Demonstration Using MDP Induced Metrics, Machine learning and knowledge discovery in databases (ECML/PKDD'10), 2010.
DOI : 10.1007/978-3-642-15883-4_25

O. Mihatsch and R. Neuneier, Risk-sensitive reinforcement learning, Machine Learning, vol.49, issue.2/3, pp.267-290, 2002.
DOI : 10.1023/A:1017940631555

G. Neu and C. Szepesvári, Training parsers by inverse reinforcement learning, Machine Learning, vol.77, issue.2-3, pp.303-337, 2009.
DOI : 10.1007/s10994-009-5110-1

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.3712

A. Y. Ng and S. Russell, Algorithms for inverse reinforcement learning, Proceedings of ICML 2000, pp.663-670, 2000.

M. Nicolescu and M. Mataric, Learning and interacting in human-robot domains, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol.31, issue.5, pp.419-430, 2001.

R. Niewiadomski, J. Hofmann, J. Urbain, T. Platt, J. Wagner et al., Laugh-aware virtual agent and its impact on user amusement, Proceedings of AAMAS 2013, pp.619-626, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00869751

T. Ogata, N. Masago, S. Sugano, and J. Tani, Interactive learning in human-robot collaboration, Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), pp.162-167, 2003.
DOI : 10.1109/IROS.2003.1250622

J. P. Olive and N. Spickenagel, Speech resynthesis from phoneme-related parameters, The Journal of the Acoustical Society of America, vol.59, issue.4, pp.993-996, 1976.
DOI : 10.1121/1.380927

T. Paek, Reinforcement learning for spoken dialogue systems: Comparing strengths and weaknesses for practical deployment, Proceedings of the Interspeech Dialog-on-Dialog Workshop, 2006.

T. Paek and R. Pieraccini, Automating spoken dialogue management design using machine learning: An industry perspective, Speech Communication, vol.50, issue.8-9, pp.716-729, 2008.
DOI : 10.1016/j.specom.2008.03.010

R. Pieraccini, E. Levin, and E. Vidal, Learning how to understand language, Proceedings of Eurospeech'93, pp.1407-1412, 1993.

O. Pietquin, Consistent Goal-Directed User Model for Realistic Man-Machine Task-Oriented Spoken Dialogue Simulation, 2006 IEEE International Conference on Multimedia and Expo, pp.425-428, 2006.
DOI : 10.1109/ICME.2006.262563

O. Pietquin, Inverse reinforcement learning for interactive systems, Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication (MLIS '13), pp.71-75, 2013.
DOI : 10.1145/2493525.2493529

URL : https://hal.archives-ouvertes.fr/hal-00869812

O. Pietquin, L. Daubigney, and M. Geist, Optimization of a tutoring system from a fixed set of data, Proceedings of the ISCA workshop on Speech and Language Technology in Education, pp.1-4, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00652324

O. Pietquin and T. Dutoit, Dynamic Bayesian Networks for NLU Simulation with Applications to Dialog Optimal Strategy Learning, Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), pp.49-52, 2006.
DOI : 10.1109/ICASSP.2006.1659954

O. Pietquin and T. Dutoit, A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006.
DOI : 10.1109/TSA.2005.855836

URL : https://hal.archives-ouvertes.fr/hal-00207952

O. Pietquin, M. Geist, and S. Chandramohan, Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences, Proceedings of IJCAI 2011, pp.1878-1883, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00618252

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
DOI : 10.1145/1966407.1966412

URL : https://hal.archives-ouvertes.fr/hal-00617517

O. Pietquin and H. Hastie, A survey on metrics for the evaluation of user simulations, The Knowledge Engineering Review, vol.28, issue.1, pp.59-73, 2013.

URL : https://hal.archives-ouvertes.fr/hal-00771654

O. Pietquin and S. Renals, ASR System Modeling For Automatic Evaluation And Optimization of Dialogue Systems, Proceedings of ICASSP 2002, pp.45-48, 2002.

O. Pietquin, F. Tango, and R. Aras, Batch reinforcement learning for optimizing longitudinal driving assistance strategies, 2011 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS) Proceedings, pp.73-79, 2011.
DOI : 10.1109/CIVTS.2011.5949533

URL : https://hal.archives-ouvertes.fr/hal-00617644

B. Piot, M. Geist, and O. Pietquin, Boosted and reward-regularized classification for apprenticeship learning, Proceedings of AAMAS 2014, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01107837

A. N. Rafferty, E. Brunskill, T. L. Griffiths, and P. Shafto, Faster Teaching by POMDP Planning, Proceedings of Artificial Intelligence in Education (AIED 2011), pp.280-287, 2011.

E. Reiter and R. Dale, Building Natural Language Generation Systems, Studies in Natural Language Processing, 2000.
DOI : 10.1017/CBO9780511519857

URL : http://arxiv.org/abs/cmp-lg/9605002

N. Roy, J. Pineau, and S. Thrun, Spoken dialogue management using probabilistic reasoning, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL '00), pp.93-100, 2000.
DOI : 10.3115/1075218.1075231

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.8204

S. Russell, Learning agents for uncertain environments, Proceedings of COLT 1998, pp.101-103, 1998.
DOI : 10.1145/279943.279964

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.152.6795

J. Schatzmann, M. N. Stuttle, K. Weilhammer, and S. Young, Effects of the user model on simulation-based learning of dialogue strategies, Proceedings of ASRU 2005, 2005.

J. Schatzmann, K. Weilhammer, M. Stuttle, and S. Young, A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, The Knowledge Engineering Review, vol.21, issue.02, pp.97-126, 2006.
DOI : 10.1017/S0269888906000944

K. Scheffler and S. Young, Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning, Proceedings of the Second International Conference on Human Language Technology Research, pp.12-19, 2002.
DOI : 10.3115/1289189.1289246

D. Schlangen and G. Skantze, A general, abstract model of incremental dialogue processing, Proceedings of EACL 2009, pp.710-718, 2009.

S. Singh, M. Kearns, D. Litman, and M. Walker, Reinforcement learning for spoken dialogue systems, Proceedings of NIPS'99, 1999.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

A. Thomaz and C. Breazeal, Teachable robots: Understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, vol.172, issue.6-7, pp.716-737, 2008.
DOI : 10.1016/j.artint.2007.09.009


B. Thomson, M. Gasic, M. Henderson, P. Tsiakoulis, and S. Young, N-best error simulation for training spoken dialogue systems, 2012 IEEE Spoken Language Technology Workshop (SLT), pp.37-42, 2012.
DOI : 10.1109/SLT.2012.6424194

S. Thrun, M. Beetz, M. Bennewitz, W. Burgard, A. B. Cremers et al., Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva, The International Journal of Robotics Research, vol.19, issue.11, pp.972-999, 2000.
DOI : 10.1177/02783640022067922

M. A. Walker, D. J. Litman, C. A. Kamm, and A. Abella, PARADISE: A framework for evaluating spoken dialogue agents, Proceedings of EACL 1997, pp.271-280, 1997.

J. D. Williams and S. Young, Partially observable Markov decision processes for spoken dialog systems, Computer Speech & Language, vol.21, issue.2, pp.393-422, 2007.
DOI : 10.1016/j.csl.2006.06.008

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.315.5781

S. Young, M. Gasic, S. Keizer, F. Mairesse, J. Schatzmann et al., The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, Computer Speech & Language, vol.24, issue.2, pp.150-174, 2010.
DOI : 10.1016/j.csl.2009.04.001

URL : https://hal.archives-ouvertes.fr/hal-00598186