P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, 2002.

M. Barlier, J. Perolat, R. Laroche, and O. Pietquin, Human-Machine Dialogue as a Stochastic Game, Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2015.
DOI : 10.18653/v1/W15-4602

URL : https://hal.archives-ouvertes.fr/hal-01225848

R. Bellman, Dynamic Programming and Lagrange Multipliers, Proceedings of the National Academy of Sciences, vol.42, issue.10, p.767, 1956.
DOI : 10.1073/pnas.42.10.767

URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC528332/pdf

N. Casanueva, T. Hain, H. Christensen, R. Marxer, and P. Green, Knowledge transfer between speakers for personalised dialogue management, Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.12-21, 2015.
DOI : 10.18653/v1/W15-4603

S. Chandramohan, M. Geist, F. Lefèvre, and O. Pietquin, Clustering behaviors of Spoken Dialogue Systems users, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
DOI : 10.1109/ICASSP.2012.6289038

URL : https://hal.archives-ouvertes.fr/hal-00685009

S. Chandramohan, M. Geist, and O. Pietquin, Optimizing Spoken Dialogue Management with Fitted Value Iteration, Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184

D. Ernst, P. Geurts, and L. Wehenkel, Tree-Based Batch Mode Reinforcement Learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005.

M. Gašić, C. Breslin, M. Henderson, D. Kim, M. Szummer et al., POMDP-based dialogue manager adaptation to extended domains, Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (Sigdial), 2013.

A. Genevay and R. Laroche, Transfer learning for user adaptation in spoken dialogue systems, Proceedings of the 15th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, 2016.

K. Georgila and D. R. Traum, Reinforcement learning of argumentation dialogue policies in negotiation, Proceedings of the 11th Annual Conference of the International Speech Communication Association (Interspeech), pp.2073-2076, 2011.

S. Janarthanam and O. Lemon, Adaptive Referring Expression Generation in Spoken Dialogue Systems: Evaluation with Real Users, pp.124-131, 2010.

R. Laroche and A. Genevay, The Negotiation Dialogue Game, Dialogues with Social Robots, pp.403-410, 2017.

A. Lazaric, Transfer in Reinforcement Learning: A Framework and a Survey, Reinforcement Learning, pp.143-173, 2012.
DOI : 10.1007/978-3-642-27645-3_5

URL : https://hal.archives-ouvertes.fr/hal-00772626

A. Lazaric, M. Restelli, and A. Bonarini, Transfer of samples in batch reinforcement learning, Proceedings of the 25th international conference on Machine learning, ICML '08, 2008.
DOI : 10.1145/1390156.1390225

E. Levin and R. Pieraccini, A stochastic model of computer-human interaction for learning dialogue strategies, Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997.

L. Li, J. D. Williams, and S. Balakrishnan, Reinforcement learning for dialog management using least-squares policy iteration and fast feature selection, Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), pp.2475-2478, 2009.

M. M. H. Mahmud, M. Hawasly, B. Rosman, and S. Ramamoorthy, Clustering Markov decision processes for continual transfer, 2013.

A. M. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, Regularized Fitted Q-iteration for Planning in Continuous-Space Markovian Decision Problems, 2009.

O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing (TSLP), vol.7, issue.3, p.7, 2011.
DOI : 10.1145/1966407.1966412

URL : https://hal.archives-ouvertes.fr/hal-00617517

F. Sadri, F. Toni, and P. Torroni, Dialogues for Negotiation: Agent Varieties and Dialogue Sequences, ATAL, pp.405-421, 2001.
DOI : 10.1007/3-540-45448-9_30

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 1998.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, vol.10, pp.1633-1685, 2009.

A. N. Tikhonov, Regularization of incorrectly posed problems, Soviet Mathematics Doklady, 1963.

S. Ultes, M. Kraus, A. Schmitt, and W. Minker, Quality-adaptive Spoken Dialogue Initiative Selection And Implications On Reward Modelling, Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp.374-383, 2015.
DOI : 10.18653/v1/W15-4649