R. Bellman, A Markovian Decision Process, Indiana University Mathematics Journal, vol.6, issue.4, 1957.
DOI : 10.1512/iumj.1957.6.56038

Chandramohan et al., Optimizing Spoken Dialogue Management with Fitted Value Iteration, Interspeech'10, Makuhari (Japan), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00553184

Gasic et al., Gaussian processes for fast policy optimisation of POMDP-based dialogue managers, SIGDIAL'10, 2010.

M. Geist and O. Pietquin, Kalman Temporal Differences, Journal of Artificial Intelligence Research (JAIR), vol.39, pp.489-532, 2010.
DOI : 10.1109/adprl.2009.4927543
URL : https://hal.archives-ouvertes.fr/hal-00351297

M. Geist and O. Pietquin, Managing Uncertainty within Value Function Approximation in Reinforcement Learning, Conference Proceedings (JMLR W&CP).
URL : https://hal.archives-ouvertes.fr/hal-00554398

G. Gordon, Stable Function Approximation in Dynamic Programming, ICML'95, 1995.
DOI : 10.1016/B978-1-55860-377-6.50040-2

Henderson et al., Hybrid Reinforcement/Supervised Learning of Dialogue Policies from Fixed Data Sets, Computational Linguistics, vol.34, issue.4, 2008.
DOI : 10.1098/rsta.2000.0593

Jurcicek et al., Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems, Interspeech'10, Makuhari (Japan), 2010.

R. E. Kalman, A New Approach to Linear Filtering and Prediction Problems, Journal of Basic Engineering, vol.82, issue.1, pp.35-45, 1960.
DOI : 10.1115/1.3662552

M. G. Lagoudakis and R. Parr, Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.

S. Larsson and D. R. Traum, Information state and dialogue management in the TRINDI dialogue move engine toolkit, Natural Language Engineering, 2000.

Lemon et al., An ISU dialogue system exhibiting reinforcement learning of dialogue policies: generic slot-filling in the TALK in-car system, EACL'06, 2006.

E. Levin and R. Pieraccini, Using Markov decision process for learning dialogue strategies, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98, 1998.
DOI : 10.1109/ICASSP.1998.674402
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.6606

Levin et al., A stochastic model of human-machine interaction for learning dialog strategies, IEEE Transactions on Speech and Audio Processing, vol.8, issue.1, pp.11-23, 2000.
DOI : 10.1109/89.817450

Li et al., Reinforcement Learning for Dialog Management using Least-Squares Policy Iteration and Fast Feature Selection, Interspeech'09, 2009.

O. Pietquin and T. Dutoit, A probabilistic framework for dialog simulation and optimal strategy learning, IEEE Transactions on Audio, Speech and Language Processing, vol.14, issue.2, pp.589-599, 2006.
DOI : 10.1109/TSA.2005.855836
URL : https://hal.archives-ouvertes.fr/hal-00207952

Pietquin et al., Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, vol.7, issue.3, 2011.
DOI : 10.1145/1966407.1966412
URL : https://hal.archives-ouvertes.fr/hal-00617517

Schatzmann et al., A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies, ASRU'05, pp.97-126, 2005.
DOI : 10.1017/S0269888906000944

Singh et al., Reinforcement learning for spoken dialogue systems, NIPS'99, 1999.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.
DOI : 10.1007/978-1-4615-3618-5

Walker et al., PARADISE: A framework for evaluating spoken dialogue agents, ACL'97, 1997.

J. D. Williams and S. Young, Partially observable Markov decision processes for spoken dialog systems, Computer Speech & Language, vol.21, issue.2, pp.393-422, 2007.
DOI : 10.1016/j.csl.2006.06.008
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.315.5781