A. Kranstedt, P. Kühnlein, and I. Wachsmuth, Deixis in Multimodal Human Computer Interaction: An Interdisciplinary Approach, in Gesture-Based Communication in Human-Computer Interaction, A. Camurri and G. Volpe (Eds.), pp.112-123, 2004.

R. A. Bolt, "Put-that-there", ACM SIGGRAPH Computer Graphics, vol.14, issue.3, pp.262-270, 1980.
DOI : 10.1145/965105.807503

A. Rochet-Capellan, Does the number of syllables affect the finger pointing movement in a pointing-naming task?, in International Seminar on Speech Production, pp.257-260, 2008.

B. Roustan and M. Dohen, Gesture and speech coordination: The influence of the relationship between manual gesture and speech, Interspeech 2010, Makuhari, Japan, pp.498-501, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00539293

D. C. Richardson, R. Dale, and K. Shockley, Synchrony and swing in conversation: coordination, temporal dynamics, and communication, pp.75-93, 2008.
DOI : 10.1093/acprof:oso/9780199231751.003.0004

D. H. McFarland, Respiratory Markers of Conversational Interaction, Journal of Speech, Language, and Hearing Research, vol.44, issue.1, pp.128-143, 2001.
DOI : 10.1044/1092-4388(2001/012)

G. Bailly, S. Raidt, and F. Elisei, Gaze, conversational agents and face-to-face communication, Speech Communication, special issue on Speech and Face-to-Face Communication, pp.598-612, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00480335

K. Thórisson, Natural turn-taking needs no manual: computational theory and model from perception to action, in Multimodality in Language and Speech Systems, pp.173-207, 2002.

H. Vilhjálmsson, The Behavior Markup Language: Recent Developments and Challenges, International Conference on Intelligent Virtual Agents, pp.99-111, 2007.
DOI : 10.1007/978-3-540-74997-4_10

D. Heylen, The next step towards a functional markup language, in Intelligent Virtual Agents (IVA), pp.37-44, 2008.

S. Scherer, Perception Markup Language: Towards a Standardized Representation of Perceived Nonverbal Behaviors, International Conference on Intelligent Virtual Agents (IVA), 2012.
DOI : 10.1007/978-3-642-33197-8_47

H. Zen, K. Tokuda, and A. W. Black, Statistical parametric speech synthesis, Speech Communication, vol.51, issue.11, pp.1039-1064, 2009.
DOI : 10.1016/j.specom.2009.04.004
URL : https://hal.archives-ouvertes.fr/hal-00746106

S. Calinon, Learning and Reproduction of Gestures by Imitation, IEEE Robotics & Automation Magazine, vol.17, issue.2, pp.44-54, 2010.
DOI : 10.1109/MRA.2010.936947

K. Tokuda, Multi-space probability distribution HMM, IEICE Transactions on Information and Systems, issue.3, pp.455-464, 2002.

K. Otsuka, H. Sawada, and J. Yamato, Automatic Inference of Cross-modal Nonverbal Interactions in Multiparty Conversations from Gaze, Head Gestures, and Utterances: "Who Responds to Whom, When, and How?", International Conference on Multimodal Interfaces (ICMI), pp.255-262, 2007.

L.-P. Morency, I. de Kok, and J. Gratch, A probabilistic multimodal approach for predicting listener backchannels, Autonomous Agents and Multi-Agent Systems, vol.23, issue.2, pp.70-84, 2010.
DOI : 10.1007/s10458-009-9092-y

Y. Mohammad, T. Nishida, and S. Okada, Unsupervised simultaneous learning of gestures, actions and their associations for Human-Robot Interaction, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp.2537-2544, 2009.
DOI : 10.1109/IROS.2009.5353987

J. F. Ferreira, A Bayesian framework for active artificial perception, IEEE Transactions on Cybernetics, vol.43, issue.2, pp.699-711, 2013.
DOI : 10.1109/TSMCB.2012.2214477
URL : https://hal.archives-ouvertes.fr/hal-00747148

C. E. Ford, Contingency and Units in Interaction, Discourse Studies, pp.27-52, 2004.
DOI : 10.1177/1461445604039438

J. Lee, The Rickel gaze model: A window on the mind of a virtual human, in Intelligent Virtual Agents (IVA), 2007.

A. Mihoub, G. Bailly, and C. Wolf, Modelling perception-action loops: comparing sequential models with frame-based classifiers, in Human-Agent Interaction (HAI), pp.309-314, 2014.

J. Bloit and X. Rodet, Short-time Viterbi for online HMM decoding: Evaluation on a real-time phone recognition task, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2121-2124, 2008.
DOI : 10.1109/ICASSP.2008.4518061
URL : https://hal.archives-ouvertes.fr/hal-01161222

A. Mihoub, G. Bailly, and C. Wolf, Learning multimodal behavioral models for face-to-face social interaction, Journal on Multimodal User Interfaces, vol.10, issue.8
DOI : 10.1007/s12193-015-0190-7

URL : https://hal.archives-ouvertes.fr/hal-01170991

Y. Bengio and P. Frasconi, Input-output HMMs for sequence processing, IEEE Transactions on Neural Networks, vol.7, issue.5, pp.1231-1249, 1996.
DOI : 10.1109/72.536317
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.6544

C. Huang and B. Mutlu, Learning-based modeling of multimodal behaviors for humanlike robots, Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, HRI '14, pp.57-64, 2014.
DOI : 10.1145/2559636.2559668

A. Mihoub, Graphical models for social behavior modeling in face-to-face interaction, Pattern Recognition Letters, vol.74, 2016.
DOI : 10.1016/j.patrec.2016.02.005
URL : https://hal.archives-ouvertes.fr/hal-01279427