C. L. Sidner, C. Lee, C. D. Kidd, N. Lesh, and C. Rich, Explorations in Engagement for Humans and Robots, Artificial Intelligence, pp.140-164, 2005.

S. Dermouche and C. Pelachaud, Engagement Modeling in Dyadic Interaction, International Conference on Multimodal Interaction, pp.440-445, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02382442

V. Barrière, C. Clavel, and S. Essid, Attitude Classification in Adjacency Pairs of a Human-Agent Interaction with Hidden Conditional Random Fields, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.4949-4953, 2018.

C. Langlet and C. Clavel, Improving Social Relationships in Face-to-Face Human-Agent Interactions: When the Agent Wants to Know User's Likes and Dislikes, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp.1064-1073, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01259923

A. Ben-Youssef, G. Varni, S. Essid, and C. Clavel, On-the-Fly Detection of User Engagement Decrease in Spontaneous Human-Robot Interaction Using Recurrent and Deep Neural Networks, International Journal of Social Robotics, 2019.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp.1724-1734, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

T. Liu and A. Kappas, Predicting engagement breakdown in HRI using thin-slices of facial expressions, Workshops of the AAAI Conference on Artificial Intelligence, pp.37-43, 2018.

K. Inoue, D. Lala, K. Takanashi, and T. Kawahara, Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models, Annual Conference of the International Speech Communication Association (Interspeech), pp.616-620, 2018.

W. Min, K. Park, J. Wiggins, B. Mott, E. Wiebe et al., Predicting Dialogue Breakdown in Conversational Pedagogical Agents with Multimodal LSTMs, Artificial Intelligence in Education, pp.195-200, 2019.

A. Ben-Youssef, C. Clavel, and S. Essid, Early Detection of User Engagement Breakdown in Spontaneous Human-Humanoid Interaction, IEEE Transactions on Affective Computing, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02288043

S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, vol.9, issue.8, pp.1735-1780, 1997.

W. Yun, D. Lee, C. Park, J. Kim, and J. Kim, Automatic Recognition of Children Engagement from Facial Video using Convolutional Neural Networks, IEEE Transactions on Affective Computing, 2018.

N. Majumder, S. Poria, D. Hazarika, R. Mihalcea, A. F. Gelbukh et al., DialogueRNN: An Attentive RNN for Emotion Detection in Conversations, Proceedings of the AAAI Conference on Artificial Intelligence, pp.6818-6825, 2019.

D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, and R. Zimmermann, ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection, Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp.2594-2604, 2018.

D. Hazarika, S. Poria, A. Zadeh, E. Cambria, L. Morency et al., Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos, Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pp.2122-2132, 2018.

D. Ghosal, N. Majumder, S. Poria, N. Chhaya, and A. Gelbukh, DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.

D. Zhang, L. Wu, C. Sun, S. Li, Q. Zhu et al., Modeling both Context- and Speaker-Sensitive Dependence for Emotion Detection in Multi-speaker Conversations, Proceedings of the International Joint Conference on Artificial Intelligence, pp.5415-5421, 2019.

T. N. Kipf and M. Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR, 2017.

N. Rollet and C. Clavel, "Talk to you later": Doing Social Robotics with Conversation Analysis. Towards the Development of an Automatic System for the Prediction of Disengagement, Interaction Studies, pp.269-293, 2020.

A. Ben-Youssef, C. Clavel, S. Essid, M. Bilac, M. Chamoux et al., UE-HRI: A New Dataset for the Study of User Engagement in Spontaneous Human-Robot Interactions, Proceedings of the ACM International Conference on Multimodal Interaction, pp.464-472, 2017.

P. Wittenburg, H. Brugman, A. Russel, A. Klassmann, and H. Sloetjes, ELAN: a Professional Framework for Multimodality Research, LREC, pp.1556-1559, 2006.

T. Baltrusaitis, A. Zadeh, Y. C. Lim, and L. Morency, OpenFace 2.0: Facial Behavior Analysis Toolkit, IEEE International Conference on Automatic Face & Gesture Recognition, pp.59-66, 2018.

F. Eyben, M. Wöllmer, and B. Schuller, openSMILE: The Munich Versatile and Fast Open-Source Audio Feature Extractor, Proceedings of the International Conference on Multimedia, pp.1459-1462, 2010.

C. Joder, S. Essid, and G. Richard, Temporal Integration for Audio Classification with Application to Musical Instrument Classification, IEEE Transactions on Audio, Speech and Language Processing, vol.17, issue.1, pp.174-186, 2009.

D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, ICLR, 2015.