Conference papers

Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs

Duc Canh Nguyen (1), Gérard Bailly (1), Frédéric Elisei (2)
2 GIPSA-Services
GIPSA-lab - Grenoble Images Parole Signal Automatique
Abstract: To generate action events for a humanoid robot in human-robot interaction (HRI), multimodal interactive behavioral models are typically conditioned on the observed actions of the human partner(s). In previous research, we built an interactive model that generates discrete events for gaze and arm gestures, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in a collaborative scenario where head motion contributes to verbal as well as nonverbal functions. We show that, in this scenario, the fundamental frequency of speech (F0) alone is not enough to drive head motion, whereas gaze contributes significantly to head motion generation. We propose a cascaded Long Short-Term Memory (LSTM) model that first estimates gaze from speech content and hand gestures performed by the partner. This estimate is then used as input for generating head motion. The results show that the proposed method outperforms a single-task model with the same inputs.
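The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TinyLSTM` class, the `cascade` function, the feature sizes, and the choice to concatenate the original inputs with the gaze estimate in the second stage are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Hypothetical single-layer LSTM with a linear readout (random, untrained weights)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.n_hidden = n_hidden
        cols = n_in + n_hidden + 1  # input frame, previous hidden state, bias
        # One weight matrix per gate: input, forget, output, candidate.
        self.gates = [
            [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(n_hidden)]
            for _ in range(4)
        ]
        self.readout = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_out)
        ]

    @staticmethod
    def _affine(weights, vec):
        return [sum(w * v for w, v in zip(row, vec)) for row in weights]

    def run(self, sequence):
        """Map a list of input frames to a list of output frames."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for frame in sequence:
            z = list(frame) + h + [1.0]
            i = [sigmoid(a) for a in self._affine(self.gates[0], z)]
            f = [sigmoid(a) for a in self._affine(self.gates[1], z)]
            o = [sigmoid(a) for a in self._affine(self.gates[2], z)]
            g = [math.tanh(a) for a in self._affine(self.gates[3], z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(self._affine(self.readout, h + [1.0]))
        return outputs


def cascade(inputs, gaze_net, head_net):
    """Stage 1 estimates gaze; stage 2 sees the inputs plus that estimate (assumed wiring)."""
    gaze = gaze_net.run(inputs)
    stage2_inputs = [list(x) + g for x, g in zip(inputs, gaze)]
    head = head_net.run(stage2_inputs)
    return gaze, head


rng = random.Random(0)
N_FEATURES, N_GAZE, N_HEAD, N_HIDDEN = 6, 2, 3, 8  # hypothetical sizes
gaze_net = TinyLSTM(N_FEATURES, N_HIDDEN, N_GAZE, rng)
head_net = TinyLSTM(N_FEATURES + N_GAZE, N_HIDDEN, N_HEAD, rng)

# Ten frames of made-up partner features (speech and gesture descriptors).
frames = [[rng.random() for _ in range(N_FEATURES)] for _ in range(10)]
gaze, head = cascade(frames, gaze_net, head_net)
print(len(gaze), len(gaze[0]), len(head), len(head[0]))  # 10 2 10 3
```

A single-task baseline in the same spirit would be one `TinyLSTM(N_FEATURES, N_HIDDEN, N_HEAD, rng)` applied directly to `frames`, with no intermediate gaze estimate.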

Contributor: Gérard Bailly
Submitted on: Tuesday, July 24, 2018 - 11:43:26 AM
Last modification on: Wednesday, November 3, 2021 - 5:06:59 AM
Long-term archiving on: Thursday, October 25, 2018 - 12:49:13 PM




  • HAL Id: hal-01848063, version 1



Duc Canh Nguyen, Gérard Bailly, Frédéric Elisei. Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs. HCI 2018 - 20th International Conference on Human-Computer Interaction, Jul 2018, Las Vegas, United States. pp.164-175. ⟨hal-01848063⟩


