Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs

Duc Canh Nguyen 1, Gérard Bailly 1, Frédéric Elisei 2
1 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
2 GIPSA-Services - GIPSA-Services
GIPSA-lab - Grenoble Images Parole Signal Automatique
Abstract : To generate action events for a humanoid robot in human-robot interaction (HRI), multimodal interactive behavioral models are typically used, conditioned on the observed actions of the human partner(s). In previous research, we built an interactive model that generates discrete events for gaze and arm gestures, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in a collaborative scenario where head motion serves both verbal and nonverbal functions. We show that in this scenario the fundamental frequency of speech (F0 feature) is not sufficient to drive head motion, whereas gaze contributes significantly to head motion generation. We propose a cascaded Long Short-Term Memory (LSTM) model that first estimates the gaze from the speech content and hand gestures performed by the partner. This estimate is then used as input for the generation of head motion. The results show that the proposed method outperforms a single-task model with the same inputs.
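The cascade described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the TinyLSTM class, the feature dimensions, the random untrained weights, and the choice to concatenate the partner features with the estimated gaze before the second stage are all assumptions made for illustration. Gaze is represented here as a plain vector of scores per frame, which may differ from the representation used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Hypothetical single-layer LSTM with a linear readout (forward pass only)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # One stacked weight matrix for the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0.0, 0.1, size=(n_out, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        """Map a sequence of shape (T, n_in) to outputs of shape (T, n_out)."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x_t in x_seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
            h = sigmoid(o) * np.tanh(c)                   # hidden state update
            outputs.append(self.W_out @ h)
        return np.stack(outputs)

def cascaded_forward(gaze_net, head_net, partner_feats):
    """Stage 1 estimates gaze; stage 2 generates head motion from it."""
    gaze_est = gaze_net.forward(partner_feats)
    # Assumption: stage 2 sees the original features plus the estimated gaze.
    head_in = np.concatenate([partner_feats, gaze_est], axis=1)
    head_motion = head_net.forward(head_in)
    return gaze_est, head_motion

# All sizes below are hypothetical placeholders, not values from the paper.
T, N_FEATS, N_GAZE, N_HEAD, N_HIDDEN = 50, 6, 4, 3, 16
rng = np.random.default_rng(0)
gaze_net = TinyLSTM(N_FEATS, N_HIDDEN, N_GAZE, rng)
head_net = TinyLSTM(N_FEATS + N_GAZE, N_HIDDEN, N_HEAD, rng)

partner_feats = rng.normal(size=(T, N_FEATS))  # stand-in for speech and gesture features
gaze_est, head_motion = cascaded_forward(gaze_net, head_net, partner_feats)
print(gaze_est.shape, head_motion.shape)
```

A single-task baseline in the same sketch would be one TinyLSTM mapping partner_feats directly to head motion; the cascade differs only in routing an intermediate gaze estimate into the second network.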

Cited literature : 25 references

https://hal.archives-ouvertes.fr/hal-01848063
Contributor : Gérard Bailly
Submitted on : Tuesday, July 24, 2018 - 11:43:26 AM
Last modification on : Thursday, March 7, 2019 - 8:27:53 PM
Long-term archiving on : Thursday, October 25, 2018 - 12:49:13 PM

File

dcn_HCII2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01848063, version 1

Citation

Duc Canh Nguyen, Gérard Bailly, Frédéric Elisei. Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs. 20th International Conference on Human-Computer Interaction (HCI 2018), Jul 2018, Las Vegas, United States. pp.164-175. ⟨hal-01848063⟩
