Multimodal generation of upper-facial and head gestures with a Transformer Network using speech and text

Abstract: We propose a semantically aware, speech-driven method to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECAs). In this work, we tackle two key challenges: producing natural, continuous head motion and upper-facial gestures. We propose a model that generates gestures from multimodal input features: the first modality is text, and the second is speech prosody. Our model uses Transformers and convolutions to map the multimodal features of an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach.
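
The abstract only outlines the architecture at a high level. Below is a minimal PyTorch sketch of that kind of model: two modality streams (text and prosody) projected into a shared space, fused, encoded by a Transformer, and smoothed by 1-D convolutions into continuous gesture trajectories. The fusion-by-summation strategy, all layer sizes, and the output channel layout are illustrative assumptions, not the authors' published design.

```python
# Sketch only: an assumed architecture in the spirit of the abstract,
# not the paper's actual model.
import torch
import torch.nn as nn


class GestureGenerator(nn.Module):
    def __init__(self, text_dim=300, prosody_dim=4, d_model=128,
                 n_heads=4, n_layers=2, out_dim=5):
        super().__init__()
        # Project each modality into a shared model dimension.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.prosody_proj = nn.Linear(prosody_dim, d_model)
        # Transformer encoder over the fused sequence.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # 1-D convolutions smooth the hidden sequence into continuous motion;
        # out_dim might cover e.g. 2 eyebrow and 3 head-rotation channels
        # (an assumed parameterisation).
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, out_dim, kernel_size=5, padding=2),
        )

    def forward(self, text_feats, prosody_feats):
        # text_feats: (batch, T, text_dim); prosody_feats: (batch, T, prosody_dim).
        # Both streams are assumed pre-aligned to the same frame rate.
        fused = self.text_proj(text_feats) + self.prosody_proj(prosody_feats)
        hidden = self.encoder(fused)                  # (batch, T, d_model)
        motion = self.conv(hidden.transpose(1, 2))    # (batch, out_dim, T)
        return motion.transpose(1, 2)                 # (batch, T, out_dim)


if __name__ == "__main__":
    model = GestureGenerator()
    text = torch.randn(2, 100, 300)      # e.g. word embeddings upsampled to frames
    prosody = torch.randn(2, 100, 4)     # e.g. F0, energy, and deltas per frame
    print(model(text, prosody).shape)    # torch.Size([2, 100, 5])
```

In a sketch like this, summing the projected streams is the simplest fusion choice; concatenation or cross-modal attention would be equally plausible readings of "multimodal features" here.
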
Document type: Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-03570955
Contributor: Mireille Fares
Submitted on: Sunday, February 13, 2022 - 7:04:26 PM
Last modification on: Monday, June 27, 2022 - 12:30:20 PM

Identifiers

  • HAL Id: hal-03570955, version 1
  • ARXIV: 2110.04527

Citation

Mireille Fares, Catherine Pelachaud, Nicolas Obin. Multimodal generation of upper-facial and head gestures with a Transformer Network using speech and text. 2021. ⟨hal-03570955⟩
