Conference paper

Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation

Mireille Fares, Catherine Pelachaud, Nicolas Obin¹
¹ Analyse et synthèse sonores [Paris], STMS - Sciences et Technologies de la Musique et du Son
Abstract: We propose a semantically-aware, speech-driven model to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECAs). In this work, we aim to produce natural and continuous head motion and upper-facial gestures synchronized with speech. The model generates these gestures from multimodal input features: the first modality is text, and the second is speech prosody. It uses Transformers and convolutions to map the multimodal features of an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach and compare it with the state of the art.
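The abstract describes mapping per-utterance multimodal features (text plus speech prosody) through convolutions and Transformers to continuous gesture trajectories. The following is a minimal numpy sketch of that general idea only, not the paper's actual architecture: all dimensions, the single attention head, the linear input projection standing in for the convolutional front end, and the random weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions (not from the paper): T speech frames,
# text-embedding size, prosody-feature size, model width, output size.
T, D_TEXT, D_PROS, D_MODEL, D_OUT = 50, 300, 4, 64, 5  # 5 = e.g. 2 eyebrow + 3 head-rotation dims

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

# Per-frame multimodal input: text embeddings concatenated with
# prosody features (e.g. F0 and energy contours); random stand-ins here.
text_feats = rng.normal(size=(T, D_TEXT))
prosody_feats = rng.normal(size=(T, D_PROS))
x = np.concatenate([text_feats, prosody_feats], axis=-1)

# Linear input projection (stand-in for the convolutional front end).
W_in = rng.normal(size=(D_TEXT + D_PROS, D_MODEL)) * 0.02
h = x @ W_in

# One attention layer with a residual connection
# (a Transformer encoder, heavily simplified).
Wq, Wk, Wv = (rng.normal(size=(D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
h = h + self_attention(h, Wq, Wk, Wv)

# Output head: one continuous gesture vector per speech frame.
W_out = rng.normal(size=(D_MODEL, D_OUT)) * 0.02
gestures = h @ W_out
print(gestures.shape)  # (50, 5)
```

The shape of `gestures` reflects the key property the abstract emphasizes: the output is a continuous trajectory aligned frame-by-frame with the speech input, rather than a discrete gesture label per utterance.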
Contributor: Nicolas Obin
Submitted on: 24 May 2022


  • HAL Id : hal-03677459, version 1


Mireille Fares, Catherine Pelachaud, Nicolas Obin. Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation. EUSIPCO, Aug 2022, Belgrade, Serbia. ⟨hal-03677459⟩


