Conference paper

Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation

Mireille Fares, Catherine Pelachaud, Nicolas Obin¹
¹ Analyse et synthèse sonores [Paris], STMS - Sciences et Technologies de la Musique et du Son
Abstract: We propose a semantically-aware, speech-driven model to generate expressive and natural upper-facial and head motion for Embodied Conversational Agents (ECAs). In this work, we aim to produce natural and continuous head motion and upper-facial gestures synchronized with speech. The model generates these gestures from multimodal input features: the first modality is text, and the second is speech prosody. It uses Transformers and convolutions to map the multimodal features of an utterance to continuous eyebrow and head gestures. We conduct subjective and objective evaluations to validate our approach and compare it with the state of the art.
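The abstract describes mapping per-utterance multimodal features (text plus speech prosody) through convolutions and Transformers to continuous gesture trajectories. The following is a minimal numpy sketch of that general idea only, not the paper's actual architecture: all dimensions, the single attention head, the linear input projection standing in for the convolutional front end, and the random weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions (not from the paper): T speech frames,
# text-embedding size, prosody-feature size, model width, output size.
T, D_TEXT, D_PROS, D_MODEL, D_OUT = 50, 300, 4, 64, 5  # 5 = e.g. 2 eyebrow + 3 head-rotation dims

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

# Per-frame multimodal input: text embeddings concatenated with
# prosody features (e.g. F0 and energy contours); random stand-ins here.
text_feats = rng.normal(size=(T, D_TEXT))
prosody_feats = rng.normal(size=(T, D_PROS))
x = np.concatenate([text_feats, prosody_feats], axis=-1)

# Linear input projection (stand-in for the convolutional front end).
W_in = rng.normal(size=(D_TEXT + D_PROS, D_MODEL)) * 0.02
h = x @ W_in

# One attention layer with a residual connection
# (a Transformer encoder, heavily simplified).
Wq, Wk, Wv = (rng.normal(size=(D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
h = h + self_attention(h, Wq, Wk, Wv)

# Output head: one continuous gesture vector per speech frame.
W_out = rng.normal(size=(D_MODEL, D_OUT)) * 0.02
gestures = h @ W_out
print(gestures.shape)  # (50, 5)
```

The shape of `gestures` reflects the key property the abstract emphasizes: the output is a continuous trajectory aligned frame-by-frame with the speech input, rather than a discrete gesture label per utterance.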
Contributor: Nicolas Obin
Submitted on: 24 May 2022


  • HAL Id : hal-03677459, version 1


Mireille Fares, Catherine Pelachaud, Nicolas Obin. Transformer Network for Semantically-Aware and Speech-Driven Upper-Face Generation. EUSIPCO, Aug 2022, Belgrade, Serbia. ⟨hal-03677459⟩


