3D Visual Speech Animation from Image Sequences

Utpala Musti; Slim Ouni; Zhou Ziheng

Conference paper, Year: 2014

Utpala Musti

Role: Author
PersonId : 961701

University of Oulu

Slim Ouni

Role: Author
PersonId : 1158
IdHAL : slim-ouni
ORCID : 0000-0001-5286-7368

Analysis, perception and recognition of speech

Zhou Ziheng

Role: Author

University of Oulu

Abstract

In this paper we describe an early version of our system, which synthesizes 3D visual speech, including tongue and teeth, from frontal facial image sequences. The system is developed for 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. The prime motivation is to obtain a 3D VSA system from a limited amount of training data compared with the amount required to develop a conventional corpus-based 3D VSA system. The system consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. The result has the added advantages of 3D visual speech: the face can be rendered in different poses and illumination conditions, and the visual information of the tongue and teeth is enhanced. The first module, for 3D shape estimation, is based on the detection of facial landmarks in images. It uses a prior 3D Morphable Model (3D-MM) trained on 3D facial data. For the time being it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained on the data of a single person and tested with data from the same person. The estimated 3D shape sequences are provided as input to the second module along with the phonetic segmentation. For any particular 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and by animating a rigid 3D tongue through keyframe interpolation.
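The final step of the abstract (rotating the lower jaw and animating a rigid tongue through keyframe interpolation) can be illustrated with a minimal sketch. The abstract does not state the interpolation scheme, the jaw rotation axis, or how keyframes are derived from the phonetic segmentation, so everything below is an assumption for illustration: linear interpolation, a hinge axis parallel to x, and the hypothetical helper names `interpolate_keyframes` and `rotate_about_x`.

```python
import math
from bisect import bisect_right


def interpolate_keyframes(keyframes, t):
    """Linearly interpolate pose values at time t (assumed scheme, not the paper's).

    keyframes: list of (time, values) pairs sorted by time, where values is a
    tuple of floats (for example a tongue offset). Times outside the keyframe
    range clamp to the first or last keyframe.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)          # index of the first keyframe after t
    t0, v0 = keyframes[i - 1]
    t1, v1 = keyframes[i]
    w = (t - t0) / (t1 - t0)            # blend weight in [0, 1)
    return tuple((1 - w) * a + w * b for a, b in zip(v0, v1))


def rotate_about_x(point, angle_deg, pivot=(0.0, 0.0, 0.0)):
    """Rotate a 3D point about an axis parallel to x that passes through pivot.

    Stands in for a rigid lower-jaw rotation; the hinge axis is an assumption.
    """
    a = math.radians(angle_deg)
    x, y, z = (p - c for p, c in zip(point, pivot))
    y2 = y * math.cos(a) - z * math.sin(a)
    z2 = y * math.sin(a) + z * math.cos(a)
    return (x + pivot[0], y2 + pivot[1], z2 + pivot[2])


# Hypothetical keyframes placed at phoneme boundaries: (time in s, tongue offset).
tongue_keys = [(0.0, (0.0, 0.0, 0.0)), (0.2, (0.0, 4.0, 2.0)), (0.5, (0.0, 0.0, 0.0))]
tongue_offset = interpolate_keyframes(tongue_keys, 0.1)
jaw_point = rotate_about_x((0.0, 1.0, 0.0), 90.0)
```

In this sketch the phonetic segmentation would supply the keyframe times, and the jaw angle would come from the skin points tracked on the jaw; neither mapping is described in the abstract, so both are left as inputs.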

Keywords

3D visual speech; visual speech animation; 3D facial shape estimation from images; facial landmark detection

Domains

Other [q-bio.OT]; Computational Geometry [cs.CG]; Human-Computer Interaction [cs.HC]

https://hal.science/hal-01086073

Submitted on: Friday, November 21, 2014, 17:44:21

Last modified on: Friday, March 24, 2023, 14:52:59

Dates and versions

hal-01086073 , version 1 (21-11-2014)

Identifiers

HAL Id: hal-01086073, version 1

Cite

Utpala Musti, Slim Ouni, Zhou Ziheng. 3D Visual Speech Animation from Image Sequences. Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec 2014, Bangalore, India. ⟨hal-01086073⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
