3D Visual Speech Animation from Image Sequences

Utpala Musti 1 Slim Ouni 2 Zhou Ziheng 1
2 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : In this paper we describe an early version of our system, which synthesizes 3D visual speech, including tongue and teeth, from frontal facial image sequences. The system is developed for 3D Visual Speech Animation (VSA) using images generated by an existing state-of-the-art image-based VSA system. The prime motivation is to obtain a 3D VSA system from a limited amount of training data, compared with what is required to develop a conventional corpus-based 3D VSA system. The system consists of two modules. The first module iteratively estimates the 3D shape of the external facial surface for each image in the input sequence. The second module complements the external face with a 3D tongue and teeth to complete the perceptually crucial visual speech information. This brings the added advantages of 3D visual speech: the face can be rendered in different poses and illumination conditions, and the visual information of the tongue and teeth is enhanced. The first module, for 3D shape estimation, is based on the detection of facial landmarks in images. It uses a prior 3D Morphable Model (3D-MM) trained using 3D facial data. For the time being it is developed for a person-specific domain, i.e., the 3D-MM and the 2D facial landmark detector are trained using the data of a single person and tested with data of the same person. The estimated 3D shape sequences are provided as input to the second module along with the phonetic segmentation. For any particular 3D shape, tongue and teeth information is generated by rotating the lower jaw based on a few skin points on the jaw and by animating a rigid 3D tongue through keyframe interpolation.
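The keyframe interpolation mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pose representation (a plain tuple of numbers for the rigid tongue), the choice of one keyframe at the midpoint of each phoneme segment, the use of linear interpolation, and the names `interpolate_pose` and `keyframes_from_segmentation` are all assumptions made for illustration.

```python
from bisect import bisect_right


def keyframes_from_segmentation(segments, key_poses):
    """Build (time, pose) keyframes from a phonetic segmentation.

    segments:  list of (phoneme, start, end) tuples, sorted by time.
    key_poses: dict mapping a phoneme to a hypothetical rigid tongue pose
               (a tuple of floats, e.g. a translation).
    Assumption: one keyframe is placed at the midpoint of each segment.
    """
    return [((start + end) / 2.0, key_poses[ph]) for ph, start, end in segments]


def interpolate_pose(keyframes, t):
    """Linearly interpolate the pose at time t between neighbouring keyframes.

    Times before the first keyframe or after the last one are clamped
    to the first or last pose respectively.
    """
    times = [time for time, _ in keyframes]
    i = bisect_right(times, t)
    if i == 0:
        return keyframes[0][1]
    if i == len(keyframes):
        return keyframes[-1][1]
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    w = (t - t0) / (t1 - t0)  # t0 <= t < t1 is guaranteed by bisect_right
    return tuple((1.0 - w) * a + w * b for a, b in zip(p0, p1))


# Usage with made-up values: two phonemes, each with a hypothetical key pose.
segments = [("a", 0.0, 0.2), ("t", 0.2, 0.4)]
key_poses = {"a": (0.0, 0.0, 0.0), "t": (1.0, 2.0, 0.0)}
keyframes = keyframes_from_segmentation(segments, key_poses)
pose = interpolate_pose(keyframes, 0.2)  # halfway between the two keyframes
```

Because the tongue is treated as rigid, one pose per frame is enough to place it; a smoother interpolant could replace the linear blend without changing the interface.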

Contributor : Slim Ouni
Submitted on : Friday, November 21, 2014 - 5:44:21 PM
Last modification on : Thursday, January 11, 2018 - 6:19:57 AM


  • HAL Id : hal-01086073, version 1



Utpala Musti, Slim Ouni, Zhou Ziheng. 3D Visual Speech Animation from Image Sequences. Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), Dec 2014, Bangalore, India. ⟨hal-01086073⟩


