Conference Papers Year : 2008

From 3-D speaker cloning to text-to-audiovisual speech

Abstract

Visible speech movements were motion-captured and parameterized. Coarticulated targets were extracted from vowel-consonant-vowel sequences (VCVs) and modeled to generate arbitrary German utterances by target interpolation. The system was extended to synthesize English utterances by a mapping to German phonemes. An evaluation by means of a modified rhyme test reveals that the synthetic videos of isolated words increase the recognition scores from 27% to 47.5% when added to audio-only presentation.
Main file
sf_IS08.pdf (101.34 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-00361886 , version 1 (16-02-2009)

Identifiers

  • HAL Id : hal-00361886 , version 1

Cite

Sascha Fagel, Frédéric Elisei, Gérard Bailly. From 3-D speaker cloning to text-to-audiovisual speech. Interspeech 2008 - 9th Annual Conference of the International Speech Communication Association, Sep 2008, Brisbane, Australia. pp.2325. ⟨hal-00361886⟩