Audio-Visual Speaker Conversion using Prosody Features

Adela Barbulescu 1, 2, *; Thomas Hueber 3; Gérard Bailly 3; Rémi Ronfard 1, *
* Corresponding author
1 IMAGINE - Intuitive Modeling and Animation for Interactive Graphics & Narrative Environments
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
3 GIPSA-MAGIC - MAGIC
GIPSA-DPC - Département Parole et Cognition
Abstract : The article presents a joint audio-video approach to speaker identity conversion, based on statistical methods originally introduced for voice conversion. Using experimental data from the 3D BIWI Audiovisual corpus of Affective Communication, mapping functions are built between each pair of speakers in order to convert speaker-specific features: the speech signal and 3D facial expressions. The results obtained by combining audio and visual features are compared with corresponding results from earlier approaches, and the improvements brought by introducing dynamic features and exploiting prosodic features are outlined.
Document type :
Conference papers
AVSP - 12th International Conference on Auditory-Visual Speech Processing (AVSP 2013), Aug 2013, Annecy, France. pp.11-16, 2013



https://hal.inria.fr/hal-00842928
Contributor : Remi Ronfard
Submitted on : Tuesday, July 9, 2013 - 5:08:12 PM
Last modification on : Wednesday, June 17, 2015 - 1:15:53 AM

Files

avsp2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00842928, version 1

Citation

Adela Barbulescu, Thomas Hueber, Gérard Bailly, Rémi Ronfard. Audio-Visual Speaker Conversion using Prosody Features. AVSP - 12th International Conference on Auditory-Visual Speech Processing (AVSP 2013), Aug 2013, Annecy, France. pp.11-16, 2013. <hal-00842928>
