Acoustic-to-articulatory inversion in speech based on statistical models

Atef Ben Youssef; Pierre Badin; Gérard Bailly

Conference paper; Year: 2010


Atef Ben Youssef

Role: Author

GIPSA - Machines parlantes, Gestes oro-faciaux, Interaction Face-à-face, Communication augmentée

Pierre Badin

Role: Author
PersonId : 4918
IdHAL : pierrebadin
ORCID : 0000-0001-7440-820X
IdRef : 117976687

GIPSA - Machines parlantes, Gestes oro-faciaux, Interaction Face-à-face, Communication augmentée

Gérard Bailly

Role: Author
PersonId : 444
IdHAL : gerard-bailly
ORCID : 0000-0002-6053-0818
IdRef : 033792135

GIPSA - Machines parlantes, Gestes oro-faciaux, Interaction Face-à-face, Communication augmentée

Abstract

Two speech inversion methods are implemented and compared. In the first, multistream Hidden Markov Models (HMMs) of phonemes are jointly trained on synchronous streams of articulatory data acquired by electromagnetic articulography (EMA) and speech spectral parameters. An acoustic recognition system uses the acoustic part of the HMMs to deliver a phoneme sequence and the state durations; a trajectory formation procedure based on the articulatory part of the HMMs then uses this information to resynthesise the articulatory movements. In the second, Gaussian Mixture Models (GMMs) are trained on the same streams to directly associate articulatory frames with acoustic frames in context, using Maximum Likelihood Estimation. On a 17-minute corpus uttered by a French speaker, the RMS error was 1.62 mm with the HMMs and 2.25 mm with the GMMs.
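
As an illustration of the GMM idea only, the sketch below maps one scalar acoustic value to one scalar articulatory value using the conditional mean of a joint Gaussian mixture, and computes an RMS error of the kind reported in millimetres. This is not the method of the paper, which works on multidimensional frames in context with Maximum Likelihood Estimation; the scalar simplification, the function names, and the `toy_gmm` parameters are all assumptions invented for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_conditional_mean(x, components):
    """Estimate y from x as E[y | x] under a joint GMM over (x, y).

    components: list of dicts with keys
      w   (mixture weight),
      mx  (mean of x), my (mean of y),
      vxx (variance of x), vxy (covariance of x and y).
    Assumes x is not so far from every component that all densities underflow.
    """
    # Unnormalised posterior weight of each component given the value x.
    likes = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likes)
    # Each component contributes its own linear regression of y on x,
    # weighted by its normalised posterior probability.
    return sum(
        (lk / total) * (c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"]))
        for lk, c in zip(likes, components)
    )


def rms_error(predicted, measured):
    """Root-mean-square error between two equal-length sequences (e.g. in mm)."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(predicted))


# Hypothetical two-component model; the numbers are invented for illustration.
toy_gmm = [
    {"w": 0.5, "mx": -1.0, "my": 2.0, "vxx": 1.0, "vxy": 0.5},
    {"w": 0.5, "mx": 1.0, "my": -2.0, "vxx": 1.0, "vxy": 0.5},
]
```

With a single component the mapping reduces to ordinary linear regression of the articulatory value on the acoustic value; with several components, the posterior weights select which local regression dominates for a given acoustic input.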

Keywords

Speech inversion; ElectroMagnetic Articulography (EMA); Hidden Markov Model (HMM); Gaussian Mixture Model (GMM); Maximum Likelihood Estimation (MLE)

Domains

Information and communication sciences

Contributor: Pierre Badin

https://hal.science/hal-00508279

Submitted on: Monday 2 August 2010, 17:31:25

Last modified on: Thursday 4 April 2024, 18:19:38

Dates and versions

hal-00508279 , version 1 (02-08-2010)

Identifiers

HAL Id : hal-00508279 , version 1

Cite

Atef Ben Youssef, Pierre Badin, Gérard Bailly. Acoustic-to-articulatory inversion in speech based on statistical models. AVSP 2010 - 9th International Conference on Auditory-Visual Speech Processing, Sep 2010, Hakone, Kanagawa, Japan. pp.S8-3. ⟨hal-00508279⟩

Collections

UGA CNRS GIPSA GIPSA-DPC GIPSA-MAGIC
