Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping

Laurent Girin 1, 2 Thomas Hueber 2 Xavier Alameda-Pineda 1, 3
1 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
2 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
Abstract : This article addresses the adaptation of an acoustic-articulatory inversion model of a reference speaker to the voice of another source speaker, using a limited amount of audio-only data. In this study, the articulatory-acoustic relationship of the reference speaker is modeled by a Gaussian mixture model and inference of articulatory data from acoustic data is made by the associated Gaussian mixture regression (GMR). To address speaker adaptation, we previously proposed a general framework called Cascaded-GMR (C-GMR) which decomposes the adaptation process into two consecutive steps: spectral conversion between source and reference speaker and acoustic-articulatory inversion of converted spectral trajectories. In particular, we proposed the Integrated C-GMR technique (IC-GMR) in which both steps are tied together in the same probabilistic model. In this article, we extend the C-GMR framework with another model called Joint-GMR (J-GMR). Contrary to the IC-GMR, this model aims at exploiting all potential acoustic-articulatory relationships, including those between the source speaker's acoustics and the reference speaker's articulation. We present the full derivation of the exact Expectation-Maximization (EM) training algorithm for the J-GMR. It exploits the missing data methodology of machine learning to deal with limited adaptation data. We provide an extensive evaluation of the J-GMR on both synthetic acoustic-articulatory data and on the multi-speaker MOCHA EMA database. We compare the J-GMR performance to other models of the C-GMR framework, notably the IC-GMR, and discuss their respective merits.
Type de document :
Article dans une revue
IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (3), pp.662-673. 〈10.1109/TASLP.2017.2651398〉
Liste complète des métadonnées

https://hal.archives-ouvertes.fr/hal-01485540
Contributeur : Thomas Hueber <>
Soumis le : jeudi 9 mars 2017 - 02:55:29
Dernière modification le : jeudi 15 juin 2017 - 09:09:42
Document(s) archivé(s) le : samedi 10 juin 2017 - 13:20:12

Fichier

girin_etal_jgmr_taslp_2017_pre...
Fichiers produits par l'(les) auteur(s)

Identifiants

Citation

Laurent Girin, Thomas Hueber, Xavier Alameda-Pineda. Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping. IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2017, 25 (3), pp.662-673. 〈10.1109/TASLP.2017.2651398〉. 〈hal-01485540〉

Partager

Métriques

Consultations de
la notice

621

Téléchargements du document

79