Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping

This article addresses the adaptation of an acoustic-articulatory inversion model of a reference speaker to the voice of another speaker (the source speaker), using a limited amount of audio-only data. In this study, the articulatory-acoustic relationship of the reference speaker is modeled by a Gaussian mixture model (GMM), and articulatory data are inferred from acoustic data using the associated Gaussian mixture regression (GMR). To address speaker adaptation, we previously proposed a general framework called Cascaded-GMR (C-GMR), which decomposes the adaptation process into two consecutive steps: spectral conversion between the source and reference speakers, and acoustic-articulatory inversion of the converted spectral trajectories. In particular, we proposed the Integrated C-GMR technique (IC-GMR), in which both steps are tied together in the same probabilistic model. In this article, we extend the C-GMR framework with another model called Joint-GMR (J-GMR). Unlike the IC-GMR, this model aims to exploit all potential acoustic-articulatory relationships, including those between the source speaker's acoustics and the reference speaker's articulation. We present the full derivation of the exact Expectation-Maximization (EM) training algorithm for the J-GMR. This algorithm exploits the missing-data methodology of machine learning to deal with limited adaptation data. We provide an extensive evaluation of the J-GMR on both synthetic acoustic-articulatory data and the multi-speaker MOCHA EMA database. We compare the performance of the J-GMR to that of other models of the C-GMR framework, notably the IC-GMR, and discuss their respective merits.
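The abstract relies on two mechanisms: Gaussian mixture regression, which predicts one variable from another as the posterior-weighted sum of per-component linear regressions E[y|x] = Σ_k p(k|x) (μ_y,k + Σ_yx,k Σ_xx,k⁻¹ (x − μ_x,k)), and the C-GMR idea of chaining a conversion step with an inversion step. The following is a minimal sketch of both, assuming scalar (1-D) acoustic and articulatory features and GMM parameters that are already trained. The names `Component`, `gmr` and `cascaded_gmr` are hypothetical. It shows only a plain cascade of two independently trained regressions; it does not reproduce the IC-GMR or J-GMR models or their EM training.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One component of a joint GMM on (x, y); scalar case for illustration only."""
    weight: float  # prior probability pi_k
    mu_x: float    # mean of the input variable x
    mu_y: float    # mean of the output variable y
    var_x: float   # input variance Sigma_xx
    cov_yx: float  # cross-covariance Sigma_yx


def gmr(x: float, comps: List[Component]) -> float:
    """Return E[y | x] under the joint GMM (minimum mean-square error estimate)."""
    # Log of pi_k * N(x; mu_x, var_x), i.e. the marginal likelihood of x per component.
    log_lik = [
        math.log(c.weight)
        - 0.5 * math.log(2.0 * math.pi * c.var_x)
        - 0.5 * (x - c.mu_x) ** 2 / c.var_x
        for c in comps
    ]
    # Responsibilities p(k | x), normalized in the log domain to avoid underflow.
    m = max(log_lik)
    w = [math.exp(v - m) for v in log_lik]
    total = sum(w)
    resp = [v / total for v in w]
    # Per-component linear regression: mu_y + Sigma_yx * Sigma_xx^{-1} * (x - mu_x).
    preds = [c.mu_y + c.cov_yx / c.var_x * (x - c.mu_x) for c in comps]
    return sum(r * p for r, p in zip(resp, preds))


def cascaded_gmr(x_src: float, conversion: List[Component],
                 inversion: List[Component]) -> float:
    """Hypothetical two-step cascade: source acoustics -> reference acoustics -> articulation."""
    x_ref = gmr(x_src, conversion)  # step 1: spectral conversion
    return gmr(x_ref, inversion)    # step 2: acoustic-articulatory inversion


# Toy single-component models (made-up parameters, not from the article).
conversion = [Component(weight=1.0, mu_x=0.0, mu_y=1.0, var_x=1.0, cov_yx=2.0)]
inversion = [Component(weight=1.0, mu_x=2.0, mu_y=-1.0, var_x=4.0, cov_yx=2.0)]
print(cascaded_gmr(0.5, conversion, inversion))  # → -1.0
```

With real acoustic and articulatory features, x and y are vectors, so the division by `var_x` becomes a product with the inverse covariance matrix and the Gaussian density becomes multivariate; the structure of the computation is otherwise the same.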

Keywords

Speaker adaptation; Acoustic-articulatory inversion; GMM; Gaussian mixture regression; EM; Missing data

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]

Main file

girin_etal_jgmr_taslp_2017_preprint.pdf (2.18 MB)

Origin: Files produced by the author(s)

Contributor: Thomas Hueber

https://hal.science/hal-01485540

Submitted on: Thursday, 9 March 2017, 02:55:29

Last modified on: Thursday, 4 April 2024, 21:08:53

Long-term archiving on: Saturday, 10 June 2017, 13:20:12

Dates and versions

hal-01485540 , version 1 (09-03-2017)

Identifiers

HAL Id : hal-01485540 , version 1
DOI : 10.1109/TASLP.2017.2651398

Cite

Laurent Girin, Thomas Hueber, Xavier Alameda-Pineda. Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2017, 25 (3), pp.662-673. ⟨10.1109/TASLP.2017.2651398⟩. ⟨hal-01485540⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA GIPSA GIPSA-DPC LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
