Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora

Elizabeth Godoy; Olivier Rosec; Thierry Chonavel

doi:10.1109/TASL.2011.2177820

Journal article. IEEE Transactions on Audio, Speech and Language Processing. Year: 2012


Elizabeth Godoy

Role: Author

Orange Labs [Lannion]

Olivier Rosec

Role: Author

Orange Labs [Lannion]

Thierry Chonavel

Role: Author
PersonId : 18509
IdHAL : thierry-chonavel
ORCID : 0000-0003-3406-0426
IdRef : 030666902

Lab-STICC_TB_CID_TOMS

Département Signal et Communications

Abstract

In Voice Conversion (VC), the speech of a source speaker is modified to resemble that of a particular target speaker. Currently, standard VC approaches use Gaussian mixture model (GMM)-based transformations that do not generate high-quality converted speech due to “over-smoothing” resulting from weak links between individual source and target frame parameters. Dynamic Frequency Warping (DFW) offers an appealing alternative to GMM-based methods, as more spectral details are maintained in transformation; however, the speaker timbre is less successfully converted because spectral power is not adjusted explicitly. Previous work combines separate GMM- and DFW-transformed spectral envelopes for each frame. This paper proposes a more effective DFW-based approach that 1) does not rely on the baseline GMM methods, and 2) functions on the acoustic class level. To adjust spectral power, an amplitude scaling function is used that compares the average target and warped source log spectra for each acoustic class. The proposed DFW with Amplitude scaling (DFWA) outperforms standard GMM and hybrid GMM-DFW methods for VC in terms of both speech quality and timbre conversion, as is confirmed in extensive objective and subjective testing. Furthermore, by not requiring time-alignment of source and target speech, DFWA is able to perform equally well using parallel or nonparallel corpora, as is demonstrated explicitly.
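The amplitude scaling step described in the abstract can be illustrated with a minimal sketch. It assumes the "comparison" of average log spectra is a per-bin difference in the log domain, computed separately for each acoustic class; the exact form used in the paper (for example, any smoothing of the scaling function) is not given in the abstract. The function names and toy data are hypothetical. Because only class averages are used, the source and target frame counts need not match, which is consistent with the claim that no time alignment is required.

```python
from statistics import fmean


def mean_log_spectrum(frames):
    """Average a list of log-spectrum frames bin by bin."""
    return [fmean(bin_values) for bin_values in zip(*frames)]


def amplitude_scaling(warped_source_frames, target_frames):
    """Per-bin log-domain offset between class averages (assumed form)."""
    mean_source = mean_log_spectrum(warped_source_frames)
    mean_target = mean_log_spectrum(target_frames)
    return [t - s for s, t in zip(mean_source, mean_target)]


def apply_scaling(warped_frame, scaling):
    """Add the log-domain offset to one warped source frame."""
    return [x + a for x, a in zip(warped_frame, scaling)]


# Toy data for one acoustic class: 3 frequency bins, unequal frame counts
# (no frame-to-frame alignment between source and target is needed).
warped_source = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
target = [[2.0, 4.0, 0.0], [4.0, 4.0, 2.0], [3.0, 1.0, 1.0]]

scaling = amplitude_scaling(warped_source, target)
converted = [apply_scaling(frame, scaling) for frame in warped_source]
```

In this sketch, the converted frames keep the frame-level detail of the warped source, while their class average matches the target average.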

Keywords

Voice conversion; Gaussian mixture model; Dynamic frequency warping

Domains

Signal and image processing [eess.SP]

Contributor: Ex-Bibliothèque Télécom Bretagne (now IMT Atlantique)

https://hal.science/hal-00739603

Submitted on: Monday, October 8, 2012, 15:10:23

Last modified on: Friday, March 24, 2023, 14:52:56

Dates and versions

hal-00739603 , version 1 (08-10-2012)

Identifiers

HAL Id : hal-00739603 , version 1
DOI : 10.1109/TASL.2011.2177820

Cite

Elizabeth Godoy, Olivier Rosec, Thierry Chonavel. Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora. IEEE Transactions on Audio, Speech and Language Processing, 2012, 20 (4), pp.1313 - 1323. ⟨10.1109/TASL.2011.2177820⟩. ⟨hal-00739603⟩


Collections

UNIV-BREST INSTITUT-TELECOM CNRS ENIB LAB-STICC_ENIB LAB-STICC LAB-STICC_TB IMT-ATLANTIQUE

