Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

Frederik Bous; Laurent Benaroya; Nicolas Obin; Axel Roebel

Conference paper. Year: 2022

Frederik Bous

Role: Author

Laurent Benaroya

Role: Author

Nicolas Obin

Role: Author
PersonId: 7042
IdHAL: nicolas-obin
ORCID: 0000-0002-5236-5306
IdRef: 157523799

Analyse et synthèse sonores [Paris]

Axel Roebel

Role: Author

Abstract

This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original neural VC architecture is proposed, based on sequence-to-sequence voice conversion (S2S-VC), in which the speech prosody of the source speaker is preserved during conversion. First, the S2S-VC architecture is modified to synchronize the converted speech with the source speech by means of phonetic duration encoding; second, the decoder is conditioned on the desired sequence of F0 values, and an explicit F0 loss is formulated between the F0 of the source speaker and that of the converted speech. In addition, adversarial learning of conversions is integrated within the S2S-VC architecture so as to exploit the advantages of both reconstructing the original speech and converting speech with manipulated attributes during training, thereby reducing the inconsistency between training and conversion. An experimental evaluation on the VCTK speech database shows that the speech prosody can be efficiently preserved during conversion, and that the proposed adversarial learning consistently improves the conversion and the naturalness of the reenacted speech.

Keywords

Voice conversion; Voice reenactment; Prosody preservation

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]

Main file

VC_EUSIPCO2022.pdf (265.58 KB)

Origin: Files produced by the author(s)

Contributor: Nicolas Obin

https://hal.science/hal-03677462

Submitted on: Tuesday, May 24, 2022, 16:24:57

Last modified on: Saturday, October 7, 2023, 21:36:22

Long-term archiving on: Tuesday, August 30, 2022, 10:20:03

Dates and versions

hal-03677462, version 1 (24-05-2022)

Identifiers

HAL Id: hal-03677462, version 1

Cite

Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel. Voice Reenactment with F0 and timing constraints and adversarial learning of conversions. 30th European Signal Processing Conference (EUSIPCO 2022), Aug 2022, Belgrade, Serbia. ⟨hal-03677462⟩

Collections

CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES

57 views

69 downloads
