Conference papers

Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

Frederik Bous, Laurent Benaroya, Nicolas Obin 1, Axel Roebel
1 Analyse et synthèse sonores [Paris]
STMS - Sciences et Technologies de la Musique et du Son
Abstract : This paper introduces voice reenactment as the task of voice conversion (VC) in which the expressivity of the source speaker is preserved during conversion while the identity of a target speaker is transferred. To do so, an original neural-VC architecture is proposed, based on sequence-to-sequence voice conversion (S2S-VC), in which the speech prosody of the source speaker is preserved during conversion. First, the S2S-VC architecture is modified to synchronize the converted speech with the source speech by means of phonetic duration encoding. Second, the decoder is conditioned on the desired sequence of F0 values, and an explicit F0 loss is formulated between the F0 of the source speaker and that of the converted speech. In addition, adversarial learning of conversions is integrated into the S2S-VC architecture so that training exploits both the reconstruction of original speech and the conversion of speech with manipulated attributes, thereby reducing the inconsistency between training and conversion. An experimental evaluation on the VCTK speech database shows that the speech prosody can be efficiently preserved during conversion, and that the proposed adversarial learning consistently improves the conversion and the naturalness of the reenacted speech.
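
The abstract does not give the form of the F0 loss, so the following is only an illustrative sketch of what a loss between the source F0 and the converted F0 could look like. It assumes (none of this is taken from the paper) that F0 is available per frame in Hz, that unvoiced frames are marked with 0, that the two sequences are already time-aligned, and that the distance is a mean absolute difference of log-F0. The function name `f0_loss` is hypothetical.

```python
import math


def f0_loss(source_f0, converted_f0):
    """Mean absolute log-F0 difference over frames voiced in both signals.

    Assumptions (not from the paper): F0 is given in Hz per frame,
    unvoiced frames are marked with 0, and both sequences are already
    time-aligned frame by frame.
    """
    if len(source_f0) != len(converted_f0):
        raise ValueError("F0 sequences must have the same number of frames")
    # Compare only frames where both the source and the conversion are voiced.
    diffs = [
        abs(math.log(s) - math.log(c))
        for s, c in zip(source_f0, converted_f0)
        if s > 0 and c > 0
    ]
    # With no commonly voiced frame there is nothing to compare.
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Working in the log domain makes the penalty depend on the pitch ratio rather than the difference in Hz, so an octave error costs the same for a low and a high voice. In an actual training setup the same computation would have to be written with differentiable tensor operations.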
https://hal.archives-ouvertes.fr/hal-03677462
Contributor : Nicolas Obin
Submitted on : Tuesday, May 24, 2022 - 4:24:57 PM
Last modification on : Thursday, June 2, 2022 - 3:38:11 AM
Long-term archiving on : Tuesday, August 30, 2022 - 10:20:03 AM

File

VC_EUSIPCO2022.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03677462, version 1

Citation

Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel. Voice Reenactment with F0 and timing constraints and adversarial learning of conversions. 30th European Signal Processing Conference (EUSIPCO 2022), Aug 2022, Belgrade, Serbia. ⟨hal-03677462⟩