A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice

Frederik Bous; Axel Roebel

doi:10.3390/info13030102

Journal article. Journal: Information. Year: 2022

Frederik Bous

Role: Author
PersonId : 1063810
ORCID : 0000-0002-7477-7600
IdRef : 273954911

Analyse et synthèse sonores [Paris]

Axel Roebel

Role: Author
PersonId : 4527
IdHAL : axel-roebel
ORCID : 0000-0001-6136-4391
IdRef : 227186079

Analyse et synthèse sonores [Paris]

Abstract

In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. The f0 transformation is performed by training an auto-encoder on the voice signal’s mel-spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to the auto-encoder to disentangle the f0 from its latent code. The resulting model successfully applies the desired f0 to the input mel-spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls outside the range of the source speaker/singer. Using the mean f0 error in the transformed mel-spectrograms, we define a disentanglement measure and perform a study over the required bottleneck size. The study reveals that to remove the f0 from the auto-encoder’s latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptual test, we compare the audio quality of the proposed auto-encoder to f0 transformations obtained with a classical vocoder. The perceptual test confirms that the audio quality is better for the auto-encoder than for the classical vocoder. Finally, a visual analysis of the latent code for the two-dimensional case is carried out. We observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
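The abstract does not spell out how the mean f0 error is computed, so the following is only a minimal sketch of one plausible form of such a measure: the mean absolute deviation, in cents, between the requested f0 and the f0 measured on the transformed signal. The function name `f0_error_cents`, the use of cents, and the convention that unvoiced frames are marked with 0 Hz are all assumptions made for illustration, not details taken from the paper.

```python
import math


def f0_error_cents(target_hz, measured_hz):
    """Mean absolute f0 error in cents over frames voiced in both tracks.

    Assumption of this sketch: unvoiced frames are marked with 0 Hz
    and are skipped.
    """
    errors = [
        abs(1200.0 * math.log2(m / t))
        for t, m in zip(target_hz, measured_hz)
        if t > 0 and m > 0
    ]
    if not errors:
        raise ValueError("no frames voiced in both tracks")
    return sum(errors) / len(errors)


# Hypothetical per-frame f0 tracks (Hz): one frame is an octave too high.
target = [220.0, 220.0, 0.0, 440.0]
measured = [220.0, 440.0, 0.0, 440.0]
print(f0_error_cents(target, measured))  # 400.0
```

Under this reading, a small error would suggest that the decoder follows the conditioning f0 rather than f0 information leaking through the latent code, which is the sense in which the abstract ties the error to disentanglement.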

Domains

Sound [cs.SD]; Signal and image processing [eess.SP]

https://hal.science/hal-03599085

Submitted on: Sunday, March 6, 2022, 16:57:38

Last modified on: Tuesday, December 19, 2023, 12:25:32

Dates and versions

hal-03599085 , version 1 (06-03-2022)

Identifiers

HAL Id : hal-03599085 , version 1
DOI : 10.3390/info13030102

Cite

Frederik Bous, Axel Roebel. A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice. Information, 2022, 13 (3), pp.102. ⟨10.3390/info13030102⟩. ⟨hal-03599085⟩

Collections

CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES ANR
