Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet

Axel Roebel; Frederik Bous

doi:10.3390/info13030103

Journal article, Information, 2022

Axel Roebel

Role: Author
PersonId: 4527
IdHAL: axel-roebel
ORCID: 0000-0001-6136-4391
IdRef: 227186079

Affiliation: Analyse et synthèse sonores (Sound Analysis and Synthesis), Paris

Frederik Bous

Role: Author

Affiliation: Analyse et synthèse sonores (Sound Analysis and Synthesis), Paris

Abstract

The use of the mel spectrogram as a signal parameterization for voice generation is quite recent and is linked to the development of neural vocoders. These are deep neural networks that make it possible to reconstruct high-quality speech from a given mel spectrogram. Although initially developed for speech synthesis, neural vocoders have since also been studied in the context of voice attribute manipulation, opening new means for voice processing in audio production. However, before neural vocoders can be used in real-world applications, two problems need to be addressed: (1) to support use in professional audio workstations, the computational complexity should be low, and (2) the vocoder needs to support a large variety of speakers, differences in voice quality, and the wide range of intensities potentially encountered during audio production. In this context, the present study provides a detailed description of the Multi-band Excited WaveNet, a fully convolutional neural vocoder built around signal processing blocks. It evaluates the performance of the vocoder when trained on a variety of multi-speaker and multi-singer databases, including an experimental evaluation of the neural vocoder trained on both speech and singing voices. To address the problem of intensity variation, the study introduces a new adaptive signal normalization scheme that allows for robust compensation of dynamic and static gain variations. Evaluations are performed using objective measures and a number of perceptual tests that include different neural vocoder algorithms known from the literature. The results confirm that the proposed vocoder compares favorably to the state of the art in its capacity to generalize to unseen voices and voice qualities. Remaining challenges are discussed.
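For readers unfamiliar with the vocoder's input representation, the following is a minimal sketch of a generic log-mel spectrogram computed with NumPy only. It does not reproduce the paper's front end. The sample rate (24 kHz), FFT size (1024), hop size (256), 80 mel bands, HTK-style mel formula, log-magnitude scaling and all function names are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 equally spaced mel points give the lower edge, centre and
    # upper edge of every triangle
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # frequency of each FFT bin
    fb = np.zeros((n_mels, len(bin_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (ctr - lo)
        falling = (hi - bin_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr=24000, n_fft=1024, hop=256, n_mels=80, eps=1e-5):
    """Log of mel-weighted STFT magnitudes, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # frames fully inside the signal
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.maximum(mel, eps))  # eps floor avoids log(0)
```

For one second of a 440 Hz sine at 24 kHz, this yields a 90 × 80 matrix whose strongest band is the one whose triangle covers 440 Hz. A neural vocoder inverts this representation back to a waveform. Published vocoders differ in the mel formula, the filter normalization, and whether power or magnitude is used, so the choices above are only one plausible configuration.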

Subjects

Sound [cs.SD]; Signal and Image Processing [eess.SP]

https://hal.science/hal-03599076

Submitted: Sunday, 6 March 2022, 16:46:47

Last modified: Saturday, 7 October 2023, 21:36:22

Dates and versions

hal-03599076, version 1 (06-03-2022)

Identifiers

HAL Id: hal-03599076, version 1
DOI: 10.3390/info13030103

Cite

Axel Roebel, Frederik Bous. Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet. Information, 2022, 13 (3), pp.103. ⟨10.3390/info13030103⟩. ⟨hal-03599076⟩

Collections

CNRS IRCAM STMS SORBONNE-UNIVERSITE SU-SCIENCES ANR
