Conference papers

Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values

Abstract: Neural vocoders are systematically evaluated on homogeneous train and test databases. This kind of evaluation is effective for comparing neural vocoders in their "comfort zone", yet it hardly reveals their limits on data unseen during training. To compare their extrapolation capabilities, we introduce a methodology that aims at quantifying the robustness of neural vocoders in synthesising unseen data, by precisely controlling the ranges of seen/unseen data in the training database. Focusing in this study on the pitch (F0) parameter, our methodology involves a careful splitting of a dataset to control which F0 values are seen/unseen during training, followed by both global (utterance) and local (frame) evaluation of vocoders. Comparison of four types of vocoders (autoregressive, source-filter, flows, GAN) displays a wide range of behaviour towards unseen input pitch values, including excellent extrapolation (WaveGlow); widely-spread F0 errors (WaveRNN); and systematic generation of the training set median F0 (LPCNet, Parallel WaveGAN). In contrast, fewer differences between vocoders were observed when using homogeneous train and test sets, thus demonstrating the potential and need for such evaluation to better discriminate the neural vocoders' abilities to generate out-of-training-range data.
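The splitting and two-level evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the convention that an F0 of 0 marks an unvoiced frame, the rule that an utterance is "seen" only if all its voiced frames fall inside the chosen range, and the use of the error in cents as the frame-level measure with its median absolute value as the utterance-level measure.

```python
import math
from statistics import median


def split_by_f0(utterances, low_hz, high_hz):
    """Split utterances into (train, held_out) by their voiced F0 range.

    `utterances` maps a name to a list of per-frame F0 values in Hz,
    where 0 marks an unvoiced frame (an assumed convention).
    An utterance is kept for training only if every voiced frame lies
    inside [low_hz, high_hz]; fully unvoiced utterances carry no pitch
    and therefore also go to the training set.
    """
    train, held_out = {}, {}
    for name, track in utterances.items():
        voiced = [f for f in track if f > 0]
        in_range = all(low_hz <= f <= high_hz for f in voiced)
        (train if in_range else held_out)[name] = track
    return train, held_out


def frame_errors_cents(ref_track, gen_track):
    """Local (frame) measure: signed F0 error in cents.

    Only frames that are voiced in both tracks are compared.
    """
    return [
        1200.0 * math.log2(g / r)
        for r, g in zip(ref_track, gen_track)
        if r > 0 and g > 0
    ]


def utterance_error_cents(ref_track, gen_track):
    """Global (utterance) measure: median absolute frame error in cents.

    Returns None when the two tracks share no voiced frame.
    """
    errors = frame_errors_cents(ref_track, gen_track)
    return median(abs(e) for e in errors) if errors else None


if __name__ == "__main__":
    # Hypothetical toy data: three utterances with a few frames each.
    data = {"a": [0, 120, 130], "b": [0, 300, 310], "c": [110, 400]}
    train, held_out = split_by_f0(data, low_hz=100, high_hz=200)
    print(sorted(train), sorted(held_out))
```

With such a split, a vocoder trained only on the in-range utterances would be scored on the held-out ones, so that large frame errors clustered around the training-set median F0 would indicate the kind of failure the abstract reports for some models.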

https://hal.archives-ouvertes.fr/hal-03338483
Contributor: Olivier Perrotin
Submitted on: Tuesday, September 14, 2021 - 9:09:13 AM
Last modification on: Tuesday, October 19, 2021 - 11:18:15 AM
Long-term archiving on: Wednesday, December 15, 2021 - 6:02:14 PM

File

Perrotin_IS2021.pdf
Publisher files allowed on an open archive

Citation

Olivier Perrotin, Hussein El Amouri, Gérard Bailly, Thomas Hueber. Evaluating the Extrapolation Capabilities of Neural Vocoders to Extreme Pitch Values. Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. pp.11-15, ⟨10.21437/Interspeech.2021-1547⟩. ⟨hal-03338483⟩