Conference papers

Impact of Segmentation and Annotation in French end-to-end Synthesis

Abstract: Audiobooks are commonly used to train text-to-speech (TTS) models, as they offer large phonetic content with rather expressive pronunciation, but the number and size of publicly available audiobook corpora vary across languages. Moreover, the quality and accuracy of the available utterance segmentations are debatable. Yet the impact of segmentation on the output synthesis is not well established. Additionally, utterances are generally used individually, without taking advantage of text-level structural information, even though such information influences how the speaker reads. In this paper, we conduct a multidimensional evaluation of Tacotron2 trained on different segmentations and text-level annotations of the same French corpus. We show that both spectrum accuracy and expressiveness depend on the segmentation used. In particular, a shorter segmentation, combined with paragraph annotation, benefits spectrum reconstruction to the detriment of phrasing. Multidimensional analysis of the mean opinion scores obtained in a MUSHRA experiment revealed that phrasing was relatively more important than spectrum accuracy in perceptual judgements. This work serves as evidence that particular attention must be given to model evaluation, as well as to how the training corpus is used to maximize the synthesis characteristics of interest.
https://hal.archives-ouvertes.fr/hal-03362000
Contributor: Martin Lenglet
Submitted on: Friday, October 1, 2021 - 4:16:26 PM
Last modification on: Monday, October 25, 2021 - 9:34:10 AM
Long-term archiving on: Sunday, January 2, 2022 - 7:36:33 PM

File

lenglet21_ssw.pdf
Files produced by the author(s)

Citation

Martin Lenglet, Olivier Perrotin, Gérard Bailly. Impact of Segmentation and Annotation in French end-to-end Synthesis. SSW 11th ISCA Speech Synthesis Workshop, Aug 2021, Budapest, Hungary. pp.13-18, ⟨10.21437/SSW.2021-3⟩. ⟨hal-03362000⟩

Metrics

Record views: 41
File downloads: 40