Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality
Abstract
This study investigates the impact of the phonetization and phonetic segmentation of training corpora on the quality of HMM-based text-to-speech (TTS) synthesis. HMM-based TTS requires phonetic symbols aligned with the speech corpus in order to train the models used for synthesis. Phonetic annotation is a complex task, since pronunciation usually differs from spelling and also varies across regional accents. In this paper, the infrastructure of a French TTS system is presented. A corpus was altered by systematically modifying the occurrences of phonetic labels (the number of schwas and liaisons) and by displacing label boundaries, and one system was trained for each condition. A perceptual evaluation was then conducted to assess how labeling accuracy influences synthetic speech quality. Despite the extent of the annotation changes, the synthetic speech quality of the five best systems remained close to that of the reference system, which was built on the corpus with manually corrected labels.
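To make the kind of label perturbation described above concrete, the following is a minimal sketch, not the procedure used in the study. It assumes a simple time-aligned label format (start and end in seconds plus a phone symbol), uses the SAMPA symbol "@" for schwa, and introduces hypothetical helpers `delete_phone` and `displace_boundaries` and a hypothetical minimum segment duration `min_dur`. Deleting a phone hands its time span to the preceding segment, and displacing boundaries shifts every internal boundary while keeping the utterance start and end fixed and each segment non-empty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    start: float  # segment start time in seconds (assumed unit)
    end: float    # segment end time in seconds
    phone: str    # phone symbol, e.g. SAMPA (assumption)


def delete_phone(labels: List[Label], phone: str) -> List[Label]:
    """Hypothetical helper: remove every occurrence of `phone` (e.g. schwa "@"),
    giving its duration to the previous segment, or to the next one if it is first."""
    out: List[Label] = []
    pending_start: Optional[float] = None
    for lab in labels:
        if lab.phone == phone:
            if out:
                # Extend the previous kept segment over the deleted one.
                out[-1] = Label(out[-1].start, lab.end, out[-1].phone)
            elif pending_start is None:
                # Leading deletion: the next kept segment will start here.
                pending_start = lab.start
            continue
        start = pending_start if pending_start is not None else lab.start
        pending_start = None
        out.append(Label(start, lab.end, lab.phone))
    return out


def displace_boundaries(labels: List[Label], shift: float,
                        min_dur: float = 0.005) -> List[Label]:
    """Hypothetical helper: move every internal boundary by `shift` seconds,
    clamping so that each segment keeps at least `min_dur` seconds."""
    if not labels:
        return []
    first_start, last_end = labels[0].start, labels[-1].end
    bounds = [lab.end for lab in labels[:-1]]  # boundaries between segments
    new_bounds: List[float] = []
    prev = first_start
    for i, b in enumerate(bounds):
        remaining = len(bounds) - i  # segments that follow this boundary
        upper = last_end - remaining * min_dur
        nb = min(max(b + shift, prev + min_dur), upper)
        new_bounds.append(nb)
        prev = nb
    edges = [first_start] + new_bounds + [last_end]
    return [Label(edges[i], edges[i + 1], lab.phone) for i, lab in enumerate(labels)]
```

For example, `delete_phone(labels, "@")` followed by `displace_boundaries(result, 0.02)` yields one perturbed condition. A fixed shift applied to all boundaries is only one possible design; random per-boundary offsets would be an equally plausible alternative.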