Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context

Abstract: In unit selection text-to-speech synthesis, voice creation involves a phonemic transcription of read speech. This transcription is produced by automatic grapheme-to-phoneme conversion of the text read, followed by manual correction. Although grapheme-to-phoneme conversion makes few errors, manual correction is time-consuming because every generated phoneme must be checked. We propose a method to automatically detect grapheme-to-phoneme conversion errors by comparing contrastive phonemisation hypotheses. A lattice-based forced alignment system is implemented, allowing for signal-dependent phonemisation. We also implement a sequence-to-sequence neural network model to obtain a context-dependent grapheme-to-phoneme conversion. On a French dataset, we show that we can detect up to 86.3% of the errors made by a commercial grapheme-to-phoneme system. Moreover, the amount of data annotated as erroneous is kept under 10% of the total evaluation data. The time spent on manual phoneme checking can thus be drastically reduced without significantly decreasing the quality of the phonemic transcription.
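The core idea in the abstract is to flag words whose baseline phonemisation disagrees with contrastive hypotheses. The snippet below is a minimal sketch of that idea, assuming a simple word-level rule: a word is flagged if the baseline G2P output differs from either the signal-dependent (forced-alignment) hypothesis or the context-dependent (sequence-to-sequence) hypothesis. The class and function names, the SAMPA-like symbols and the toy sentence are hypothetical, and the paper's actual decision criterion may differ. The sketch also computes the two quantities the abstract reports: the share of true errors detected and the share of data flagged for manual checking.

```python
from dataclasses import dataclass


@dataclass
class WordHyps:
    """Hypothetical per-word phonemisation hypotheses (SAMPA-like symbols)."""
    word: str
    g2p: tuple[str, ...]         # baseline G2P output, the one being checked
    aligned: tuple[str, ...]     # assumed signal-dependent hypothesis (forced alignment)
    contextual: tuple[str, ...]  # assumed context-dependent hypothesis (seq2seq model)


def flag_suspicious(items: list[WordHyps]) -> list[int]:
    """Return indices of words whose baseline disagrees with any contrastive hypothesis."""
    return [i for i, w in enumerate(items)
            if w.g2p != w.aligned or w.g2p != w.contextual]


def evaluate(flagged: list[int], true_errors: set[int], n_words: int) -> tuple[float, float]:
    """Return (share of true errors detected, share of words flagged for checking)."""
    flagged_set = set(flagged)
    detected = len(flagged_set & true_errors)
    recall = detected / len(true_errors) if true_errors else 1.0
    flagged_ratio = len(flagged_set) / n_words if n_words else 0.0
    return recall, flagged_ratio


# Toy example: "Le fils est parti vers l'est" ("The son left towards the east").
items = [
    WordHyps("le", ("l", "@"), ("l",), ("l", "@")),            # schwa elided in speech: false alarm
    WordHyps("fils", ("f", "i", "l"), ("f", "i", "s"), ("f", "i", "s")),  # baseline error
    WordHyps("est", ("E",), ("E",), ("E",)),
    WordHyps("parti", ("p", "a", "R", "t", "i"), ("p", "a", "R", "t", "i"), ("p", "a", "R", "t", "i")),
    WordHyps("vers", ("v", "E", "R"), ("v", "E", "R"), ("v", "E", "R")),
    WordHyps("l'est", ("l", "E"), ("l", "E", "s", "t"), ("l", "E", "s", "t")),  # baseline error
]

flagged = flag_suspicious(items)
recall, flagged_ratio = evaluate(flagged, true_errors={1, 5}, n_words=len(items))
print(flagged, recall, flagged_ratio)
```

In this toy case both baseline errors are caught, but the schwa deletion in "le" also triggers a false alarm. This illustrates the trade-off the abstract quantifies: a high detection rate is only useful if the flagged share of the data stays small.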
Document type:
Conference paper
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2017, Okinawa, Japan. 2017, 〈https://asru2017.org〉. 〈10.1109/ASRU.2017.8269004〉

Cited literature: 20 references

https://hal.archives-ouvertes.fr/hal-01585770
Contributor: Yannick Estève
Submitted on: Tuesday, 12 September 2017 - 01:19:48
Last modified on: Monday, 9 April 2018 - 16:55:19
Document(s) archived on: Wednesday, 13 December 2017 - 16:20:41

File

kv_asru17.pdf
Files produced by the author(s)

Citation

Kévin Vythelingum, Yannick Estève, Olivier Rosec. Error detection of grapheme-to-phoneme conversion in text-to-speech synthesis using speech signal and lexical context. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2017, Okinawa, Japan. 2017, 〈https://asru2017.org〉. 〈10.1109/ASRU.2017.8269004〉. 〈hal-01585770〉

Metrics

Record views

539

File downloads

273