Comparing NLP solutions for the disambiguation of French heterophonic homographs for end-to-end TTS systems
Abstract
This paper compares NLP solutions for disambiguating French heterophonic homographs in text-to-speech systems. The solutions are evaluated on an in-house corpus of 8137 sentences extracted from the Web, comprising roughly one hundred instances of each of 34 pairs of prototypical words. A disambiguation system based on per-case Linear Discriminant Analysis (LDA) classifiers that take contextual word embeddings as input features achieves state-of-the-art F-scores above 0.96.
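As an illustration of what "per-case" LDA classifiers could look like, the following is a minimal sketch under stated assumptions, not the system evaluated in the paper. It trains one two-class Fisher LDA per homograph and picks a pronunciation from an embedding of the word in context. The homographs "fils" and "couvent" are well-known examples chosen for illustration and are not claimed to be among the 34 pairs; the 3-dimensional vectors are hand-made stand-ins for real contextual embeddings; the names `fit_lda`, `disambiguate`, `TRAIN` and `MODELS` are hypothetical; and equal class priors plus a small ridge term are assumed.

```python
# Toy sketch: one two-class Fisher LDA per homograph ("per-case" classifiers).
# ASSUMPTION: the vectors are hand-made 3-d stand-ins for contextual embeddings.

def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mu):
    """Within-class scatter matrix: sum of outer products of deviations."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        dev = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += dev[i] * dev[j]
    return s

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_lda(x0, x1, ridge=1e-3):
    """Return (w, bias) with w = Sw^-1 (mu1 - mu0); ASSUMES equal priors."""
    mu0, mu1 = mean(x0), mean(x1)
    s0, s1 = scatter(x0, mu0), scatter(x1, mu1)
    d = len(mu0)
    sw = [[s0[i][j] + s1[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    w = solve(sw, [b - a for a, b in zip(mu0, mu1)])
    # Decision threshold at the midpoint between the two class means.
    bias = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu0, mu1))
    return w, bias

# Hypothetical training data: word -> (labels, class-0 vectors, class-1 vectors).
TRAIN = {
    "fils": (("fis", "fil"),      # /fis/ "son(s)" vs /fil/ "threads"
             [[1.0, 0.1, 0.0], [0.9, -0.1, 0.2], [1.1, 0.0, -0.1], [1.0, 0.2, 0.1]],
             [[-1.0, 0.0, 0.1], [-0.9, 0.1, -0.1], [-1.1, -0.2, 0.0], [-1.0, 0.1, 0.2]]),
    "couvent": (("noun", "verb"),  # "convent" vs "(they) brood"
                [[0.1, 1.0, 0.0], [-0.1, 0.9, 0.2], [0.0, 1.1, -0.1], [0.2, 1.0, 0.1]],
                [[0.0, -1.0, 0.1], [0.1, -0.9, -0.1], [-0.2, -1.1, 0.0], [0.1, -1.0, 0.2]]),
}

# One independent classifier per homograph.
MODELS = {word: (labels, fit_lda(x0, x1)) for word, (labels, x0, x1) in TRAIN.items()}

def disambiguate(word, embedding):
    labels, (w, bias) = MODELS[word]
    score = sum(wi * xi for wi, xi in zip(w, embedding)) + bias
    return labels[1] if score > 0 else labels[0]

print(disambiguate("fils", [0.8, 0.0, 0.0]))
print(disambiguate("couvent", [0.0, -0.8, 0.0]))
```

Keeping a separate model per homograph means each decision boundary is fitted only to the contexts of that word. With real, high-dimensional embeddings a library LDA implementation with shrinkage would likely be preferable to this hand-rolled solver.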
Origin: Files produced by the author(s)