Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis

Maël Pouget 1 Olha Nahorna 1 Thomas Hueber 1 Gérard Bailly 1
1 GIPSA-CRISSP - CRISSP
GIPSA-DPC - Speech and Cognition Department (Département Parole et Cognition)
Abstract: Incremental text-to-speech systems aim at synthesizing a text 'on-the-fly', while the user is typing a sentence. In this context, this article addresses the problem of part-of-speech (POS, i.e. lexical category) tagging, which is a critical step for accurate grapheme-to-phoneme conversion and prosody estimation. Here, the main challenge is to estimate the POS of a given word without knowing its 'right context' (i.e. the following words, which are not available yet). To address this issue, we propose a method based on a set of decision trees that estimate online whether a given POS tag is likely to be modified when more right-contextual information becomes available. In such a case, the synthesis is delayed until POS stability is guaranteed. As a result, the synthetic voice is delivered in word chunks of variable length. Objective evaluation on French shows that the proposed method is able to estimate POS tags with more than 92% accuracy (compared to a non-incremental system) while minimizing the synthesis latency (between 1 and 4 words). Perceptual evaluation (ranking test) is then carried out in the context of HMM-based speech synthesis. Experimental results show that the word grouping resulting from the proposed method is rated more acceptable than word-by-word incremental synthesis.
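The delay-until-stable idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `adaptive_chunks`, the predicate `toy_is_stable`, the `AMBIGUOUS` word list and the `max_latency` cap (set to 4 only because the abstract reports latencies of 1 to 4 words) are all hypothetical names introduced here, and the hand-written rule merely stands in for the decision trees of the paper.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-in for the paper's decision trees: returns True when the
# provisional POS tag of pending[i] is judged unlikely to change later.
StabilityFn = Callable[[List[str], int], bool]

# Illustrative assumption: French words whose POS depends on what follows.
AMBIGUOUS = {"la", "le", "les"}


def toy_is_stable(pending: List[str], i: int) -> bool:
    """Toy rule: an ambiguous word is stable only once a following word is known."""
    has_right_context = i < len(pending) - 1
    return pending[i].lower() not in AMBIGUOUS or has_right_context


def adaptive_chunks(
    words: Iterable[str],
    is_stable: StabilityFn,
    max_latency: int = 4,  # assumed cap, not a value prescribed by the paper
) -> Iterator[List[str]]:
    """Yield variable-length word chunks, holding back words judged unstable."""
    pending: List[str] = []
    for word in words:
        pending.append(word)
        # Count the leading buffered words whose tag is judged stable.
        n = 0
        while n < len(pending) and is_stable(pending, n):
            n += 1
        # Force a release once the buffer reaches the latency cap.
        if len(pending) >= max_latency:
            n = len(pending)
        if n:
            yield pending[:n]
            pending = pending[n:]
    if pending:
        # End of input: no further right context will arrive.
        yield pending


if __name__ == "__main__":
    for chunk in adaptive_chunks("il la voit souvent".split(), toy_is_stable):
        print(chunk)
```

In this toy run, "la" is held back until "voit" arrives, so the two words are released together while the unambiguous words are released one at a time. A real system would replace `toy_is_stable` with a trained classifier that also sees the left context and the provisional tags.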
Document type:
Conference paper
17th Annual Conference of the International Speech Communication Association (Interspeech 2016), Sep 2016, San Francisco, CA, United States. pp.2846 - 2850, 2016, Proceedings of Interspeech 2016, San Francisco, CA, USA. 〈10.21437/Interspeech.2016-165〉

https://hal.archives-ouvertes.fr/hal-01374782
Contributor: Maël Pouget
Submitted on: Saturday, 1 October 2016 - 12:31:33
Last modified on: Wednesday, 24 May 2017 - 15:38:55
Document(s) archived on: Monday, 2 January 2017 - 12:52:06

File

interspeech-2016-itts (1).pdf
Files produced by the author(s)

Citation

Maël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly. Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis. 17th Annual Conference of the International Speech Communication Association (Interspeech 2016), Sep 2016, San Francisco, CA, United States. pp.2846 - 2850, 2016, Proceedings of Interspeech 2016, San Francisco, CA, USA. 〈10.21437/Interspeech.2016-165〉. 〈hal-01374782〉
