Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis
Conference paper, Year: 2016

Abstract

Incremental text-to-speech systems aim to synthesize a text 'on-the-fly', while the user is typing a sentence. In this context, this article addresses the problem of part-of-speech (POS, i.e. lexical category) tagging, which is a critical step for accurate grapheme-to-phoneme conversion and prosody estimation. Here, the main challenge is to estimate the POS of a given word without knowing its 'right context' (i.e. the following words, which are not available yet). To address this issue, we propose a method based on a set of decision trees that estimate online whether a given POS tag is likely to be modified when more right-contextual information becomes available. In such a case, the synthesis is delayed until POS stability is guaranteed. This results in delivering the synthetic voice in word chunks of variable length. Objective evaluation on French shows that the proposed method is able to estimate POS tags with more than 92% accuracy (compared to a non-incremental system) while minimizing the synthesis latency (between 1 and 4 words). Perceptual evaluation (ranking test) is then carried out in the context of HMM-based speech synthesis. Experimental results show that the word grouping resulting from the proposed method is rated more acceptable than word-by-word incremental synthesis.
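The delay-until-stable idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `incremental_chunks`, `toy_tag`, `toy_is_stable`, and `max_lookahead` are hypothetical; the toy stability rule merely stands in for the decision trees, and the lookahead cap is an assumption loosely inspired by the reported latency of 1 to 4 words.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical interfaces, not taken from the paper.
Tagger = Callable[[List[str]], List[str]]                  # tags a partial sentence
StabilityFn = Callable[[List[str], List[str], int], bool]  # is the tag at index i stable?


def incremental_chunks(
    words: Iterable[str],
    tag: Tagger,
    is_stable: StabilityFn,
    max_lookahead: int = 3,
) -> Iterator[List[Tuple[str, str]]]:
    """Yield (word, tag) chunks, holding a word back while its tag looks unstable."""
    seen: List[str] = []
    start = 0  # index of the first word not yet released to the synthesizer
    for word in words:
        seen.append(word)
        tags = tag(seen)  # re-tag the whole prefix with the new right context
        end = start
        # Release the longest run of pending words that are either judged
        # stable or have already waited for max_lookahead following words.
        while end < len(seen) and (
            is_stable(seen, tags, end) or len(seen) - 1 - end >= max_lookahead
        ):
            end += 1
        if end > start:
            yield list(zip(seen[start:end], tags[start:end]))
            start = end
    if start < len(seen):
        # End of input: flush whatever is still pending.
        tags = tag(seen)
        yield list(zip(seen[start:], tags[start:]))


# Toy stand-ins (assumptions): a lookup tagger and a rule that treats an
# ambiguous word as unstable until one following word has been seen.
TOY_LEXICON = {"le": "DET", "chat": "NOUN", "mange": "VERB", "la": "DET", "souris": "NOUN"}
AMBIGUOUS = {"la"}  # e.g. determiner or pronoun in French


def toy_tag(prefix: List[str]) -> List[str]:
    return [TOY_LEXICON.get(w, "X") for w in prefix]


def toy_is_stable(prefix: List[str], tags: List[str], i: int) -> bool:
    return not (prefix[i] in AMBIGUOUS and i == len(prefix) - 1)


for chunk in incremental_chunks("le chat mange la souris".split(), toy_tag, toy_is_stable):
    print(chunk)
```

With these toy rules, "le", "chat", and "mange" are released one word at a time, while "la" is held until "souris" arrives and the two are delivered together as a two-word chunk. In a real system the stability predicate would be a trained classifier, and the tagger would be a proper POS tagger run on the partial sentence.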
Main file
interspeech-2016-itts (1).pdf (260.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01374782, version 1 (01-10-2016)

Identifiers

HAL Id: hal-01374782
DOI: 10.21437/Interspeech.2016-165

Cite

Maël Pouget, Olha Nahorna, Thomas Hueber, Gérard Bailly. Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis. Interspeech 2016 - 17th Annual Conference of the International Speech Communication Association, Sep 2016, San Francisco, CA, United States. pp. 2846-2850, ⟨10.21437/Interspeech.2016-165⟩. ⟨hal-01374782⟩