Improving TTS with corpus-specific pronunciation adaptation

Marie Tahon; Raheel Qader; Gwénolé Lecorvé; Damien Lolive

Conference paper. Year: 2016

Marie Tahon

Role: Author
PersonId: 9821
IdHAL: marie-tahon
ORCID: 0000-0002-6782-0332
IdRef: 165065532

Expressiveness in Human Centered Data/Media

Raheel Qader

Role: Author
PersonId: 778121
IdRef: 224293559

Expressiveness in Human Centered Data/Media

Gwénolé Lecorvé

Role: Author
PersonId: 20677
IdHAL: gwenole-lecorve
ORCID: 0000-0002-4271-2087
IdRef: 150245254

Expressiveness in Human Centered Data/Media

Damien Lolive

Role: Author
PersonId: 5088
IdHAL: damien-lolive
ORCID: 0000-0002-1110-5444
IdRef: 13017498X

Expressiveness in Human Centered Data/Media

Abstract

Text-to-speech (TTS) systems are built on speech corpora labeled with carefully checked and segmented phonemes. However, the phoneme sequences that automatic grapheme-to-phoneme converters generate during synthesis are usually inconsistent with those in the corpus, which leads to poor-quality synthetic speech. To solve this problem, the present work adapts automatically generated pronunciations to the corpus. The main idea is to train corpus-specific phoneme-to-phoneme conditional random fields with a large set of linguistic, phonological, articulatory and acoustic-prosodic features. Features are first selected under cross-validation, then combined to produce the final best feature set. Pronunciation models are evaluated in terms of phoneme error rate and through perceptual tests. Experiments carried out on a French speech corpus show an improvement in the quality of speech synthesis when pronunciation models are included in the phonetization process. Apart from improving TTS quality, the presented pronunciation adaptation method also opens interesting perspectives for expressive speech synthesis.
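
The abstract names phoneme error rate as one of the evaluation measures without defining it. A minimal sketch of the usual definition follows, assuming the rate is the Levenshtein edit distance between the reference (corpus) phoneme sequence and the predicted one, divided by the reference length. The function names and the French example pronunciations are illustrative assumptions and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of r
                            curr[j - 1] + 1,      # insertion of h
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalised by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a canonical pronunciation of "petit" as the reference,
# and a prediction in which the schwa is dropped.
reference = ["p", "ə", "t", "i"]
predicted = ["p", "t", "i"]
per = phoneme_error_rate(reference, predicted)
```

In this sketch one deletion against four reference phonemes gives a rate of 0.25. A corpus-level figure would normally sum the edit distances over all utterances before dividing by the total number of reference phonemes.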

Keywords

speech synthesis; conditional random fields; pronunciation adaptation; feature selection

Domains

Artificial intelligence [cs.AI]

Main file

interspeech_2016_prononciation_final (1).pdf (284.78 KB)

Origin: Files produced by the author(s)

Contributor: Gwénolé Lecorvé

https://inria.hal.science/hal-01338111

Submitted on: Friday, September 23, 2016, 16:10:41

Last modified on: Tuesday, October 3, 2023, 09:49:51

Dates and versions

hal-01338111, version 1 (23-09-2016)

Identifiers

HAL Id: hal-01338111, version 1

Cite

Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. Improving TTS with corpus-specific pronunciation adaptation. Interspeech, Sep 2016, San Francisco, United States. ⟨hal-01338111⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENSSAT IRISA CENTRALESUPELEC IRISA-D6 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

461 Views

333 Downloads
