Optimal feature set and minimal training size for pronunciation adaptation in TTS

Marie Tahon; Raheel Qader; Gwénolé Lecorvé; Damien Lolive

Conference paper. Year: 2016

Marie Tahon

Role: Author
PersonId : 9821
IdHAL : marie-tahon
ORCID : 0000-0002-6782-0332
IdRef : 165065532

Expressiveness in Human Centered Data/Media

Raheel Qader

Role: Author
PersonId : 958276

Expressiveness in Human Centered Data/Media

Gwénolé Lecorvé

Role: Author
PersonId : 20677
IdHAL : gwenole-lecorve
ORCID : 0000-0002-4271-2087
IdRef : 150245254

Expressiveness in Human Centered Data/Media

Damien Lolive

Role: Author
PersonId : 5088
IdHAL : damien-lolive
ORCID : 0000-0002-1110-5444
IdRef : 13017498X

Expressiveness in Human Centered Data/Media

Abstract

Text-to-Speech (TTS) systems rely on a grapheme-to-phoneme converter that is built to produce canonical, or statically stylized, pronunciations. Hence, TTS quality drops when the phoneme sequences generated by this converter are inconsistent with those labeled in the speech corpus on which the TTS system is built, or when a given expressivity is desired. To solve this problem, the present work aims at automatically adapting generated pronunciations to a given style by training a phoneme-to-phoneme conditional random field (CRF). More precisely, this work investigates (i) the choice of optimal features among acoustic, articulatory, phonological and linguistic ones, and (ii) the selection of a minimal data size to train the CRF. As a case study, pronunciations are adapted to a speech corpus dedicated to TTS. Cross-validation experiments show that small training corpora can be used without significantly degrading performance. Apart from improving TTS quality, these results open interesting perspectives for more complex adaptation scenarios towards expressive speech synthesis.
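The abstract frames pronunciation adaptation as sequence labelling: a phoneme-to-phoneme CRF reads the canonical phonemes produced by the grapheme-to-phoneme converter and predicts, for each one, the phoneme actually realized in the corpus. The sketch below shows one way such per-position observations could be built, assuming a one-to-one alignment in which a deleted phoneme is labelled `_`. The articulatory table, the window size, the example word and the helper names (`phoneme_features`, `sequence_features`) are all hypothetical; the paper's actual acoustic, articulatory, phonological and linguistic features are not reproduced here.

```python
from typing import Dict, List, Sequence

# Hypothetical articulatory descriptors for a toy phoneme inventory (SAMPA-like
# symbols). The paper's real feature inventory is not reproduced here.
ARTICULATORY: Dict[str, Dict[str, str]] = {
    "p": {"class": "stop", "place": "bilabial"},
    "t": {"class": "stop", "place": "alveolar"},
    "@": {"class": "vowel", "height": "mid"},  # schwa
    "i": {"class": "vowel", "height": "close"},
}

PAD = "<pad>"  # symbol used outside the sequence boundaries


def phoneme_features(phonemes: Sequence[str], i: int, window: int = 2) -> Dict[str, str]:
    """Return the features of canonical phoneme i: its neighbours within
    +/- `window` positions plus the articulatory descriptors of phoneme i."""
    feats: Dict[str, str] = {"bias": "1"}
    for offset in range(-window, window + 1):
        j = i + offset
        feats[f"ph[{offset}]"] = phonemes[j] if 0 <= j < len(phonemes) else PAD
    for name, value in ARTICULATORY.get(phonemes[i], {}).items():
        feats[f"art.{name}"] = value
    return feats


def sequence_features(phonemes: Sequence[str]) -> List[Dict[str, str]]:
    """One feature dictionary per canonical phoneme (one CRF observation each)."""
    return [phoneme_features(phonemes, i) for i in range(len(phonemes))]


# Hypothetical example: French "petite", canonical /p @ t i t/, realized with
# the schwa elided. "_" marks a deleted phoneme in the 1:1 alignment.
canonical = ["p", "@", "t", "i", "t"]
realized = ["p", "_", "t", "i", "t"]

X = sequence_features(canonical)  # CRF input for this utterance
y = realized                      # CRF output labels for this utterance

for feats, label in zip(X, y):
    print(feats["ph[0]"], "->", label)
```

A list of feature dictionaries per utterance, paired with a list of labels, is the layout accepted by CRF toolkits such as sklearn-crfsuite (`CRF.fit(X, y)` over many such sequences); which toolkit the authors used is not stated on this page. Under this layout, comparing feature subsets as in point (i) amounts to dropping keys (for example every `art.*` entry) before training, and studying the training size as in point (ii) amounts to training on growing subsets of utterances inside each cross-validation fold.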

Subjects

Computer Science [cs] / Artificial Intelligence [cs.AI] / Human-Computer Interaction [cs.HC]

https://inria.hal.science/hal-01338853

Submitted on: Wednesday, 29 June 2016, 11:52:35

Last modified on: Tuesday, 3 October 2023, 09:49:07

Dates and versions

hal-01338853 , version 1 (29-06-2016)

Identifiers

HAL Id : hal-01338853 , version 1

Cite

Marie Tahon, Raheel Qader, Gwénolé Lecorvé, Damien Lolive. Optimal feature set and minimal training size for pronunciation adaptation in TTS. International Conference on Statistical Language and Speech Processing (SLSP), Oct 2016, Pilsen, Czech Republic. ⟨hal-01338853⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENSSAT IRISA CENTRALESUPELEC IRISA-D6 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM
