Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

Kai Yu; Heiga Zen; François Mairesse; Steve Young

doi:10.1016/j.specom.2011.03.003

Journal article, Speech Communication, Year: 2011

Kai Yu

Role: Corresponding author
PersonId: 931985

Heiga Zen

Role: Author

Toshiba Research Europe Ltd

François Mairesse

Role: Author
PersonId: 11608
IdHAL: francois-mairesse
IdRef: 069872201

Steve Young

Role: Author

Abstract

To achieve natural, high-quality synthesized speech in HMM-based speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision-tree-based parameter clustering to model the full combination of contexts. However, weak contexts, such as word-level emphasis in natural speech, are difficult to capture using this approach. Also, due to combinatorial explosion, incorporating new contexts within the traditional framework may easily lead to insufficient data coverage. To model weak contexts effectively and reduce the data sparsity problem, different types of contexts should be treated independently. Context adaptive training provides a structured framework for this, whereby standard HMMs represent normal contexts and transforms represent the additional effects of weak contexts. In contrast to speaker adaptive training in speech recognition, separate decision trees have to be built for different types of context factors. This paper describes the general framework of context adaptive training and investigates three concrete forms: MLLR, CMLLR and CAT based systems. Experiments on a word-level emphasis synthesis task show that all context adaptive training approaches can outperform the standard full-context-dependent HMM approach. Among them, the MLLR-based system achieved the best performance.
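
As an illustration of the idea described in the abstract, the following is a minimal sketch, not the authors' implementation, of how two factorized decision trees might combine: one tree selects a base Gaussian mean from the normal contexts, a second tree selects an MLLR-style affine mean transform from the weak context (here, word-level emphasis), and the adapted mean is the transform applied to the base mean. All names, tree questions, and numeric values (`BASE_MEANS`, `TRANSFORMS`, `base_leaf`, `transform_leaf`, `adapted_mean`) are hypothetical and chosen only for this toy example.

```python
import numpy as np

# Hypothetical toy parameters; none of these values come from the paper.
# Leaves of the "normal context" tree: base Gaussian means.
BASE_MEANS = {
    "vowel": np.array([1.0, 2.0]),
    "consonant": np.array([-1.0, 0.5]),
}

# Leaves of the "weak context" tree: MLLR-style affine mean transforms (A, b).
TRANSFORMS = {
    "neutral": (np.eye(2), np.zeros(2)),
    "emphasis": (np.array([[1.2, 0.0], [0.0, 1.1]]), np.array([0.3, 0.0])),
}

VOWELS = {"a", "e", "i", "o", "u"}


def base_leaf(label):
    """Toy one-question tree over normal contexts (phone identity)."""
    return "vowel" if label["phone"] in VOWELS else "consonant"


def transform_leaf(label):
    """Toy one-question tree over the weak context (word-level emphasis)."""
    return "emphasis" if label["emphasized"] else "neutral"


def adapted_mean(label):
    """Combine the two trees: adapted mean = A @ mu + b."""
    mu = BASE_MEANS[base_leaf(label)]
    A, b = TRANSFORMS[transform_leaf(label)]
    return A @ mu + b


if __name__ == "__main__":
    print(adapted_mean({"phone": "a", "emphasized": True}))
    print(adapted_mean({"phone": "a", "emphasized": False}))
```

In this sketch the two trees are queried independently, so the emphasis transform is shared across every base leaf rather than requiring a separate emphasized model for each full-context combination; this is the data-sparsity argument the abstract makes, shown here only in toy form.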

Keywords

HMM-based speech synthesis; context adaptive training; factorized decision tree; state clustering

Domains

Linguistics

Main file

PEER_stage2_10.1016%2Fj.specom.2011.03.003.pdf (368.16 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00746106

Submitted on: Saturday, 27 October 2012, 03:54:55

Last modified on: Monday, 2 September 2019, 18:00:02

Long-term archiving on: Saturday, 17 December 2016, 05:31:30

Dates and versions

hal-00746106, version 1 (27-10-2012)

Identifiers

HAL Id: hal-00746106, version 1
DOI: 10.1016/j.specom.2011.03.003

Cite

Kai Yu, Heiga Zen, François Mairesse, Steve Young. Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis. Speech Communication, 2011, ⟨10.1016/j.specom.2011.03.003⟩. ⟨hal-00746106⟩

Collections

PEER
