Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

Roberto Barra-Chicote; Junichi Yamagishi; Simon King; Juan Manuel Montero; Javier Macias-Guarasa

doi:10.1016/j.specom.2009.12.007

Journal article, Speech Communication, Year: 2010

Roberto Barra-Chicote

Role: Corresponding author
PersonId: 911265

Junichi Yamagishi

Role: Author

Simon King

Role: Author

Juan Manuel Montero

Role: Author

Javier Macias-Guarasa

Role: Author

Abstract

We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests was used to evaluate speech quality, emotion identification rates and emotional strength for the six emotions that we recorded. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion. Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. Whilst synthetic speech produced using the unit selection method has better emotional strength scores than the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions that are characterized by both spectral and prosodic components, synthetic speech using unit selection methods was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.

Keywords

Emotional speech synthesis; HMM-based synthesis; unit selection

Main file

PEER_stage2_10.1016%2Fj.specom.2009.12.007.pdf (617.43 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00627926

Submitted on: Friday, 30 September 2011, 02:53:06

Last modified on: Friday, 30 September 2011, 02:53:06

Long-term archiving on: Sunday, 4 December 2016, 22:29:02

Dates and versions

hal-00627926, version 1 (30-09-2011)

Identifiers

HAL Id: hal-00627926, version 1
DOI: 10.1016/j.specom.2009.12.007

Cite

Roberto Barra-Chicote, Junichi Yamagishi, Simon King, Juan Manuel Montero, Javier Macias-Guarasa. Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech. Speech Communication, 2010, 52 (5), pp.394. ⟨10.1016/j.specom.2009.12.007⟩. ⟨hal-00627926⟩

Collections

PEER
