Conference papers

Discourse phrases classification: direct vs. narrative audio speech

Abstract : In the field of storytelling, speech synthesis is trying to move from a neutral, machine-like voice to an expressive one. For parametric and unit-selection systems, building new features or cost functions is necessary to allow better control of expressivity. The present article investigates the classification of discourse phrases as direct or narrative speech in order to build a new expressivity score. Models are trained on different speech units (syllables, words and discourse phrases) from an audiobook, using three sets of features. Classification experiments are conducted on the Blizzard corpus, which consists of English children's audiobooks and contains various characters and emotional states. The experiments show that fusing SVM classifiers trained on different prosodic and phonological feature sets increases the classification rate from 67.4% with 14 prosodic features to 71.8% with the three merged sets. In addition, applying a decision threshold gives promising results for expressive speech synthesis, depending on the strength of the constraint required on expressivity: 71.8% with 100% of the words, 79.9% with 50% and 82.6% with 25%.
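The fusion and decision-threshold steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the classifiers are fused or how the threshold is set, so the sketch assumes a simple average of signed decision scores and keeps only the most confident fraction of items. The function names, the feature-set names and all scores below are hypothetical toy values.

```python
from statistics import mean


def fuse_scores(score_sets):
    """Average the signed decision scores of several classifiers, item by item.

    Assumption: late fusion by plain averaging; the paper may fuse differently.
    """
    return [mean(scores) for scores in zip(*score_sets)]


def accuracy_at_coverage(fused, labels, coverage):
    """Accuracy on the `coverage` fraction of items with the largest |score|.

    Keeping fewer items is equivalent to raising a threshold on |score|.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = max(1, round(coverage * len(fused)))
    # Rank items from most to least confident.
    ranked = sorted(zip(fused, labels), key=lambda pair: abs(pair[0]), reverse=True)
    kept = ranked[:n_keep]
    # Predict +1 (direct) for non-negative scores, -1 (narrative) otherwise.
    correct = sum(1 for score, label in kept if (1 if score >= 0 else -1) == label)
    return correct / n_keep


# Hypothetical toy scores from three classifiers, one per feature set.
set_a = [0.9, -0.2, 0.3, -0.8]
set_b = [0.7, 0.5, -0.1, -0.6]
set_c = [0.8, 0.3, 0.1, -1.0]
labels = [1, -1, 1, -1]  # +1 = direct speech, -1 = narrative speech

fused = fuse_scores([set_a, set_b, set_c])
acc_all = accuracy_at_coverage(fused, labels, 1.0)
acc_half = accuracy_at_coverage(fused, labels, 0.5)
print(acc_all, acc_half)
```

On this toy data the accuracy rises when only the most confident half of the items is kept, which is the kind of trade-off between coverage and classification rate that the abstract reports.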

Contributor : Marie Tahon
Submitted on : Monday, May 14, 2018 - 10:37:19 AM
Last modification on : Wednesday, November 3, 2021 - 6:05:45 AM
Long-term archiving on : Tuesday, September 25, 2018 - 9:29:03 AM


Files produced by the author(s)


  • HAL Id : hal-01790910, version 1


Marie Tahon, Damien Lolive. Discourse phrases classification: direct vs. narrative audio speech. Speech Prosody, Jun 2018, Poznan, Poland. ⟨hal-01790910⟩


