Discourse phrases classification: direct vs. narrative audio speech

Abstract : In the field of storytelling, speech synthesis is trying to move from a neutral, machine-like voice to an expressive one. For parametric and unit-selection systems, building new features or cost functions is necessary to allow better control of expressivity. The present article investigates the classification of discourse phrases as direct or narrative in order to build a new expressivity score. Different models are trained on different speech units (syllables, words and discourse phrases) from an audiobook with 3 sets of features. Classification experiments are conducted on the Blizzard corpus, which features English audiobooks for children and contains various characters and emotional states. The experiments show that the fusion of SVM classifiers trained with different prosodic and phonological feature sets increases the classification rate from 67.4% with 14 prosodic features to 71.8% with the 3 merged sets. In addition, a decision threshold achieves promising results for expressive speech synthesis, depending on the strength of the constraint required on expressivity: 71.8% with 100% of the words, 79.9% with 50% and 82.6% with 25%.
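
The trade-off between classification rate and the share of words kept can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the classifiers are fused or how the threshold is applied, so the averaging rule, the ranking by absolute score, the function names and the toy scores below are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fuse_scores(score_sets: Sequence[Sequence[float]]) -> List[float]:
    """Late fusion by averaging signed scores (assumed rule, not from the paper)."""
    n = len(score_sets[0])
    if any(len(s) != n for s in score_sets):
        raise ValueError("all score lists must have the same length")
    return [sum(s[i] for s in score_sets) / len(score_sets) for i in range(n)]


def accuracy_at_coverage(
    scores: Sequence[float], labels: Sequence[int], coverage: float
) -> Tuple[float, int]:
    """Keep the most confident fraction of items (largest |score|) and
    return (accuracy on the kept items, number of kept items)."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    kept = ranked[: max(1, round(coverage * len(scores)))]
    # Positive score -> predicted direct (1); otherwise narrative (0).
    correct = sum(1 for i in kept if (scores[i] > 0) == (labels[i] == 1))
    return correct / len(kept), len(kept)


# Hypothetical signed scores for 8 words from 3 classifiers, one per feature set.
set_a = [0.9, -0.2, 0.4, -0.8, 0.2, -0.3, 0.6, -0.1]
set_b = [0.7, 0.3, -0.1, -0.9, -0.4, -0.2, 0.5, 0.2]
set_c = [0.8, -0.4, 0.2, -0.7, 0.5, -0.5, 0.1, -0.3]
labels = [1, 0, 1, 0, 0, 0, 1, 1]  # 1 = direct, 0 = narrative (toy ground truth)

fused = fuse_scores([set_a, set_b, set_c])

if __name__ == "__main__":
    for cov in (1.0, 0.5, 0.25):
        acc, n_kept = accuracy_at_coverage(fused, labels, cov)
        print(f"coverage={cov:.2f} kept={n_kept} accuracy={acc:.2f}")
```

On this toy data, the errors fall on the low-confidence words, so restricting the decision to the most confident words raises the accuracy, which mirrors the pattern the abstract reports for 100%, 50% and 25% of the words.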

https://hal.archives-ouvertes.fr/hal-01790910
Contributor : Marie Tahon
Submitted on : Monday, May 14, 2018 - 10:37:19 AM
Last modification on : Thursday, February 7, 2019 - 4:49:47 PM
Long-term archiving on : Tuesday, September 25, 2018 - 9:29:03 AM

File

SP18_paper_95.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01790910, version 1

Citation

Marie Tahon, Damien Lolive. Discourse phrases classification: direct vs. narrative audio speech. Speech Prosody, Jun 2018, Poznan, Poland. ⟨hal-01790910⟩
