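Annotation automatique des types de discours dans des livres audio en vue d'une oralisation par un système de synthèse [Automatic annotation of discourse types in audiobooks for oral rendering by a speech synthesis system]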

Abstract : To synthesize audiobooks expressively, it is necessary to know which types of discourse have to be produced. In a novel or a tale, however, narrative perspectives and discourse types change often, moving from narrative and recitative paragraphs to direct speech, reported speech, and even dialogs. In this work, we present a tool that was developed from the analysis of a corpus (including excerpts from Madame Bovary and Les Mystères de Paris) and that uses the paragraph as its basic unit. The tool not only determines the type of speech automatically (narrative speech, direct speech, dialogs), and therefore who is speaking, but also annotates the extent of each change in discourse. This latter point is important, especially for parentheticals with reporting verbs, where the narrator speaks again in the middle of a direct speech sequence. In its current form, the tool achieves an 89% detection rate.
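
The paragraph-level labelling and the marking of reporting-verb parentheticals described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the typographic cues (a leading dash for a dialog turn, an opening guillemet for direct speech), the short list of reporting verbs, and the names `classify_paragraph`, `narrator_spans`, `REPORTING_VERBS` and `INCISE` are all assumptions made for illustration.

```python
import re

# Assumed typographic cues for French prose; the paper's actual rules are not given here.
DASHES = ("—", "–", "-")

# Hypothetical, deliberately tiny list of reporting verbs (simple past / present).
REPORTING_VERBS = ("dit", "répondit", "demanda", "s'écria", "reprit", "ajouta")

_verbs = "|".join(re.escape(v) for v in REPORTING_VERBS)

# A parenthetical such as ", dit-il," or ", répondit Emma," placed inside a speech turn.
INCISE = re.compile(
    rf",\s*((?:{_verbs})(?:-t)?-(?:il|elle|ils|elles|on)"
    rf"|(?:{_verbs})\s+[A-ZÀ-Ý][\w-]*)\s*(?=[,.!?»])"
)


def classify_paragraph(paragraph):
    """Label one paragraph as 'dialog', 'direct' or 'narrative'."""
    text = paragraph.strip()
    if text.startswith(DASHES):
        return "dialog"      # turn introduced by a dash
    if "«" in text:
        return "direct"      # quoted speech inside the paragraph
    return "narrative"


def narrator_spans(paragraph):
    """Return (start, end) offsets where the narrator interrupts a speech turn."""
    if classify_paragraph(paragraph) == "narrative":
        return []            # the narrator speaks throughout; nothing to mark
    return [m.span(1) for m in INCISE.finditer(paragraph)]


if __name__ == "__main__":
    turn = "— Venez, dit-il, nous partons."
    print(classify_paragraph(turn))
    for start, end in narrator_spans(turn):
        print(start, end, turn[start:end])
```

In a synthesis pipeline, spans of this kind would let the parenthetical be rendered in the narrator's voice while the rest of the turn keeps the character's voice. A real system would need far richer cues than the two used here.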

https://hal.archives-ouvertes.fr/hal-01848856
Contributor : Aghilas Sini
Submitted on : Wednesday, July 25, 2018 - 11:38:03 AM
Last modification on : Friday, September 13, 2019 - 9:48:03 AM
Long-term archiving on : Friday, October 26, 2018 - 1:48:38 PM

File

taln2018.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01848856, version 1

Citation

Aghilas Sini, Elisabeth Delais-Roussarie, Damien Lolive. Annotation automatique des types de discours dans des livres audio en vue d'une oralisation par un système de synthèse. TALN-RECITAL 2018, May 2018, Rennes, France. ⟨hal-01848856⟩
