Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques

Stéphane Huet 1, Guillaume Gravier 1, Pascale Sébillot 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : We present a method for story segmentation of radio broadcast news, based on lexical, syntactic and acoustic cues. Starting from an existing statistical topic segmentation model that exploits the notion of lexical cohesion, we extend the formalism to include syntactic and acoustic knowledge sources. Experimental results show that lexical cohesion alone is not effective for the type of documents under study, because of the variable size of the segments and the lack of a direct relation between topics and stories. Adding syntactic and acoustic cues substantially improves the quality of the segmentation.

https://hal.archives-ouvertes.fr/hal-02021382
Contributor : Stéphane Huet
Submitted on : Friday, February 15, 2019 - 6:32:49 PM
Last modification on : Wednesday, February 20, 2019 - 1:22:55 AM
Long-term archiving on : Friday, May 17, 2019 - 10:04:22 AM

File

TALN08.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02021382, version 1

Citation

Stéphane Huet, Guillaume Gravier, Pascale Sébillot. Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques. 15ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2008, Avignon, France. pp.49-58. ⟨hal-02021382⟩
