Conference papers

A Multi-Source Model for Topic Segmentation of Radio Broadcast News (Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques)

Stéphane Huet¹, Guillaume Gravier¹, Pascale Sébillot¹
¹ TEXMEX - Multimedia content-based indexing, IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: We present a method for story segmentation of radio broadcast news based on lexical, syntactic and acoustic cues. Starting from an existing statistical topic segmentation model that exploits the notion of lexical cohesion, we extend the formalism to include syntactic and acoustic knowledge sources. Experimental results show that lexical cohesion alone is not effective for the type of documents under study, because segment sizes vary widely and topics do not map directly onto stories. Adding syntactic and acoustic cues yields a substantial improvement in segmentation quality.
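To make the lexical-cohesion idea concrete, here is a minimal sketch of a generic segmenter. It is not the authors' multi-source model, and it ignores the syntactic and acoustic cues the paper adds. Each candidate segment is scored with a Laplace-smoothed unigram model estimated on the segment itself, so a segment whose words repeat costs less, and dynamic programming chooses the boundaries that minimize the total cost plus a fixed per-segment penalty. The scoring loosely follows Utiyama and Isahara's (2001) formulation. The function names, the input format (pre-tokenized sentences) and the default penalty of log(number of words) are assumptions made for illustration.

```python
import math
from collections import Counter


def segment_cost(words, vocab_size):
    """Negative log-likelihood of a segment under a Laplace-smoothed unigram
    model estimated on that same segment. Repeated words (lexical cohesion)
    make the segment cheaper."""
    counts = Counter(words)
    n = len(words)
    return -sum(c * math.log((c + 1) / (n + vocab_size)) for c in counts.values())


def segment(sentences, penalty=None):
    """Return the sorted indices of the sentences that start a segment.

    `sentences` is a list of token lists (assumed input format). The default
    per-segment penalty, log(total word count), is an illustrative choice.
    """
    all_words = [w for s in sentences for w in s]
    vocab_size = len(set(all_words))
    if penalty is None:
        penalty = math.log(max(len(all_words), 2))
    n = len(sentences)
    best = [0.0] + [math.inf] * n  # best[j]: min cost of sentences[0:j]
    back = [0] * (n + 1)           # back[j]: start of the last segment
    for j in range(1, n + 1):
        words = []
        for i in range(j - 1, -1, -1):
            words = sentences[i] + words  # candidate segment = sentences[i:j]
            cost = best[i] + segment_cost(words, vocab_size) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers from the end to recover segment starts.
    starts, j = [], n
    while j > 0:
        starts.append(back[j])
        j = back[j]
    return sorted(starts)
```

On a toy input with two "weather" sentences followed by two "football" sentences, `segment([["rain", "storm", "rain"], ["storm", "wind", "rain"], ["goal", "match", "goal"], ["match", "team", "goal"]])` returns `[0, 2]`, placing the boundary where the vocabulary changes. The sketch recounts words for every candidate segment, which is quadratic in the number of sentences times segment length; a practical system would update counts incrementally and cap segment length. As the abstract notes, this kind of score alone struggles when segment sizes vary widely, which motivates the additional knowledge sources.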

Cited literature: 13 references
Contributor: Stéphane Huet
Submitted on: Friday, February 15, 2019 - 6:32:49 PM
Last modification on: Thursday, January 20, 2022 - 4:18:43 PM
Long-term archiving on: Friday, May 17, 2019 - 10:04:22 AM


Files produced by the author(s)


  • HAL Id: hal-02021382, version 1


Stéphane Huet, Guillaume Gravier, Pascale Sébillot. Un modèle multi-sources pour la segmentation en sujets de journaux radiophoniques. 15ème conférence sur le Traitement Automatique des Langues Naturelles (TALN), 2008, Avignon, France. pp.49-58. ⟨hal-02021382⟩

