Conference paper, Year: 2013

Leveraging lexical cohesion and disruption for topic segmentation

Abstract

Topic segmentation classically relies on one of two criteria: finding areas with coherent vocabulary use, or detecting discontinuities. In this paper, we propose a segmentation criterion combining both lexical cohesion and disruption, enabling a trade-off between the two. We provide the mathematical formulation of the criterion and an efficient graph-based decoding algorithm for topic segmentation. Experimental results on standard textual data sets and on a more challenging corpus of automatically transcribed broadcast news shows demonstrate the benefit of such a combination. Gains were observed in all conditions, with segments of either regular or varying length and abrupt or smooth topic shifts. Long segments benefit more than short segments. However, the algorithm has proven robust on automatic transcripts with short segments and limited vocabulary reoccurrences.
Main file
emnlp.pdf (277.97 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00867011, version 1 (27-09-2013)

Identifiers

  • HAL Id: hal-00867011, version 1

Cite

Anca-Roxana Simon, Guillaume Gravier, Pascale Sébillot. Leveraging lexical cohesion and disruption for topic segmentation. International Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, Oct 2013, Seattle, United States. pp.1314--1324. ⟨hal-00867011⟩
