Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis

Anca Simon (1), Pascale Sébillot (1), Guillaume Gravier (1)
(1) LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract: Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. In this paper, a novel organization of the topical structure of textual content is proposed. Rather than searching for topic shifts to yield dense segmentation, we propose an algorithm to extract topically focused fragments organized in a hierarchical manner. This is achieved by leveraging the temporal distribution of word re-occurrences, searching for bursts, to skirt the limits imposed by a global counting of lexical re-occurrences within segments. Comparison to a reference dense segmentation on varied datasets indicates that we can achieve a better topic focus while retrieving all of the important aspects of a text.
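
To make the notion of a burst concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify how bursts are detected, so this example assumes a simple gap-threshold heuristic in which a word is "bursting" over a span when it re-occurs at least `min_count` times with no more than `max_gap` tokens between consecutive occurrences. The function name `find_bursts` and both parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_bursts(tokens, max_gap=5, min_count=3):
    """Return {word: [(start, end), ...]} for runs of close re-occurrences.

    Assumption (not from the paper): a burst is a maximal run of
    occurrences of the same word in which consecutive occurrences are
    at most `max_gap` token positions apart, with at least `min_count`
    occurrences in the run.
    """
    # Record the token positions at which each word occurs.
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)

    bursts = {}
    for word, occurrences in positions.items():
        spans = []
        run = [occurrences[0]]
        for position in occurrences[1:]:
            if position - run[-1] <= max_gap:
                # Still close enough to the previous occurrence: extend the run.
                run.append(position)
            else:
                # Gap too large: close the current run and start a new one.
                if len(run) >= min_count:
                    spans.append((run[0], run[-1]))
                run = [position]
        if len(run) >= min_count:
            spans.append((run[0], run[-1]))
        if spans:
            bursts[word] = spans
    return bursts
```

Unlike a global count of re-occurrences within a segment, such spans depend on where in the text the occurrences fall, so a word that appears often but only at scattered positions produces no burst.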
Document type:
Conference paper
Recent Advances on Natural Language Processing, 2015, Hissar, Bulgaria. 2015

https://hal.archives-ouvertes.fr/hal-01186443
Contributor: Guillaume Gravier
Submitted on: Monday, 24 August 2015 - 22:30:40
Last modified on: Friday, 16 November 2018 - 01:40:48
Document(s) archived on: Wednesday, 25 November 2015 - 19:13:22

File

114_Paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01186443, version 1

Citation

Anca Simon, Pascale Sébillot, Guillaume Gravier. Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis. Recent Advances on Natural Language Processing, 2015, Hissar, Bulgaria. 2015. 〈hal-01186443〉

Metrics

Record views

1613

File downloads

198