Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis

Anca Simon 1, Pascale Sébillot 1, Guillaume Gravier 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract : Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. In this paper, a novel organization of the topical structure of textual content is proposed. Rather than searching for topic shifts to yield dense segmentation, we propose an algorithm to extract topically focused fragments organized in a hierarchical manner. This is achieved by leveraging the temporal distribution of word re-occurrences, searching for bursts, to skirt the limits imposed by a global counting of lexical re-occurrences within segments. Comparison to a reference dense segmentation on varied datasets indicates that we can achieve a better topic focus while retrieving all of the important aspects of a text.
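As a rough illustration of what "searching for bursts" in word re-occurrences can mean, the sketch below groups the positions of each word into runs where consecutive occurrences are close together. It is not the algorithm of the paper, which the abstract does not specify: the gap-based heuristic, the function name `find_bursts`, and the parameters `max_gap` and `min_count` are all assumptions made for this example.

```python
from collections import defaultdict


def find_bursts(tokens, max_gap=5, min_count=2):
    """Hypothetical gap-based burst finder (an illustrative stand-in,
    not the method of the paper).

    Returns {word: [(start, end), ...]}, where each interval is a run of
    token indices in which the word recurs with at most `max_gap` tokens
    between consecutive occurrences, and at least `min_count` occurrences.
    """
    # Record the token index of every occurrence of every word.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok.lower()].append(i)

    bursts = {}
    for word, pos in positions.items():
        runs = []
        start = prev = pos[0]
        count = 1
        for p in pos[1:]:
            if p - prev <= max_gap:
                # Still inside the current run of close re-occurrences.
                count += 1
            else:
                # Gap too large: close the current run and start a new one.
                if count >= min_count:
                    runs.append((start, prev))
                start, count = p, 1
            prev = p
        # Close the final run.
        if count >= min_count:
            runs.append((start, prev))
        if runs:
            bursts[word] = runs
    return bursts
```

Under these assumptions, a word that recurs densely in one passage and then reappears once much later yields a single interval covering only the dense passage. A global count over the whole text would not make that distinction, which is the limitation the abstract refers to.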

https://hal.archives-ouvertes.fr/hal-01186443
Contributor : Guillaume Gravier
Submitted on : Monday, August 24, 2015 - 10:30:40 PM
Last modification on : Friday, November 16, 2018 - 1:40:48 AM
Document(s) archived on : Wednesday, November 25, 2015 - 7:13:22 PM

File

114_Paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01186443, version 1

Citation

Anca Simon, Pascale Sébillot, Guillaume Gravier. Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis. Recent Advances in Natural Language Processing, 2015, Hissar, Bulgaria. ⟨hal-01186443⟩
