Text segmentation using a cache memory

Abstract : This paper describes the application of an information-theoretic approach to document segmentation. Several segmentation criteria are proposed, based either on topic shift detection or on blindly comparing the contents of cache memories in which keywords are temporarily stored as a document is analyzed. Experiments on a large corpus of articles from the French newspaper Le Monde show tangible advantages when different models are combined with a suitable strategy. The results also show that different topic shift detection strategies have to be used depending on whether high recall or high precision is sought. Furthermore, methods based on topic-independent distributions provide candidates that complement those obtained with topic-dependent distributions, leading to an increase in recall with a minor loss in precision.
Keywords : Topic segmentation
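
The cache comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give the paper's actual criteria, so the symmetric Kullback-Leibler divergence, the additive smoothing, the fixed cache size, and all function names below are assumptions chosen for illustration. The idea is to keep one cache of recent keywords on each side of a candidate position and to score the position by how different the two cache distributions are.

```python
import math
from collections import Counter


def cache_distribution(cache, vocab, alpha=0.5):
    """Smoothed unigram distribution of a keyword cache over a fixed vocabulary.

    alpha is a hypothetical additive-smoothing constant that keeps every
    probability strictly positive so the divergence below is always defined.
    """
    counts = Counter(cache)
    total = len(cache) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p)."""
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)


def divergence_profile(words, cache_size):
    """Score every position that has a full cache on its left and right."""
    vocab = set(words)
    profile = []
    for i in range(cache_size, len(words) - cache_size + 1):
        left = cache_distribution(words[i - cache_size:i], vocab)
        right = cache_distribution(words[i:i + cache_size], vocab)
        profile.append((i, symmetric_kl(left, right)))
    return profile


def best_boundary(words, cache_size):
    """Return the position whose two caches differ the most."""
    return max(divergence_profile(words, cache_size), key=lambda t: t[1])[0]


if __name__ == "__main__":
    # Toy keyword stream: a finance topic followed by a sports topic.
    stream = ["bank", "loan", "rate", "bank", "loan", "rate",
              "goal", "match", "team", "goal", "match", "team"]
    print(best_boundary(stream, cache_size=3))
```

In this toy stream the two caches share no keyword only at the position where the topic changes, so that position receives the highest score. A realistic system would pick several boundaries, for example by thresholding local maxima of the profile, and the threshold would be the natural place to trade recall against precision.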

https://hal.archives-ouvertes.fr/hal-01392346
Contributor : Brigitte Bigi
Submitted on : Friday, November 4, 2016 - 11:55:05 AM
Last modification on : Saturday, March 23, 2019 - 1:22:10 AM

Identifiers

  • HAL Id : hal-01392346, version 1

Citation

Brigitte Bigi, Renato de Mori. Text segmentation using a cache memory. Control and Intelligent Systems, ACTA Press, 2002, 30 (3), pp.93-100. ⟨hal-01392346⟩
