Journal articles

Text segmentation using a cache memory

Abstract : This paper describes the application of an information-theoretic approach to document segmentation. Several segmentation criteria are proposed, based either on topic shift detection or on blindly comparing the contents of cache memories in which keywords are temporarily stored as a document is analyzed. Experiments with a large corpus of articles from the French newspaper Le Monde show tangible advantages when different models are combined with a suitable strategy. The experimental results also show that different strategies for topic shift detection have to be used depending on whether high recall or high precision is sought. Furthermore, methods based on topic-independent distributions provide candidates complementary to those obtained with topic-dependent distributions, leading to an increase in recall with a minor loss in precision.
Keywords : Topic segmentation
Contributor : Brigitte Bigi
Submitted on : Friday, November 4, 2016 - 11:55:05 AM
Last modification on : Tuesday, January 14, 2020 - 10:38:06 AM


  • HAL Id : hal-01392346, version 1



Brigitte Bigi, Renato de Mori. Text segmentation using a cache memory. Control and Intelligent Systems, ACTA Press, 2002, 30 (3), pp.93-100. ⟨hal-01392346⟩


