Journal article. Control and Intelligent Systems. Year: 2002

Text segmentation using a cache memory

Brigitte Bigi, Renato de Mori

Abstract

This paper describes the application of an information-theoretic approach to document segmentation. Several segmentation criteria are proposed, based either on topic shift detection or on blindly comparing the contents of cache memories in which keywords are temporarily stored as a document is analyzed. Experiments with a large corpus of articles from the French newspaper Le Monde show tangible advantages when different models are combined with a suitable strategy. The results also show that different strategies for topic shift detection have to be used depending on whether high recall or high precision is sought. Furthermore, methods based on topic-independent distributions provide candidates complementary to those obtained with topic-dependent distributions, leading to an increase in recall with a minor loss in precision.
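The idea of comparing two caches can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not specify the divergence measure, the cache size, or the smoothing, so the symmetric Kullback-Leibler divergence, the additive smoothing, the fixed-size word windows standing in for caches, and all function names below are assumptions chosen for illustration only.

```python
from collections import Counter
import math


def distribution(words, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over vocab (smoothing is an assumption)."""
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence between two distributions on the same support."""
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)


def boundary_scores(tokens, cache_size=3):
    """Score each position by the divergence between the cache of words
    just before it and the cache of words just after it (hypothetical helper)."""
    scores = {}
    for i in range(cache_size, len(tokens) - cache_size + 1):
        left = tokens[i - cache_size:i]
        right = tokens[i:i + cache_size]
        vocab = set(left) | set(right)
        scores[i] = symmetric_kl(distribution(left, vocab),
                                 distribution(right, vocab))
    return scores


# Toy keyword stream: six finance keywords followed by six sports keywords.
tokens = ["bank", "loan", "rate", "bank", "loan", "rate",
          "goal", "team", "match", "goal", "team", "match"]
scores = boundary_scores(tokens, cache_size=3)
best = max(scores, key=scores.get)  # position 6, where the vocabulary switches
```

In this toy stream the score is zero where both caches hold the same keywords and peaks at the position where the vocabulary changes. A real system would threshold the scores rather than take a single maximum, and the choice of threshold is one place where the trade-off between recall and precision mentioned in the abstract would appear.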

Dates and versions

hal-01392346, version 1 (04-11-2016)

Identifiers

  • HAL Id: hal-01392346, version 1

Cite

Brigitte Bigi, Renato de Mori. Text segmentation using a cache memory. Control and Intelligent Systems, 2002, 30 (3), pp.93-100. ⟨hal-01392346⟩

Collections

UNIV-AVIGNON LIA