A Texture-based Method for Document Segmentation and Classification

Abstract: In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of regions: graphics, text and space. The image is subdivided into blocks, and five GLCM (Grey Level Co-occurrence Matrix) features are extracted from each block. Based on these features, the blocks are clustered into three groups using the K-Means algorithm; connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the MediaTeam Document Database.
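The block-wise feature extraction and clustering steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not name the five GLCM features, so contrast, energy, homogeneity, entropy and correlation are assumed here; the single horizontal offset, the farthest-point initialisation of K-Means, and the function names (glcm, glcm_features, block_features, kmeans) are likewise hypothetical. Merging connected blocks and the heuristic classification rules are not shown, since the abstract gives no details about them.

```python
import math

def glcm(block, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix of a block for one offset.
    Pixel values must be integers in range(levels); block needs >= 1 pixel pair."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = block[r][c], block[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Five texture features (an assumed set; the abstract does not list them)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    mean = sum(i * v for i, _, v in cells)  # matrix is symmetric: row mean == column mean
    var = sum((i - mean) ** 2 * v for i, _, v in cells)
    # Convention chosen here: a perfectly uniform block gets correlation 1.0.
    correlation = (sum((i - mean) * (j - mean) * v for i, j, v in cells) / var
                   if var > 0 else 1.0)
    return [contrast, energy, homogeneity, entropy, correlation]

def block_features(image, size, levels):
    """Feature vector for every size x size block, in row-major block order."""
    feats = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            block = [row[c:c + size] for row in image[r:r + size]]
            feats.append(glcm_features(glcm(block, levels)))
    return feats

def kmeans(points, k, iters=50):
    """Plain K-Means with deterministic farthest-point initialisation (an assumption)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(d2(p, c) for c in centres))))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: d2(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Under these assumptions, calling kmeans(block_features(image, size, levels), 3) would assign each block to one of three groups. In practice the features would probably need rescaling before clustering, because they live on different scales.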
Document type: Journal articles

Cited literature: 24 references
Contributor: Coordination Episciences Iam
Submitted on: Tuesday, January 26, 2016 - 4:05:16 PM
Last modification on: Wednesday, October 30, 2019 - 4:34:07 PM
Long-term archiving on: Wednesday, April 27, 2016 - 1:21:31 PM


Publisher files allowed on an open archive




Ming-Wei Lin, Jules-Raymond Tapamo, Baird Ndovie. A Texture-based Method for Document Segmentation and Classification. Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, INRIA, 2007, Volume 6, April 2007, joint Special Issue ARIMA/SACJ on Advances in end-user data mining techniques, pp.49-56. ⟨10.46298/arima.1878⟩. ⟨hal-01262352⟩


