Clustering of Document Images using Graph Summaries

Abstract: Document image classification is an important step in document image analysis. Based on classification results, we can tackle other tasks such as indexing, understanding, or navigation in document collections. Using a document representation and an unsupervised classification method, we can group documents that, from the user's point of view, constitute valid clusters. However, the semantic gap between a domain-independent document representation and the user's implicit representation can lead to unsatisfactory results. In this paper we describe document images based on frequently occurring symbols. This document description is created in an unsupervised manner and can be related to the domain knowledge. Using data mining techniques applied to a graph-based document representation, we find frequent and maximal subgraphs. For each document image, we construct a bag containing the frequent subgraphs found in it. This bag of "symbols" represents the description of a document. We present results obtained on a corpus of graphical document images.
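
The bag-of-"symbols" description mentioned in the abstract can be illustrated with a minimal sketch. It assumes the frequent-subgraph mining step has already been run and that each mined subgraph has been given a string label; the labels (`g1`, `g2`, `g3`), the function names, and the use of cosine similarity to compare bags are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from math import sqrt


def bag_of_symbols(found_subgraphs):
    """Count how often each frequent-subgraph label occurs in one document.

    `found_subgraphs` is a list of hypothetical labels, one per occurrence.
    """
    return Counter(found_subgraphs)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bags (an assumed comparison measure)."""
    shared = bag_a.keys() & bag_b.keys()
    dot = sum(bag_a[s] * bag_b[s] for s in shared)
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty bag is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical documents described by the subgraph labels found in them.
doc_a = bag_of_symbols(["g1", "g1", "g2"])
doc_b = bag_of_symbols(["g1", "g2", "g2"])
doc_c = bag_of_symbols(["g3"])

sim_ab = cosine_similarity(doc_a, doc_b)  # documents sharing symbols
sim_ac = cosine_similarity(doc_a, doc_c)  # documents sharing no symbols
```

Pairwise similarities of this kind could then be passed to any unsupervised clustering method; which method and which measure the authors actually used is not stated in this record.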
Document type:
Conference paper
Machine Learning and Data Mining, 2005, Germany. 3587, pp.194-202, 2005, 〈10.1007/11510888_20〉

https://hal.archives-ouvertes.fr/hal-00601501
Contributor: Pierre Héroux
Submitted on: Friday, 17 June 2011 - 21:37:10
Last modified on: Wednesday, 11 October 2017 - 11:18:03
Document(s) archived on: Sunday, 18 September 2011 - 02:30:44

File

mldm05.pdf
Files produced by the author(s)

Citation

Eugen Barbu, Pierre Héroux, Sébastien Adam, Eric Trupin. Clustering of Document Images using Graph Summaries. Machine Learning and Data Mining, 2005, Germany. 3587, pp.194-202, 2005, 〈10.1007/11510888_20〉. 〈hal-00601501〉
