Conference Paper, Year: 2005

Clustering Document Images using a Bag of Symbols Representation

Eugen Barbu
  • Role: Author
  • PersonId: 836634
Pierre Héroux
Sébastien Adam
Eric Trupin
  • Role: Author
  • PersonId: 836636

Abstract

Document image classification is an important step in document image analysis. Based on classification results, we can tackle other tasks such as indexing, understanding, or navigation in document collections. Using a document representation and an unsupervised classification method, we may group documents that, from the user's point of view, constitute valid clusters. The semantic gap between a domain-independent document representation and the user's implicit representation can lead to unsatisfactory results. In this paper we describe document images based on frequently occurring symbols. This document description is created in an unsupervised manner and can be related to the domain knowledge. Using data mining techniques applied to a graph-based document representation, we find frequent and maximal subgraphs. For each document image, we construct a bag containing the frequent subgraphs found in it. This bag of "symbols" represents the description of a document. We present results obtained on a corpus of 60 graphical document images.
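
The bag-of-symbols idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the frequent subgraphs have already been mined and are represented by hypothetical string labels (`"door"`, `"resistor"`, and so on), and it swaps in cosine similarity with a greedy threshold grouping, since the abstract does not say which weighting or clustering method the paper uses. The names `bag_of_symbols`, `cosine_similarity`, `cluster`, and the `threshold` value are illustrative only.

```python
from collections import Counter
from math import sqrt


def bag_of_symbols(symbol_labels):
    """Count occurrences of each (hypothetical) frequent-subgraph label in one document."""
    return Counter(symbol_labels)


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two bags treated as sparse count vectors."""
    dot = sum(count * bag_b[label] for label, count in bag_a.items())
    norm_a = sqrt(sum(c * c for c in bag_a.values()))
    norm_b = sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(bags, threshold=0.5):
    """Greedy grouping (an assumption, not the paper's method): a document joins
    the first cluster whose first member is at least `threshold` similar,
    otherwise it starts a new cluster."""
    clusters = []
    for idx, bag in enumerate(bags):
        for members in clusters:
            if cosine_similarity(bag, bags[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy corpus: each document is the list of symbol labels found in it.
documents = [
    ["door", "door", "window"],
    ["door", "window", "window"],
    ["resistor", "capacitor", "resistor"],
    ["capacitor", "resistor"],
]
bags = [bag_of_symbols(doc) for doc in documents]
clusters = cluster(bags)
print(clusters)
```

On this toy corpus the two architectural-style documents and the two circuit-style documents share no symbols, so they end up in separate groups. A real pipeline would replace the hand-written labels with identifiers of mined frequent subgraphs.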
Main file
icdar05.pdf (154.84 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00601832 , version 1 (20-06-2011)

Cite

Eugen Barbu, Pierre Héroux, Sébastien Adam, Eric Trupin. Clustering Document Images using a Bag of Symbols Representation. International Conference on Document Analysis and Recognition, 2005, Seoul, South Korea. pp.1216-1220, ⟨10.1109/ICDAR.2005.75⟩. ⟨hal-00601832⟩