Indexation of Document Images using Frequent Items

Abstract: Documents exist in different formats. When documents are available only as images, accessing some part, and preferably all, of the information they contain requires a document image analysis application. Document images can be mostly textual or mostly graphical. When a user needs to retrieve, from a set, the document images relevant to a query, indexing techniques are required. The documents and the query are translated into a common representation. Using a dissimilarity measure between the query and the document representations, together with a method to speed up the search, we can find documents that are relevant to the query from the user's point of view. The semantic gap between a document representation and the user's implicit representation can lead to unsatisfactory results. To access objects in document images that are relevant to the document semantics, we must enter a document understanding cycle. Document image understanding is usually performed by domain-dependent systems that are not applicable in the general case (textual and graphical document classes). In this paper we present a method to describe and then index document images using frequently occurring items. The intuition is that frequent items represent symbols in a certain domain, so that this document description can be related to the domain knowledge in an unsupervised manner. The novelty of our method lies in using graph summaries as a description of document images. In our approach, a bag (multiset) of graphs serves as the description of a document image. From the document images we extract a graph-based representation. On these graphs we apply graph mining techniques to find frequent and maximal subgraphs. For each document image we construct a bag containing all frequent subgraphs found in its graph-based representation. This bag of "symbols" is the description of the document.
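The indexing idea in the abstract (keep frequent items, describe each document by a bag of them, rank by a dissimilarity measure) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each subgraph has already been reduced to a hashable label, that "frequent" means occurring in at least `min_support` documents, and that the dissimilarity is a multiset Jaccard distance; the abstract specifies none of these. All names and the sample data are hypothetical.

```python
from collections import Counter

def frequent_items(docs, min_support):
    """Items that occur in at least min_support documents (assumed support definition)."""
    support = Counter()
    for items in docs.values():
        support.update(set(items))  # count each item at most once per document
    return {item for item, n in support.items() if n >= min_support}

def describe(items, vocabulary):
    """Bag (multiset) of the frequent items found in one document."""
    return Counter(i for i in items if i in vocabulary)

def dissimilarity(bag_a, bag_b):
    """Multiset Jaccard distance (assumed measure): 0.0 for identical bags, 1.0 for disjoint ones."""
    union = sum((bag_a | bag_b).values())
    if union == 0:
        return 0.0  # two empty bags are treated as identical
    return 1.0 - sum((bag_a & bag_b).values()) / union

def rank(query_bag, index):
    """Document names ordered from least to most dissimilar to the query."""
    return sorted(index, key=lambda name: dissimilarity(query_bag, index[name]))

# Hypothetical data: each label stands in for one subgraph found in a document image.
docs = {
    "plan_a": ["door", "door", "window", "stair"],
    "plan_b": ["door", "window", "window", "noise1"],
    "memo_c": ["logo", "noise2"],
}
vocabulary = frequent_items(docs, min_support=2)
index = {name: describe(items, vocabulary) for name, items in docs.items()}
query_bag = describe(["door", "door", "window"], vocabulary)
ranking = rank(query_bag, index)
```

In this toy collection only the labels shared by at least two documents survive as "symbols", so items that appear in a single document do not influence the ranking. A real system would replace the string labels with canonical forms of mined subgraphs and the linear scan in `rank` with an index structure that speeds up the search.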
Document type:
Conference paper
Hugo Gamboa and Ana L. N. Fred. International Workshop on Pattern Recognition in Information Systems, 2005, Miami, United States. pp.164-173, 2005

https://hal.archives-ouvertes.fr/hal-00601530
Contributor: Pierre Héroux
Submitted on: Saturday, 18 June 2011 - 10:07:43
Last modified on: Wednesday, 11 October 2017 - 11:18:03
Document(s) archived on: Friday, 9 November 2012 - 15:25:20

File

pris05.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00601530, version 1

Citation

Eugen Barbu, Pierre Héroux, Sébastien Adam, Eric Trupin. Indexation of Document Images using Frequent Items. Hugo Gamboa and Ana L. N. Fred. International Workshop on Pattern Recognition in Information Systems, 2005, Miami, United States. pp.164-173, 2005. 〈hal-00601530〉
