Using definite clause grammars to build a global system for analyzing collections of documents

Joseph Chazalon (1), Bertrand Coüasnon (1)
(1) IMADOC - Interprétation et Reconnaissance d’Images et de Documents
UR1 - Université de Rennes 1, INSA Rennes - Institut National des Sciences Appliquées - Rennes, CNRS - Centre National de la Recherche Scientifique: UMR6074
Abstract: Collections of documents are sets of heterogeneous documents, such as a specific series of ancient books, linked by specific structural and semantic properties. A particular collection contains document images with specific physical layouts, like text pages or full-page illustrations, appearing in a specific order. Its contents, like journal articles, may span several pages that are not necessarily consecutive, producing strong dependencies between the interpretations of those pages. In order to build an analysis system which can bring contextual information from the collection to the appropriate recognition modules for each page, we propose to express the structural and the semantic properties of a collection with a definite clause grammar. This is made possible by representing collections as streams of document descriptors, and by using extensions to the formalism, which we present here. We are then able to automatically generate a parser dedicated to a collection. We also show that, besides allowing structural variations and complex information flows, this approach enables the design of analysis stages on a document or a set of documents. The benefit of using context is illustrated with several examples and their formalization in this framework.
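The abstract does not spell out the formalism itself. As a rough illustration only, the Python sketch below mimics how a definite clause grammar consumes a stream of page descriptors, using generators to emulate Prolog-style backtracking, and how a page's position in the collection structure selects the recognition module applied to it. The `Page` descriptor, the layout labels, the grammar rules and the module names (`cover_reader`, `toc_reader`, `article_reader`, `plate_reader`) are all hypothetical; this is not the authors' grammar, nor their extensions to the DCG formalism.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Page:
    """Hypothetical document descriptor: one element of the input stream."""
    number: int
    layout: str  # assumed labels: "cover", "text", "illustration"


# Parse state: position in the stream, plus the plan built so far
# (page number -> recognition module chosen from the grammar context).
State = Tuple[int, Tuple[Tuple[int, str], ...]]


def terminal(pages: List[Page], state: State, layout: str, module: str) -> Iterator[State]:
    """Consume one page with the given layout and assign it a module."""
    pos, plan = state
    if pos < len(pages) and pages[pos].layout == layout:
        yield pos + 1, plan + ((pages[pos].number, module),)


def collection(pages: List[Page], state: State) -> Iterator[State]:
    # collection --> cover, toc, rest.
    for s1 in terminal(pages, state, "cover", "cover_reader"):
        # The first text page after the cover is read as a table of contents.
        for s2 in terminal(pages, s1, "text", "toc_reader"):
            yield from rest(pages, s2)


def rest(pages: List[Page], state: State) -> Iterator[State]:
    # rest --> [] | text_page, rest | illustration, rest.
    yield state  # empty alternative
    for s1 in terminal(pages, state, "text", "article_reader"):
        yield from rest(pages, s1)
    for s1 in terminal(pages, state, "illustration", "plate_reader"):
        yield from rest(pages, s1)


def parse(pages: List[Page]) -> Optional[List[Tuple[int, str]]]:
    """Return the first plan that consumes the whole stream, or None."""
    for pos, plan in collection(pages, (0, ())):
        if pos == len(pages):
            return list(plan)
    return None
```

For example, `parse([Page(1, "cover"), Page(2, "text"), Page(3, "illustration"), Page(4, "text")])` returns `[(1, "cover_reader"), (2, "toc_reader"), (3, "plate_reader"), (4, "article_reader")]`: the two text pages have the same layout but receive different modules because of where they occur in the collection, while a stream that does not start with a cover is rejected (`None`).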
Document type:
Conference paper
Document Recognition and Retrieval XVII, Jan 2010, United States. 7534 (1), pp. 75340R, 2010, doi:10.1117/12.840436

https://hal.archives-ouvertes.fr/hal-00644971
Contributor: Joseph Chazalon
Submitted on: Friday, November 25, 2011, 16:14:18
Last modified on: Friday, November 16, 2018, 01:21:59
Document(s) archived on: Friday, November 16, 2012, 12:05:29

File

article.pdf
Files produced by the author(s)

Citation

Joseph Chazalon, Bertrand Coüasnon. Using definite clause grammars to build a global system for analyzing collections of documents. Document Recognition and Retrieval XVII, Jan 2010, United States. 7534 (1), pp. 75340R, 2010, doi:10.1117/12.840436. hal-00644971
