Conference paper, IS&T/SPIE Electronic Imaging, Year: 2014

Document Flow Segmentation for Business Applications

Abstract

This paper proposes a supervised approach to document flow segmentation, applied to real-world heterogeneous documents. The algorithm treats the flow as a sequence of pairs of consecutive pages and studies the relationship between the two pages of each pair. First, sets of features are extracted from the pages, and we propose a way to model each pair of pages as a single feature vector. This vector is passed to a binary classifier, which labels the relationship as either segmentation or continuity. In the case of segmentation, the current document is considered complete and the analysis of the flow continues by starting a new document. In the case of continuity, both pages are assigned to the same document and the analysis continues along the flow. If it is uncertain whether the relationship should be classified as continuity or segmentation, the pair is rejected and the pages analyzed up to that point are considered a "fragment". This first classification already provides good results, approaching 90% on certain documents, which is high at this level of the system.
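The decision loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a pair of pages is turned into one feature vector, how uncertainty is measured, or what happens after a rejection. The names `pair_vector`, `segment_flow`, and `toy_break_probability`, the two thresholds `low` and `high`, the concatenation-plus-difference pair representation, and the choice to start a new group after a rejection are therefore all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = Sequence[float]


@dataclass
class Segment:
    pages: List[int]  # indices of the pages in the flow
    kind: str         # "document" or "fragment"


def pair_vector(a: Features, b: Features) -> List[float]:
    # Hypothetical pair representation (assumption, not from the paper):
    # both page vectors followed by their element-wise absolute difference.
    return list(a) + list(b) + [abs(x - y) for x, y in zip(a, b)]


def segment_flow(
    pages: Sequence[Features],
    break_probability: Callable[[List[float]], float],
    low: float = 0.3,
    high: float = 0.7,
) -> List[Segment]:
    """Walk the flow pair by pair and group pages into documents or fragments.

    break_probability stands in for the binary classifier: it returns an
    estimated probability that the pair is a segmentation point. The
    thresholds low and high are assumed values that define the rejection zone.
    """
    if not pages:
        return []
    segments: List[Segment] = []
    current = [0]
    for i in range(1, len(pages)):
        p = break_probability(pair_vector(pages[i - 1], pages[i]))
        if p >= high:
            # Segmentation: the current document is complete, start a new one.
            segments.append(Segment(current, "document"))
            current = [i]
        elif p <= low:
            # Continuity: the page belongs to the current document.
            current.append(i)
        else:
            # Rejection: pages seen so far become a "fragment".
            # Starting a new group afterwards is an assumption.
            segments.append(Segment(current, "fragment"))
            current = [i]
    # End of flow: closing the last group as a document is an assumption.
    segments.append(Segment(current, "document"))
    return segments


def toy_break_probability(vector: List[float]) -> float:
    # Toy stand-in for a trained classifier: for one-dimensional page
    # features, the last entry of the pair vector is the absolute difference.
    return min(1.0, vector[-1])


pages = [[0.0], [0.1], [1.0], [1.0], [0.5]]
for segment in segment_flow(pages, toy_break_probability):
    print(segment.kind, segment.pages)
```

With a real classifier, `break_probability` would wrap a trained model's probability output, and the width of the rejection zone between `low` and `high` would trade automatic decisions against the number of fragments left for later processing.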
Main file
Version_5.pdf (287.46 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00926615, version 1 (10-01-2014)

Identifiers

  • HAL Id: hal-00926615, version 1

Cite

Hani Daher, Abdel Belaïd. Document Flow Segmentation for Business Applications. Document Recognition and Retrieval XXI, Feb 2014, San Francisco, United States. pp.9201-15. ⟨hal-00926615⟩
