Conference paper, Year: 2014

Multipage Administrative Document Stream Segmentation

Abstract

We propose a framework for the segmentation and classification of document streams. The framework is composed of two modules: segmentation and verification. Both modules use an incremental classifier that learns progressively along the stream. In the segmentation module, the relationship between two consecutive pages is classified as either a continuity or a rupture. A rupture denotes a clear break, so the pages preceding it probably form a complete document. If the classifier is uncertain whether the relationship is a continuity or a rupture, an over-segmentation is proposed and the result is treated as a fragment, i.e., a portion of a document. Both fragments and documents are sent to the verification module, which includes a correction module in addition to the incremental classifier. The classifier predicts the classes of fragments and documents. The predicted class represents a context, which is used as a query to search for similar contexts in the correction module and to correct the segmentation and verification results. Corrections are sent back to the segmentation and verification modules so that they learn the correct classes. Results on real-world databases show the effectiveness and stability of our approach.
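The three-way decision in the segmentation module (continuity, rupture, or uncertain) can be illustrated with a minimal sketch. It is not the authors' implementation: the `continuity_score` callable stands in for the incremental classifier, the `low` and `high` thresholds are hypothetical, and the rule that any segment touching an uncertain boundary is labelled a fragment is an assumption made for illustration. The verification and correction modules are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Segment:
    pages: List[str] = field(default_factory=list)
    kind: str = "document"  # "document" or "fragment"


def segment_stream(
    pages: Sequence[str],
    continuity_score: Callable[[str, str], float],  # hypothetical classifier stand-in
    low: float = 0.3,   # assumed threshold: at or below this, treat as rupture
    high: float = 0.7,  # assumed threshold: at or above this, treat as continuity
) -> List[Segment]:
    """Split a page stream into documents and fragments."""
    if not pages:
        return []

    segments: List[Segment] = []
    current = Segment(pages=[pages[0]])

    for prev, nxt in zip(pages, pages[1:]):
        score = continuity_score(prev, nxt)

        if score >= high:
            # Continuity: the next page belongs to the current segment.
            current.pages.append(nxt)
            continue

        # Either a clear rupture or an uncertain boundary: cut here.
        uncertain = score > low
        if uncertain:
            # Over-segmentation: the segment being closed is only a fragment.
            current.kind = "fragment"
        segments.append(current)

        # A segment that starts at an uncertain boundary is also a fragment.
        current = Segment(pages=[nxt], kind="fragment" if uncertain else "document")

    segments.append(current)
    return segments
```

In the framework described above, both kinds of segment would then be passed to the verification module; here the `kind` field simply records which segments would need their boundaries re-examined.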
No file deposited

Dates and versions

hal-01254785, version 1 (12-01-2016)

Cite

Hani Daher, Mohamed-Rafik Bouguelia, Abdel Belaïd, Vincent Poulain d'Andecy. Multipage Administrative Document Stream Segmentation. ICPR 2014 - 22nd International Conference on Pattern Recognition, Aug 2014, Stockholm, Sweden. pp. 966-971. ⟨10.1109/ICPR.2014.176⟩. ⟨hal-01254785⟩