A Pixel Labeling Approach for Historical Digitized Books

Abstract: In the context of the conservation and worldwide dissemination of historical collections, this paper presents an automatic approach to page layout segmentation of historical books. We propose to search for homogeneous regions in the content of historical digitized books, using little a priori knowledge, by extracting and analyzing texture features. The novelty of this work lies in the unsupervised clustering of the extracted texture descriptors to find homogeneous regions, i.e. graphic and textual regions, performed on an entire book instead of on each page individually. First, since our goal is to compare and index the content of digitized books, we characterize the content of an entire book by extracting the texture information of each page. The texture features are computed without any hypothesis on the document structure and are based on two non-parametric tools: the autocorrelation function and multiresolution analysis. Second, we apply an unsupervised clustering approach to the extracted features in order to automatically classify the homogeneous regions of book pages. The clustering results are assessed with internal and external accuracy measures, and the overall results are satisfactory. Such an analysis could help build a computer-aided tool for page categorization.
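The abstract names three ingredients: autocorrelation-based texture features, a multiresolution analysis, and a single unsupervised clustering run over every page of a book. The Python sketch below shows one way these could fit together. It is an illustration under stated assumptions, not the authors' implementation. None of the following comes from the paper: pages are grayscale NumPy arrays; descriptors are computed on non-overlapping square blocks rather than per pixel; the texture descriptor is the normalized autocorrelation summed along four directions; the multiresolution step is a plain 2x2-averaging pyramid; and k-means with a chosen cluster count k stands in for the clustering method. All function names (autocorrelation, directional_features, block_features, kmeans, label_book) are hypothetical.

```python
import numpy as np

def autocorrelation(patch):
    """Normalized, centred 2-D circular autocorrelation (Wiener-Khinchin theorem)."""
    x = patch.astype(float) - patch.mean()
    f = np.fft.fft2(x)
    acf = np.fft.fftshift(np.fft.ifft2(f * np.conj(f)).real)  # zero lag moved to the centre
    peak = acf.max()  # equals the zero-lag value (signal energy)
    return acf / peak if peak > 0 else acf  # a flat patch yields all zeros

def directional_features(acf):
    """Sum the autocorrelation along 4 rays from the centre: 0, 45, 90 and 135 degrees."""
    c = acf.shape[0] // 2
    ks = np.arange(1, c)  # lags that stay inside the patch
    return np.array([
        acf[c, c + ks].sum(),       # horizontal
        acf[c + ks, c + ks].sum(),  # diagonal
        acf[c + ks, c].sum(),       # vertical
        acf[c + ks, c - ks].sum(),  # anti-diagonal
    ])

def downsample(img):
    """One pyramid level: halve the resolution by 2x2 averaging (assumed scheme)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def block_features(patch, levels):
    """Concatenate the directional autocorrelation features over several resolutions."""
    feats = []
    for _ in range(levels):
        feats.append(directional_features(autocorrelation(patch)))
        patch = downsample(patch)
    return np.concatenate(feats)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def label_book(pages, block=16, levels=2, k=2):
    """Cluster block descriptors of ALL pages in one run, then split labels per page."""
    feats, shapes = [], []
    for page in pages:
        rows, cols = page.shape[0] // block, page.shape[1] // block
        shapes.append((rows, cols))
        for i in range(rows):
            for j in range(cols):
                patch = page[i * block:(i + 1) * block, j * block:(j + 1) * block]
                feats.append(block_features(patch, levels))
    labels = kmeans(np.array(feats), k)
    grids, start = [], 0
    for rows, cols in shapes:
        grids.append(labels[start:start + rows * cols].reshape(rows, cols))
        start += rows * cols
    return grids

# Usage: a synthetic page whose top half holds horizontal "text line" stripes.
page = np.full((32, 32), 255.0)
for r in range(0, 16, 4):
    page[r:r + 2, :] = 0.0
grids = label_book([page, page.copy()])
print(grids[0])  # top-row blocks share one label, bottom-row (blank) blocks the other
```

The design choice mirrored from the abstract sits in label_book: block descriptors from every page are pooled into one matrix and clustered once, so a given label denotes the same kind of region throughout the book. Clustering each page separately would not guarantee that label 0 on one page matches label 0 on the next.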
Document type:
Conference paper
International Conference on Document Analysis and Recognition (ICDAR), Aug 2013, Washington, DC, United States. IEEE, pp.817-821, 2013, doi:10.1109/ICDAR.2013.167

Cited literature: 30 references

https://hal.archives-ouvertes.fr/hal-00927126
Contributor: Maroua Mehri
Submitted on: Wednesday, 2 December 2015, 20:23:05
Last modified on: Saturday, 16 December 2017, 00:13:10
Archived on: Thursday, 3 March 2016, 14:51:29

File

MarouaMEHRI_ICDAR_2013.pdf
Files produced by the author(s)

Citation

Maroua Mehri, Pierre Héroux, Petra Gomez-Krämer, Alain Boucher, Rémy Mullot. A Pixel Labeling Approach for Historical Digitized Books. International Conference on Document Analysis and Recognition (ICDAR), Aug 2013, Washington, DC, United States. IEEE, pp.817-821, 2013, doi:10.1109/ICDAR.2013.167. hal-00927126

Metrics

Record views: 173

File downloads: 40