A texture-based pixel labeling approach for historical books

Abstract: Over the last few years, there has been tremendous growth in the automatic processing of digitized historical documents. Finding reliable systems for the interpretation of ancient documents has been a topic of major interest for many libraries and a prime research issue in the document analysis community. One important challenge is to refine well-known approaches based on strong a priori knowledge (e.g. the document image content, layout, typography, font size and type, scanning resolution, image size, etc.). When information on document structure and content is lacking, however, a texture analysis approach has consistently been chosen to segment a page layout. Thus, this article proposes a framework to investigate the use of texture as a tool for automatically determining homogeneous regions in a digitized historical book and segmenting its contents by extracting and analyzing texture features independently of the layout of the pages. The proposed framework is parameter-free and applicable to a large variety of ancient books. It does not assume a priori information regarding document image content and structure. It consists of two phases: a texture-based feature extraction step, and an unsupervised clustering and labeling task based on consensus clustering, hierarchical ascendant classification, and nearest neighbor search algorithms. The novelty of this work lies in clustering the extracted texture descriptors to automatically find homogeneous regions, i.e. graphic and textual regions, by applying the clustering approach to an entire book instead of processing each page individually. Our framework has been evaluated on a large variety of historical books and achieved promising results.
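
The two-phase idea in the abstract (extract texture features per pixel, then cluster the features of the whole book at once and label each pixel by nearest neighbor) can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not name the texture descriptors, so a local mean and standard deviation stand in for them; consensus clustering is omitted and the number of clusters `k` is fixed by hand, whereas the paper describes a parameter-free framework. The function names (`texture_features`, `agglomerate`, `label_book`), the centroid linkage, and the toy pages are all assumptions made for illustration.

```python
import math
import random

def texture_features(page, radius=1):
    """Phase 1 (stand-in descriptor): local mean and std of grey levels per pixel."""
    h, w = len(page), len(page[0])
    feats = []
    for y in range(h):
        for x in range(w):
            window = [page[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            feats.append((mean, math.sqrt(var)))
    return feats

def centroid(cluster):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def agglomerate(points, k):
    """Naive bottom-up hierarchical clustering (centroid linkage, an assumption).

    Repeatedly merges the two clusters with the closest centroids until k remain,
    then returns the k centroids.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return [centroid(c) for c in clusters]

def label_book(pages, k=2, sample_size=200, seed=0):
    """Phase 2: cluster features pooled over ALL pages, then label every pixel."""
    per_page = [texture_features(p) for p in pages]
    pooled = [f for feats in per_page for f in feats]  # whole book, not one page
    rng = random.Random(seed)
    sample = rng.sample(pooled, min(sample_size, len(pooled)))
    cents = agglomerate(sample, k)
    labels = []
    for page, feats in zip(pages, per_page):
        w = len(page[0])
        # Nearest neighbor search: each pixel takes the label of the closest centroid.
        flat = [min(range(k), key=lambda c: math.dist(f, cents[c])) for f in feats]
        labels.append([flat[r * w:(r + 1) * w] for r in range(len(page))])
    return labels

# Toy "book": two 8x8 pages, flat background on the left, checkerboard on the right.
page = [[128 if x < 4 else 255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
labels = label_book([page, page], k=2)
for row in labels[0]:
    print(row)
```

On the toy pages, the flat columns and the checkerboard columns end up with different labels, and the labels are consistent across both pages because the cluster centroids are computed once for the whole book. The naive agglomeration is cubic in the number of sampled pixels, so it only suits small samples.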
Document type:
Journal article
Pattern Analysis and Applications, Springer Verlag, 2015, pp.1-40. 〈10.1007/s10044-015-0451-9〉

Cited literature [163 references]

https://hal.inria.fr/hal-01237249
Contributor: Maroua Mehri
Submitted on: Wednesday, December 2, 2015 - 23:52:35
Last modified on: Wednesday, October 11, 2017 - 11:18:01
Document(s) archived on: Thursday, March 3, 2016 - 15:11:06

File

MarouaMEHRI_PAA.pdf
Files produced by the author(s)

Citation

Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Alain Boucher, Rémy Mullot. A texture-based pixel labeling approach for historical books. Pattern Analysis and Applications, Springer Verlag, 2015, pp.1-40. 〈10.1007/s10044-015-0451-9〉. 〈hal-01237249〉
