HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis

Abstract: This paper introduces HBA 1.0, a representative pixel-based annotated dataset released at the ICDAR2017 Competition on Historical Book Analysis (HBA2017). The HBA 1.0 dataset is composed of 4,436 real scanned, ground-truthed historical document images from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries. It contains 2,435 manuscript pages and 2,001 printed pages. The ground truth of the HBA 1.0 dataset contains more than 7.58 billion annotated pixels. The dataset addresses a thriving topic of major interest to researchers in many fields, including (historical) document image analysis, image processing, pattern recognition and classification. The HBA 1.0 dataset and its ground truth can be used to evaluate how well image analysis methods can, on the one hand, discriminate textual content from graphical content and, on the other hand, separate textual content according to different text fonts (e.g. lowercase, uppercase, italic). Evaluation results of a state-of-the-art pixel-labeling method on the HBA 1.0 dataset are reported and discussed in this paper to provide a benchmark/baseline for future evaluation studies and to showcase the intended use of the dataset.
Document type:
Conference paper
International Workshop on Historical Document Imaging and Processing (HIP), Nov 2017, Kyoto, Japan

Cited literature: 16 references

https://hal.archives-ouvertes.fr/hal-01637826
Contributor: Maroua Mehri
Submitted on: Saturday, 18 November 2017, 08:56:13
Last modified on: Thursday, 14 December 2017, 08:37:47

File

MarouaMEHRI_HIP2017_Article-h....
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01637826, version 1

Citation

Maroua Mehri, Pierre Héroux, Rémy Mullot, Jean-Philippe Moreux, Bertrand Coüasnon, et al. HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis. International Workshop on Historical Document Imaging and Processing (HIP), Nov 2017, Kyoto, Japan. 〈hal-01637826〉
