Conference paper, Year: 2017

HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis

Abstract

This paper introduces HBA 1.0, a representative pixel-based annotated dataset released at the ICDAR2017 Competition on Historical Book Analysis (HBA2017). The dataset comprises 4,436 ground-truthed scanned images of real historical documents from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries. It contains 2,435 manuscript pages and 2,001 printed pages, and its ground truth contains more than 7.58 billion annotated pixels. HBA 1.0 addresses a topic of major interest to researchers in several fields, including (historical) document image analysis, image processing, pattern recognition and classification. The dataset and its ground truth can be used to evaluate how well image analysis methods discriminate textual content from graphical content on the one hand, and separate textual content according to different text fonts (e.g. lowercase, uppercase, italic) on the other. Evaluation results of a state-of-the-art pixel-labeling method on HBA 1.0 are reported and discussed in order to provide a baseline for future evaluation studies and to showcase the intended use of the dataset.
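As an illustration of the kind of pixel-level evaluation such a ground truth makes possible, here is a minimal sketch. It is not the competition's official protocol. It assumes the ground truth and a method's output are available as integer label maps of equal shape. The class ids, the function name `per_class_f1`, and the choice of per-class precision, recall and F-measure are all assumptions made for illustration.

```python
import numpy as np

def per_class_f1(gt, pred, class_ids):
    """Return {class_id: (precision, recall, f1)} computed over pixels."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    if gt.shape != pred.shape:
        raise ValueError("label maps must have the same shape")
    scores = {}
    for c in class_ids:
        # Count pixels where class c is correctly found, wrongly found, or missed.
        tp = int(np.sum((gt == c) & (pred == c)))
        fp = int(np.sum((gt != c) & (pred == c)))
        fn = int(np.sum((gt == c) & (pred != c)))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Toy 2x3 label maps with hypothetical ids: 0 = graphics, 1 = text.
gt = np.array([[0, 0, 1], [1, 1, 1]])
pred = np.array([[0, 1, 1], [1, 1, 0]])
scores = per_class_f1(gt, pred, class_ids=[0, 1])
```

The same function would apply to the font-separation task by passing one id per font class. How the actual ground truth encodes classes is not stated in the abstract, so any mapping from annotation to id is left to the reader.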
Main file
MarouaMEHRI_HIP2017_Article-h.pdf (131.78 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01637826, version 1 (18-11-2017)

Identifiers

HAL Id: hal-01637826
DOI: 10.1145/3151509.3151528

Cite

Maroua Mehri, Pierre Héroux, Rémy Mullot, Jean-Philippe Moreux, Bertrand B. Coüasnon, et al. HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis. International Workshop on Historical Document Imaging and Processing (HIP), Nov 2017, Kyoto, Japan. ⟨10.1145/3151509.3151528⟩. ⟨hal-01637826⟩
595 views
399 downloads
