HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis

Abstract: This paper introduces HBA 1.0, a representative pixel-based annotated dataset released at the ICDAR2017 Competition on Historical Book Analysis (HBA2017). The HBA 1.0 dataset is composed of 4,436 real scanned, ground-truthed historical document images from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries. It contains 2,435 manuscript pages and 2,001 printed pages. The ground truth of the HBA 1.0 dataset contains more than 7.58 billion annotated pixels. The dataset addresses a thriving topic of major interest to many researchers in different fields, including (historical) document image analysis, image processing, pattern recognition and classification. The HBA 1.0 dataset and its ground truth can be used to evaluate the ability of image analysis methods to discriminate textual content from graphical content on the one hand, and to separate textual content according to different text fonts (e.g. lowercase, uppercase, italic) on the other hand. Evaluation results of a state-of-the-art pixel-labeling method on the HBA 1.0 dataset are reported and discussed in this paper, both to provide a benchmark/baseline for future evaluation studies and to showcase the intended use of the HBA 1.0 dataset.
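
As an illustration of what pixel-level evaluation against such a ground truth can look like, the following is a minimal sketch, not the competition's official protocol. It assumes the ground truth and a method's output are available as equally sized 2D grids of integer class labels; the function name `pixel_scores` and the label values used in the usage lines (0 = background, 1 = text, 2 = graphics) are hypothetical and are not taken from the HBA 1.0 specification.

```python
from collections import Counter


def pixel_scores(ground_truth, prediction):
    """Per-class (precision, recall, F-measure) for two label maps.

    Both arguments are lists of rows; each row is a list of integer
    class labels, one per pixel. The label values are assumptions made
    for this sketch, not the dataset's actual class encoding.
    """
    if len(ground_truth) != len(prediction):
        raise ValueError("label maps must have the same number of rows")

    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter()
    for gt_row, pred_row in zip(ground_truth, prediction):
        if len(gt_row) != len(pred_row):
            raise ValueError("rows must have the same length")
        pairs.update(zip(gt_row, pred_row))

    classes = sorted({label for pair in pairs for label in pair})
    scores = {}
    for c in classes:
        true_pos = pairs[(c, c)]
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        actual = sum(n for (g, _), n in pairs.items() if g == c)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[c] = (precision, recall, f_measure)
    return scores


# Tiny 2x3 example with hypothetical labels.
gt = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
for label, (p, r, f) in pixel_scores(gt, pred).items():
    print(f"class {label}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Reporting scores per class, rather than a single overall accuracy, keeps rare classes from being hidden by a dominant one, which matters when most pixels on a page belong to a single class.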

https://hal.archives-ouvertes.fr/hal-01637826
Contributor: Maroua Mehri
Submitted on: Saturday, November 18, 2017 - 8:56:13 AM
Last modification on: Wednesday, April 10, 2019 - 1:46:11 PM
Document(s) archived on: Monday, February 19, 2018 - 12:50:28 PM

File

MarouaMEHRI_HIP2017_Article-h....
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01637826, version 1

Citation

Maroua Mehri, Pierre Héroux, Rémy Mullot, Jean-Philippe Moreux, Bertrand Coüasnon, et al. HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis. International Workshop on Historical Document Imaging and Processing (HIP), Nov 2017, Kyoto, Japan. ⟨hal-01637826⟩
