Conference papers

HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis

Abstract : This paper introduces HBA 1.0, a representative pixel-based annotated dataset released at the ICDAR2017 Competition on Historical Book Analysis (HBA2017). The HBA 1.0 dataset is composed of 4,436 real scanned, ground-truthed historical document images from 11 books (5 manuscripts and 6 printed books) in different languages and scripts, published between the 13th and 19th centuries. It contains 2,435 manuscript pages and 2,001 printed pages, and its ground truth contains more than 7.58 billion annotated pixels. The dataset addresses a thriving topic of interest to many researchers in different fields, including (historical) document image analysis, image processing, pattern recognition and classification. The HBA 1.0 dataset and its ground truth can be used to evaluate the capabilities of image analysis methods to discriminate the textual content from the graphical content on the one hand, and to separate the textual content according to different text fonts (e.g. lowercase, uppercase, italic) on the other hand. Evaluation results of a state-of-the-art pixel-labeling method on the HBA 1.0 dataset are reported and discussed in this paper, both to provide a benchmark/baseline for future evaluation studies and to showcase the intended use of the dataset.

Contributor : Maroua Mehri
Submitted on : Saturday, November 18, 2017 - 8:56:13 AM
Last modification on : Thursday, May 12, 2022 - 3:38:49 PM
Long-term archiving on : Monday, February 19, 2018 - 12:50:28 PM




  • HAL Id : hal-01637826, version 1


Maroua Mehri, Pierre Héroux, Rémy Mullot, Jean-Philippe Moreux, Bertrand Coüasnon, et al. HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis. International Workshop on Historical Document Imaging and Processing (HIP), Nov 2017, Kyoto, Japan. ⟨hal-01637826⟩


