DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images

Abstract: Most digital libraries that provide user-friendly interfaces, enabling quick and intuitive access to their resources, rely on Document Image Analysis and Recognition (DIAR) methods. Such methods need ground-truthed document images for evaluation and comparison and, in some cases, for training. With the advent of deep learning-based approaches in particular, the required size of annotated document datasets appears to keep growing. Manually annotating real documents has many drawbacks, which often results in only small datasets being reliably annotated. To circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform, open-source software able to create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, which show the value of using such synthetic images to enrich the training stage of DIAR tools.
Document type:
Journal article
Journal of Imaging, MDPI, 2017, 3 (4), 〈10.3390/jimaging3040062〉

Cited literature: 67 references

https://hal.archives-ouvertes.fr/hal-01668915
Contributor: Nicholas Journet
Submitted on: Wednesday, 20 December 2017 - 13:29:00
Last modified on: Thursday, 11 January 2018 - 06:20:17

File

jimaging.pdf
Files produced by the author(s)

Citation

Nicholas Journet, Muriel Visani, Boris Mansencal, Kieu Van-Cuong, Antoine Billy. DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images. Journal of Imaging, MDPI, 2017, 3 (4), 〈10.3390/jimaging3040062〉. 〈hal-01668915〉

Metrics

Record views: 16
File downloads: 6