Ground-Truth Production and Benchmarking Scenarios Creation with DocMining
Abstract
In this paper we present the DocMining platform and its application to ground-truth dataset production and page segmentation evaluation. DocMining is a highly modular framework dedicated to document interpretation, in which document processing tasks are modeled as scenarios. We present two scenarios that use PDF documents, either found on the web or produced from XML files, as the basis of the ground-truth dataset.
Domains
Document and text processing
Origin: Files produced by the author(s)