Ancient documents bleed-through evaluation and its application for predicting OCR error rates

Abstract: This article presents a way to evaluate the bleed-through defect in images of very old documents. We design measures that quantify how much verso ink bleeds through the paper onto the recto side. Measuring the bleed-through defect allows us to perform statistical analyses that can predict the feasibility of different post-scan tasks. In this article, we illustrate our measures by building two OCR error rate prediction models based on bleed-through evaluation: one for ABBYY FineReader, a powerful commercial OCR engine, and one for OCRopus, which is sponsored by Google. Both prediction models appear to be very accurate according to various statistical indicators.
Document type: Conference paper
Document Recognition and Retrieval, Jan 2011, San Francisco, United States. 7874, pp.78740Q, 2011

Cited literature: 16 references

https://hal.archives-ouvertes.fr/hal-00570247
Contributor: Vincent Rabeux
Submitted on: Monday, 28 February 2011 - 09:58:36
Last modified on: Wednesday, 29 November 2017 - 14:59:57
Document(s) archived on: Tuesday, 6 November 2012 - 15:05:22

File

DRR2011_BleedThroughEvaluation...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00570247, version 1

Citation

Vincent Rabeux, Nicholas Journet, Jean-Philippe Domenger. Ancient documents bleed-through evaluation and its application for predicting OCR error rates. Document Recognition and Retrieval, Jan 2011, San Francisco, United States. 7874, pp.78740Q, 2011. 〈hal-00570247〉

Metrics

Record views: 158

File downloads: 146