HAL open archive
Conference paper. Year: 2011

Ancient documents bleed-through evaluation and its application for predicting OCR error rates

Abstract

This article presents a way to evaluate the bleed-through defect on very old document images. We design measures that quantify the verso ink bleeding through the paper onto the recto side. Measuring the bleed-through defect allows us to perform statistical analyses that can predict the feasibility of different post-scan tasks. We illustrate these measures by building two models that predict OCR error rates from the bleed-through evaluation: one for Abbyy FineReader, a very powerful commercial OCR engine, and one for OCRopus, an OCR system sponsored by Google. Both prediction models appear to be very accurate according to various statistical indicators.
Main file
DRR2011_BleedThroughEvaluation.pdf (230.5 KB)
Origin: files produced by the author(s)

Dates and versions

hal-00570247, version 1 (28-02-2011)

Identifiers

  • HAL Id: hal-00570247, version 1

Cite

Vincent Rabeux, Nicholas Journet, Jean-Philippe Domenger. Ancient documents bleed-through evaluation and its application for predicting OCR error rates. Document Recognition and Retrieval, Jan 2011, San Francisco, United States. pp. 78740Q. ⟨hal-00570247⟩

Collections

CNRS
74 views
185 downloads
