SmartDoc-QA: A Dataset for Quality Assessment of Smartphone Captured Document Images - Single and Multiple Distortions
Abstract
Smartphones are enabling new ways of capturing documents,
which creates a need for seamless and reliable acquisition and
digitization. The quality assessment step is an
important part of both the acquisition and the digitization
processes. Assessing document quality could aid users during the
capture process or help improve image enhancement methods
after a document has been captured. However, the field of document
image quality assessment currently lacks databases. To provide
a baseline benchmark for quality assessment methods for
mobile-captured documents, this paper presents a dataset for
quality assessment that contains both
singly- and multiply-distorted document images.
The proposed dataset could be used for benchmarking quality
assessment methods by the objective measure of OCR accuracy,
and could also be used to benchmark quality enhancement
methods. There are three types of documents in the dataset:
modern documents, old administrative letters and receipts. The
document images of the dataset are captured under varying
conditions (lighting, different types of blur, and perspective
angles). These conditions cause geometric and photometric distortions that
hinder the OCR process. The ground truth of the dataset
images consists of the text transcriptions of the documents,
the OCR results of the captured documents and the values of
the different capture parameters used for each image. We also
present how the dataset could be used for evaluation in the
field of no-reference quality assessment. The dataset is freely
and publicly available for use by the research community at
http://navidomass.univ-lr.fr/SmartDoc-QA.
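
To illustrate the kind of benchmark described above, the following is a minimal sketch of how a no-reference quality method might be scored against OCR accuracy. It rests on assumptions that are not stated in the abstract: OCR accuracy is taken here to be a character-level accuracy derived from the Levenshtein distance between the ground-truth transcription and the OCR output, and agreement between predicted quality scores and OCR accuracy is measured with Spearman rank correlation. The function names (`ocr_accuracy`, `spearman`) are hypothetical and do not come from the dataset's tooling.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ocr_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Assumed definition: 1 - edit_distance / len(ground_truth), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    distance = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1.0 - distance / len(ground_truth))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

In such a setup, one would compute `ocr_accuracy` for each captured image from its transcription and OCR result, then call `spearman` on the list of predicted quality scores and the list of accuracies. A correlation closer to 1 would suggest the quality method ranks images in the same order as the OCR engine's performance.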