SmartATID: A mobile captured Arabic Text Images Dataset for multi-purpose recognition tasks

Abstract : Today's smartphones can capture documents as easily and as well as a personal scanner. The captured document images then need to be processed by dedicated, automated document processing systems for textual content analysis, indexing and recognition. Such systems may be used, for instance, for font identification, writer identification, and word or line segmentation. The state of the art lacks a comprehensive database of Arabic document images captured by mobile phones. This paper presents the first public offline image database of both printed and handwritten Arabic documents captured by mobile devices, named "SmartATID". The document images in the database were acquired under varying capture conditions (blur, perspective angle and lighting). These conditions cause photometric and geometric distortions that affect not only OCR performance but also page segmentation into lines and paragraphs. Each document image in the database is provided with a ground truth file that contains the exact text transcription and all numerical capture parameters used for that image. The database is freely and publicly available to the research community at http://sites.google.com/site/smartatid.

https://hal.archives-ouvertes.fr/hal-01403764
Contributor : Véronique Eglin
Submitted on : Sunday, November 27, 2016 - 4:30:56 PM
Last modification on : Sunday, June 2, 2019 - 10:24:02 AM
Long-term archiving on : Tuesday, March 21, 2017 - 3:12:11 AM

File

ICFHR_2016_submission_41.pdf
Files produced by the author(s)

Licence


Public Domain

Identifiers

  • HAL Id : hal-01403764, version 1

Citation

Fatma Chabchoub, Yousri Kessentini, Slim Kanoun, Véronique Eglin. SmartATID: A mobile captured Arabic Text Images Dataset for multi-purpose recognition tasks. International Conference on Frontiers in Handwriting Recognition, Oct 2016, Shenzhen, China. ⟨hal-01403764⟩
