Conference paper, Year: 2016

SmartATID: A mobile captured Arabic Text Images Dataset for multi-purpose recognition tasks

Abstract

Today's smartphones can capture documents as easily and as well as a personal scanner. The captured document images need to be processed by dedicated automated document processing systems, which handle textual content analysis, indexing and recognition. For instance, they may be used for font identification, writer identification, and word or line segmentation. The state of the art lacks a comprehensive database of Arabic document images captured by mobile phones. This paper presents the first public offline image database of both printed and handwritten Arabic documents captured by mobile phones, named "SmartATID". The document images in the database were acquired under varying capture conditions (blur, perspective angle and lighting). These conditions cause photometric and geometric distortions that affect not only OCR performance but also the segmentation of pages into lines and paragraphs. Each document image in the database comes with a ground truth file that contains the exact text transcription and all numerical capture parameters used for that image. The database is freely and publicly available to the research community at http://sites.google.com/site/smartatid.
Main file
ICFHR_2016_submission_41.pdf (606.63 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01403764, version 1 (27-11-2016)

Licence

Public domain

Identifiers

  • HAL Id: hal-01403764, version 1

Cite

Fatma Chabchoub, Yousri Kessentini, Slim Kanoun, Véronique Eglin. SmartATID: A mobile captured Arabic Text Images Dataset for multi-purpose recognition tasks. International Conference on Frontiers in Handwriting Recognition, Oct 2016, Shenzhen, China. ⟨hal-01403764⟩