RASAM - A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi - HAL open archive
Book chapter. Year: 2021

RASAM - A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi

Chahan Vidal-Gorène
Noëmie Lucas
Clément Salah
Aliénor Decours-Perez
Boris Dupin

Abstract

Arabic scripts raise numerous issues in text recognition and layout analysis. To overcome these, several datasets and methods have been proposed in recent years. However, these focus on common scripts and layouts, and many Arabic writings and written traditions remain under-resourced. We therefore propose a new dataset of 300 images representative of the handwritten production in Arabic Maghrebi scripts. The dataset is the result of collaborative work undertaken in the first quarter of 2021, and it offers several levels of annotation and transcription. The article aims to shed light on the specificities of these scripts and manuscripts, and to highlight the challenges they pose for recognition. We assess the collaborative tools used to create the dataset and evaluate the dataset itself with state-of-the-art layout analysis methods. The word-based text recognition method we tested on these scripts achieves an average character error rate (CER) of 4.8%. The pipeline described offers lessons learned for quickly creating data and training effective handwritten text recognition (HTR) systems for Arabic scripts, and for non-Latin scripts in general.
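The recognition result above is reported as a character error rate. As a reminder of what that figure measures, here is a minimal Python sketch of the usual definition: the unit-cost Levenshtein (edit) distance between the reference transcription and the system output, divided by the number of characters in the reference. This is an illustrative assumption, not the authors' evaluation script. The function names `levenshtein` and `cer` are hypothetical, and the sketch counts Unicode code points without any normalization.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref, each with cost 1 (hypothetical helper)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j];
    # row 0 compares the empty reference with each prefix of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference character r missing from hyp
                curr[j - 1] + 1,         # extra hypothesis character h
                prev[j - 1] + (r != h),  # substitution, free when characters match
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length.
    Counts Unicode code points as-is (assumption: no normalization applied)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, `cer("كتاب", "كتب")` returns 0.25: one alif is missing out of four reference characters. In practice, choices such as Unicode normalization form and whether diacritics are kept change the CER on Arabic text, so figures are only comparable under the same conventions.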
Main file
RASAM_A_Dataset_for_the_Recognition_and.pdf (4.99 MB)
Origin: files produced by the author(s)

Dates and versions

halshs-03430697, version 1 (16-11-2021)


Cite

Chahan Vidal-Gorène, Noëmie Lucas, Clément Salah, Aliénor Decours-Perez, Boris Dupin. RASAM - A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi. Document Analysis and Recognition – ICDAR 2021 Workshops, Lecture Notes in Computer Science, vol. 12916, Springer International Publishing, pp. 265-281, 2021. ⟨10.1007/978-3-030-86198-8_19⟩. ⟨halshs-03430697⟩