Conference paper, Year: 2018

PADIC: extension and new experiments

Abstract

PADIC is a multidialectal parallel Arabic corpus. It initially comprised five Arabic dialects, three from the Maghreb and two from the Middle East, in addition to Standard Arabic. In this paper, we present a version of PADIC augmented with a Moroccan dialect. We also evaluate, using the σ-index, the computerization level of the Arabic dialects present in PADIC; the evaluation reveals that these languages are truly under-resourced. Several machine translation experiments, run in both directions for every combination of language pairs, are discussed as well. For each language, we interpolated the corresponding language model (LM) with an LM built from a large Arabic corpus. The results show that this interpolation has no effect on the performance of the translation systems in some cases and degrades it in others.
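The abstract does not say which interpolation scheme was used. As a minimal sketch, assuming plain linear interpolation of two unigram models, the idea of mixing a small dialect LM with a large-corpus LM can be illustrated as follows. The token lists, the weight `lam`, and all function names are hypothetical placeholders, not PADIC data or the authors' setup.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_small, p_large, lam):
    """Linear interpolation: lam * p_small(w) + (1 - lam) * p_large(w)."""
    return {w: lam * p_small[w] + (1 - lam) * p_large[w] for w in p_small}


def perplexity(lm, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy, made-up token lists (placeholders, not taken from PADIC).
dialect_tokens = "a b a c".split()
large_tokens = "a a a b d d d d".split()
vocab = sorted(set(dialect_tokens) | set(large_tokens))

p_dialect = unigram_lm(dialect_tokens, vocab)
p_large = unigram_lm(large_tokens, vocab)

# lam is an assumed mixing weight; 0.7 is an arbitrary illustrative value.
p_mix = interpolate(p_dialect, p_large, lam=0.7)

print(perplexity(p_dialect, dialect_tokens))
print(perplexity(p_mix, dialect_tokens))
```

In practice the mixing weight is usually tuned on held-out text. When the large corpus differs strongly from the dialect, the mixed model can fit dialect text no better, or worse, than the dialect model alone, which is consistent with the outcome the abstract reports.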
Main file
main1.pdf (85.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01718858, version 1 (27-02-2018)

Identifiers

  • HAL Id: hal-01718858, version 1

Cite

K. Meftouh, S. Harrat, Kamel Smaïli. PADIC: extension and new experiments. 7th International Conference on Advanced Technologies (ICAT), Apr 2018, Antalya, Turkey. ⟨hal-01718858⟩
315 views
282 downloads
