POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools

Abstract: Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (∼20% absolute improvement over an MSA tagger baseline).
Document type:
Conference paper
Workshop on Arabic Natural Language Processing, Jul 2015, Beijing, China. Proceedings of the Second Workshop on Arabic Natural Language Processing, pp.59 - 68, 2015, 〈10.18653/v1/W15-3207〉

Cited literature: 41 references

https://hal.archives-ouvertes.fr/hal-01464860
Contributor: Alexis Nasr
Submitted on: Wednesday, February 15, 2017 - 17:23:50
Last modified on: Saturday, April 28, 2018 - 01:01:43
Document(s) archived on: Tuesday, May 16, 2017 - 12:13:10

File

W15-3207.pdf
Publisher files allowed on an open archive

Citation

Ahmed Hamdi, Alexis Nasr, Nizar Habash, Núria Gala. POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools. Workshop on Arabic Natural Language Processing, Jul 2015, Beijing, China. Proceedings of the Second Workshop on Arabic Natural Language Processing, pp.59 - 68, 2015, 〈10.18653/v1/W15-3207〉. 〈hal-01464860〉
