POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools

Abstract : Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (∼20% absolute improvement over an MSA tagger baseline).
Document type : Conference paper

Cited literature : 41 references

Contributor : Alexis Nasr
Submitted on : Wednesday, February 15, 2017 - 5:23:50 PM
Last modification on : Monday, March 4, 2019 - 2:04:14 PM
Archived on : Tuesday, May 16, 2017 - 12:13:10 PM


Publisher files allowed on an open archive




Ahmed Hamdi, Alexis Nasr, Nizar Habash, Núria Gala. POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools. Workshop on Arabic Natural Language Processing, Jul 2015, Beijing, China. pp. 59-68, ⟨10.18653/v1/W15-3207⟩. ⟨hal-01464860⟩


