POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools

Abstract : Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another approach is to exploit existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (∼20% absolute improvement over an MSA tagger baseline).
Document type :
Conference papers

Cited literature [41 references]

https://hal.archives-ouvertes.fr/hal-01464860
Contributor : Alexis Nasr
Submitted on : Wednesday, February 15, 2017 - 5:23:50 PM
Last modification on : Monday, March 4, 2019 - 2:04:14 PM
Document(s) archived on : Tuesday, May 16, 2017 - 12:13:10 PM

File

W15-3207.pdf
Publisher files allowed on an open archive

Citation

Ahmed Hamdi, Alexis Nasr, Nizar Habash, Núria Gala. POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools. Workshop on Arabic Natural Language Processing, Jul 2015, Beijing, China. pp. 59-68, ⟨10.18653/v1/W15-3207⟩. ⟨hal-01464860⟩
