POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools

Ahmed Hamdi; Alexis Nasr; Nizar Habash; Núria Gala

doi:10.18653/v1/W15-3207

Conference paper. Year: 2015

Ahmed Hamdi

Role: Author
PersonId: 770140
IdHAL: ahmed-hamdi
ORCID: 0000-0002-8964-2135

Laboratoire d'informatique Fondamentale de Marseille

Alexis Nasr

Role: Author
PersonId: 4991
IdHAL: alexis-nasr
IdRef: 120694220

Laboratoire d'informatique Fondamentale de Marseille

Traitement Automatique du Langage Ecrit et Parlé

Nizar Habash

Role: Author

Center for Computational Learning Systems

Núria Gala

Role: Author
PersonId: 18582
IdHAL: nuria-gala-pavia
ORCID: 0000-0003-2987-0723
IdRef: 075172763

Laboratoire d'informatique Fondamentale de Marseille - UMR 6166

Abstract

Developing natural language processing tools usually requires a large number of resources (lexica, annotated corpora, etc.), which often do not exist for less-resourced languages. One way to overcome this lack of resources is to devote substantial effort to building new ones from scratch. Another is to exploit the existing resources of closely related languages. In this paper, we focus on developing a part-of-speech tagger for the Tunisian Arabic dialect (TUN), a low-resource language, by exploiting its closeness to Modern Standard Arabic (MSA), which has many state-of-the-art resources and tools. Our system achieved an accuracy of 89% (∼20% absolute improvement over an MSA tagger baseline).

Domains

Document and Text Processing

Main file

W15-3207.pdf (172.72 KB)

Origin: Publisher files allowed on an open archive

Contributor: Alexis Nasr

https://hal.science/hal-01464860

Submitted on: Wednesday, February 15, 2017, 17:23:50

Last modified on: Friday, March 22, 2024, 18:24:04

Long-term archiving on: Tuesday, May 16, 2017, 12:13:10

Dates and versions

hal-01464860, version 1 (15-02-2017)

Identifiers

HAL Id: hal-01464860, version 1
DOI: 10.18653/v1/W15-3207

Cite

Ahmed Hamdi, Alexis Nasr, Nizar Habash, Núria Gala. POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools. Workshop on Arabic Natural Language Processing, Jul 2015, Beijing, China. pp.59 - 68, ⟨10.18653/v1/W15-3207⟩. ⟨hal-01464860⟩

Collections

UNIV-TLN LIF CNRS UNIV-AMU EC-MARSEILLE LIS-LAB
