Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments

Salima Mdhaffar; Fethi Bougares; Yannick Estève; Lamia Hadrich-Belguith

doi:10.18653/v1/W17-1307

Conference paper, Year: 2017

Salima Mdhaffar

Role: Author
PersonId: 17127
IdHAL: salima-mdhaffar
ORCID: 0000-0002-8472-6890

Laboratoire d'Informatique de l'Université du Mans

Fethi Bougares

Role: Author
PersonId: 768825
IdRef: 170400883

Laboratoire d'Informatique de l'Université du Mans

Yannick Estève

Role: Author
PersonId: 11645
IdHAL: yannick-esteve
ORCID: 0000-0002-3656-8883
IdRef: 070531668

Laboratoire d'Informatique de l'Université du Mans

Lamia Hadrich-Belguith

Role: Author

Multimedia, InfoRmation systems and Advanced Computing Laboratory

Abstract

Dialectal Arabic (DA) differs significantly from the Arabic taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). Much research exists on Sentiment Analysis (SA) for Arabic; however, it is generally restricted to Modern Standard Arabic (MSA) or to a few dialects of economic or political interest. In this paper we focus on SA of the Tunisian dialect. We use Machine Learning techniques to determine the polarity of comments written in the Tunisian dialect. First, we evaluate the performance of SA systems whose models are trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian dialect corpus of 17,000 comments from Facebook. Models trained on this corpus show a significant improvement over the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working on Tunisian Sentiment Analysis and similar areas.

Domains

Computation and Language [cs.CL]

Main file

eacl2017_VF.pdf (148.29 KB)

https://hal.science/hal-01592418

Submitted on: Sunday, March 11, 2018, 09:40:21

Last modified on: Tuesday, June 23, 2020, 12:30:04

Long-term archiving on: Tuesday, June 12, 2018, 12:19:32

Dates and versions

hal-01592418, version 1 (11-03-2018)

Identifiers

HAL Id: hal-01592418, version 1
DOI: 10.18653/v1/W17-1307

Cite

Salima Mdhaffar, Fethi Bougares, Yannick Estève, Lamia Hadrich-Belguith. Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments. Third Arabic Natural Language Processing Workshop (WANLP), Apr 2017, Valencia, Spain. pp.55-61, ⟨10.18653/v1/W17-1307⟩. ⟨hal-01592418⟩

Collections

UNIV-LEMANS LIUM LIUM-LST

863 views

954 downloads
