Conference papers

Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects

Karima Abidi (1), Kamel Smaïli (1)
(1) SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: In this article, we tackle sentiment analysis in three Maghrebi dialects used on social networks. More precisely, we are interested in analysing sentiment in Algerian, Moroccan and Tunisian corpora. To do this, we automatically built three sentiment lexicons, one for each dialect. Each lexicon consists of words with their polarities; a dialect word may be written in either Arabic or Latin script. These lexicons may include French or English words as well as words in Arabic dialect and standard Arabic. The semantic orientation of a word, represented by an embedding vector, is determined automatically by computing its distance to the embedding vectors of several seed words. The embedding vectors are trained on three large corpora collected from YouTube. The proposed approach is evaluated on the few existing annotated corpora in Tunisian and Moroccan dialects. For the Algerian dialect, in addition to a small corpus found in the literature, we collected and annotated a corpus of 10k comments extracted from YouTube. This corpus is a valuable resource, which we make freely available.
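The abstract does not state the exact distance measure or decision rule. The following is a minimal sketch, assuming that a word's semantic orientation is its mean cosine similarity to positive seed embeddings minus its mean cosine similarity to negative seed embeddings, and assuming a simple token-counting classifier for evaluation. The function names, the `margin` parameter, the seed words and the toy 2-dimensional vectors are all hypothetical and chosen only for illustration. Arabic-script and Latin-script spellings are simply separate vocabulary entries in the same embedding space.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def semantic_orientation(vec: Vector, pos_seeds: List[Vector], neg_seeds: List[Vector]) -> float:
    """Assumed scoring rule: mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(vec, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vec, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


def build_lexicon(embeddings: Dict[str, Vector], pos_words: List[str],
                  neg_words: List[str], margin: float = 0.0) -> Dict[str, int]:
    """Assign polarity +1 / -1 to every vocabulary word (hypothetical helper)."""
    pos_seeds = [embeddings[w] for w in pos_words if w in embeddings]
    neg_seeds = [embeddings[w] for w in neg_words if w in embeddings]
    if not pos_seeds or not neg_seeds:
        raise ValueError("need at least one positive and one negative seed in the vocabulary")
    lexicon: Dict[str, int] = {}
    for word, vec in embeddings.items():
        so = semantic_orientation(vec, pos_seeds, neg_seeds)
        if so > margin:
            lexicon[word] = 1
        elif so < -margin:
            lexicon[word] = -1
        # words with |so| <= margin are treated as neutral and left out
    return lexicon


def classify(comment: str, lexicon: Dict[str, int]) -> int:
    """Sum token polarities: 1 = positive, -1 = negative, 0 = neutral or unknown."""
    score = sum(lexicon.get(tok, 0) for tok in comment.lower().split())
    return (score > 0) - (score < 0)


if __name__ == "__main__":
    # Toy vectors, not real embeddings; mixing Latin-script, Arabic-script and French entries.
    toy = {
        "mlih": [0.9, 0.1],    # hypothetical positive seed
        "khayeb": [0.1, 0.9],  # hypothetical negative seed
        "مليح": [0.8, 0.2],
        "خايب": [0.2, 0.8],
        "bien": [0.7, 0.3],
    }
    lex = build_lexicon(toy, ["mlih"], ["khayeb"])
    print(lex)
    print(classify("film mlih w bien", lex))
```

Averaging over several seeds, rather than relying on a single reference word, makes the score less sensitive to any one seed's embedding; a non-zero `margin` would leave weakly oriented words out of the lexicon.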
Document type: Conference papers
Contributor: Kamel Smaïli
Submitted on: Thursday, July 29, 2021 - 5:13:17 PM
Last modification on: Saturday, October 16, 2021 - 11:26:09 AM
HAL Id: hal-03308111, version 1




Karima Abidi, Kamel Smaïli. Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects. 7th International Conference on Data Mining (DTMN 2021), Computer Science Conference Proceedings in Computer Science & Information Technology (CS & IT), Sep 2021, Copenhagen, Denmark. ⟨hal-03308111⟩


