Journal articles

An Enhanced Corpus for Arabic Newspapers Comments

Abstract : In this paper, we propose an enhanced approach for creating a dedicated corpus of comments from Algerian Arabic newspapers. The approach improves on an existing one by enriching the available corpus and adding an annotation step that follows the Model, Annotate, Train, Test, Evaluate, Revise (MATTER) methodology. The corpus is built by collecting comments from the websites of three well-known Algerian newspapers. Three classifiers (support vector machines, naïve Bayes, and k-nearest neighbors) were used to classify the comments into positive and negative classes. To measure the influence of stemming on the results, the classification was tested with and without stemming. The results show that stemming does not considerably improve classification, owing to the nature of Algerian comments, which are tied to the Algerian Arabic dialect. These promising results motivate us to improve our approach, particularly in handling non-Arabic sentences, especially dialectal and French ones.

Contributor : Mahieddine Djoudi
Submitted on : Saturday, February 6, 2021 - 8:21:29 PM
Last modification on : Wednesday, December 22, 2021 - 9:06:02 AM
Long-term archiving on : Friday, May 7, 2021 - 6:00:55 PM


Files produced by the author(s)




Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi. An Enhanced Corpus for Arabic Newspapers Comments. International Arab Journal of Information Technology, Colleges of Computing and Information Society (CCIS), 2020, 17 (5), pp.789-798. ⟨10.34028/iajit/17/5/12⟩. ⟨hal-03124728⟩


