
Voting Classifier vs Deep learning method in Arabic Dialect Identification

Abstract : In this paper, we present three methods developed by the SORBONNE Team for the NADI shared task on Arabic Dialect Identification for tweets. The first method is a machine learning model based on a Voting Classifier with word- and character-level features; the second is a deep learning model operating at the word level; the third uses only character-level features. We explored different text representations, such as TF-IDF (first model) and word embeddings (second model). The Voting Classifier was the strongest model, achieving our best macro-averaged F1 score of 18.8% and an accuracy of 36.54% on the official test set. Our model ranked 9th in the challenge. In conclusion, we propose some ideas to improve its results.
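The gap between the reported accuracy (36.54%) and macro-averaged F1 (18.8%) is typical of an imbalanced multi-class task: accuracy rewards getting frequent classes right, while macro-F1 weights every class equally. The sketch below illustrates this with the standard definitions of both metrics. The labels and predictions are toy values invented for illustration; they are not the NADI data, and this is not the authors' evaluation code.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # missed an item of class g
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); treated as 0 if the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, hypothetical example: one frequent class and two rare ones.
gold = ["Egypt"] * 6 + ["Iraq"] * 2 + ["Oman"] * 2
# A degenerate classifier that always predicts the majority class.
pred = ["Egypt"] * 10

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro-F1 = {macro_f1(gold, pred):.2f}")
```

In this toy case the majority-class predictor is right on 6 of 10 items, yet it scores zero F1 on the two rare classes, which pulls the macro average well below the accuracy.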
Document type : Conference papers
Contributor : Gaël Lejeune
Submitted on : Tuesday, December 29, 2020 - 7:26:47 AM
Last modification on : Thursday, December 9, 2021 - 3:48:14 AM
Long-term archiving on : Tuesday, March 30, 2021 - 6:05:59 PM


Files produced by the author(s)


  • HAL Id : hal-03089957, version 1


Dhaou Ghoul, Gaël Lejeune. Voting Classifier vs Deep learning method in Arabic Dialect Identification. Proceedings of the Fifth Arabic Natural Language Processing Workshop, COLING 2020, Dec 2020, Barcelona, Spain. ⟨hal-03089957⟩


