Voting Classifier vs Deep learning method in Arabic Dialect Identification

Abstract : In this paper, we present three methods developed by the SORBONNE Team for the NADI shared task on Arabic Dialect Identification for tweets. The first method uses a machine learning model based on a Voting Classifier with word-level and character-level features; the second uses a deep learning model at the word level; the third uses only character-level features. We explored different text representations, such as TF-IDF (first model) and word embeddings (second model). The Voting Classifier was the most powerful prediction model, achieving the best macro-average F1 score of 18.8% and an accuracy of 36.54% on the official test set. Our model ranked 9th in the challenge, and in conclusion we propose some ideas to improve its results.
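
As a rough illustration of two ideas the abstract relies on, combining several classifiers by voting and the gap between accuracy and macro-average F1, the following Python sketch uses only the standard library. The dialect labels, the three toy prediction lists, and the function names are hypothetical. The abstract does not specify the voting scheme or the base classifiers, so hard (majority) voting is an assumption here, not the authors' implementation.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists by majority vote, one tweet at a time.

    Tie-breaking is left to Counter.most_common; the toy data below has no ties.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions_per_model)]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels for six tweets, skewed toward one dialect.
y_true = ["EG", "EG", "EG", "EG", "IQ", "MA"]

# Hypothetical outputs of three base classifiers.
model_a = ["EG", "EG", "EG", "EG", "EG", "EG"]
model_b = ["EG", "EG", "EG", "IQ", "IQ", "EG"]
model_c = ["EG", "EG", "EG", "EG", "EG", "MA"]

voted = hard_vote([model_a, model_b, model_c])
print("voted labels:", voted)
print("accuracy:", accuracy(y_true, voted))
print("macro F1:", macro_f1(y_true, voted))
```

On this toy data the voted predictions collapse onto the majority dialect, so accuracy stays fairly high while macro-average F1 drops sharply. This is one plausible reason a system can report an accuracy well above its macro-average F1 when the label distribution is skewed.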
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03089957
Contributor : Gaël Lejeune
Submitted on : Tuesday, December 29, 2020 - 7:26:47 AM
Last modification on : Friday, January 15, 2021 - 3:32:52 AM
Long-term archiving on : Tuesday, March 30, 2021 - 6:05:59 PM

File

coling2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03089957, version 1

Citation

Dhaou Ghoul, Gaël Lejeune. Voting Classifier vs Deep learning method in Arabic Dialect Identification. Proceedings of the Fifth Arabic Natural Language Processing Workshop, COLING 2020, Dec 2020, Barcelona, Spain. ⟨hal-03089957⟩
