
Arabic topic identification based on empirical studies of topic models

Abstract : This paper addresses topic identification for the Arabic language using topic models. We study Latent Dirichlet Allocation (LDA) as an unsupervised method for Arabic topic identification. LDA is examined in depth at two levels: the stemming process and the choice of LDA hyper-parameters. At the first level, we study the effect of different Arabic stemmers on LDA. At the second level, we focus on the LDA hyper-parameters α and β and their impact on topic identification. The study shows that LDA is an efficient method for Arabic topic identification, especially when the hyper-parameters are chosen well. Another important result is the strong impact of the stemming algorithm on topic identification.
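
To make the roles of α and β concrete, the following is a minimal sketch of LDA with symmetric priors, written in plain Python. It is not the authors' implementation: the abstract does not say which inference procedure or toolkit was used, so collapsed Gibbs sampling is assumed here purely for illustration. The function name `lda_gibbs`, its parameters, and the toy tokens are all hypothetical; in the setting of the paper the input tokens would be the output of an Arabic stemmer.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iterations=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (assumed inference method).

    docs       : list of documents, each a list of (already stemmed) tokens
    alpha      : symmetric Dirichlet prior on document-topic distributions
    beta       : symmetric Dirichlet prior on topic-word distributions
    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # count of topic k in doc d
    topic_word = [[0] * V for _ in range(num_topics)]   # count of word v in topic k
    topic_total = [0] * num_topics                      # total tokens in topic k
    assignments = []                                    # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token,
                # up to a constant: alpha smooths the document side,
                # beta smooths the word side.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed point estimates of the two families of distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
         for v in range(V)]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus; placeholder tokens stand in for stemmed Arabic words.
toy_docs = [["match", "team", "goal", "team"], ["bank", "market", "price", "bank"]]
theta, phi, vocab = lda_gibbs(toy_docs, num_topics=2, alpha=0.1, beta=0.01)
```

In this formulation, a smaller α pushes each document toward fewer topics and a smaller β pushes each topic toward fewer words, which is why the choice of both values can change the topics that are identified. The stemmer matters for a related reason: it determines the vocabulary over which the word counts are accumulated.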
Document type :
Journal articles

https://hal.archives-ouvertes.fr/hal-01444574
Contributor : Marwa Naili
Submitted on : Friday, July 28, 2017 - 2:31:15 PM
Last modification on : Wednesday, October 28, 2020 - 1:08:02 PM

File

ARIMA-Vol27-45-59.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NoDerivatives 4.0 International License

Citation

Marwa Naili, Anja Habacha Chaibi, Henda Ben Ghézala. Arabic topic identification based on empirical studies of topic models. Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, INRIA, 2017, Volume 27 - 2017 - Special issue CARI 2016, ⟨10.46298/arima.3102⟩. ⟨hal-01444574v2⟩

Metrics

Record views : 213
Files downloads : 1139