Conference papers

Theme Classification of Arabic Text: A Statistical Approach

Abstract : The volume of textual documents stored across many domains continues to grow rapidly, so these documents must be organized in a way that lets users access them easily. Text-mining tools help to process this growing body of data and to reveal the important information embedded in the documents. However, information retrieval for the Arabic language is a relatively new field, and the research is limited compared with the work done on other languages (e.g., English, Greek, German, Chinese). In this paper, we propose two statistical approaches to text classification by theme that are dedicated to the Arabic language. The evaluation tests are conducted on an Arabic text corpus covering 5 themes: Economics, Politics, Sport, Medicine and Religion. This investigation has validated several text-mining tools for the Arabic language and has shown that the two proposed approaches are promising for Arabic theme classification, with classification performance reaching 95%.
Document type : Conference papers

Cited literature : 11 references
Contributor : Hélène Lowinger
Submitted on : Friday, June 13, 2014 - 2:07:11 PM
Last modification on : Tuesday, July 9, 2019 - 5:00:51 PM
Long-term archiving on : Saturday, September 13, 2014 - 11:12:32 AM


Files produced by the author(s)


  • HAL Id : hal-01005873, version 1



Leila Fodil, Halim Sayoud, Siham Ouamour. Theme Classification of Arabic Text: A Statistical Approach. Terminology and Knowledge Engineering 2014, Jun 2014, Berlin, Germany. 10 p. ⟨hal-01005873⟩


