Theme Classification of Arabic Text: A Statistical Approach

Abstract: The volume of textual documents stored across many domains continues to grow rapidly, so it must be organized in a way that lets users access it easily. Text-mining tools help process this growing body of data and reveal the important information embedded in those documents. However, information retrieval for the Arabic language is a relatively new field, and research on it is limited compared with the work done on other languages (e.g. English, Greek, German, Chinese). In this paper, we propose two statistical approaches to text classification by theme, both dedicated to the Arabic language. The evaluation tests are conducted on an Arabic text corpus containing five themes: Economics, Politics, Sport, Medicine and Religion. This investigation has validated several text-mining tools for the Arabic language and has shown that the two proposed approaches are promising for Arabic theme classification, with classification performance reaching 95%.
Document type:
Conference paper
Terminology and Knowledge Engineering 2014, Jun 2014, Berlin, Germany. 10 p, 2014

https://hal.archives-ouvertes.fr/hal-01005873
Contributor: Hélène Lowinger
Submitted on: Friday, 13 June 2014 - 14:07:11
Last modified on: Wednesday, 9 July 2014 - 17:21:04
Document(s) archived on: Saturday, 13 September 2014 - 11:12:32

File

Article1_Fodil_new7_Berlin_Cam...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01005873, version 1

Citation

Leila Fodil, Halim Sayoud, Siham Ouamour. Theme Classification of Arabic Text: A Statistical Approach. Terminology and Knowledge Engineering 2014, Jun 2014, Berlin, Germany. 10 p, 2014. 〈hal-01005873〉

Metrics

Record views

236

File downloads

844