Exploring term proximity statistic for Arabic information retrieval

Abstract: The term proximity statistic, which rewards documents in which the matched query terms occur close to one another, has proved effective in improving document retrieval performance. However, this line of research remains unexplored for Arabic information retrieval (IR), despite the undiacritized text and rich morphology of the Arabic language, which complicate the retrieval process. In this paper, we propose to boost Arabic information retrieval performance by using proximity information. Our aim is to evaluate proximity features for the Arabic language in order to go beyond the bag-of-words representation and to overcome problems related to text preprocessing. We investigate several state-of-the-art proximity models, including the Cross-Term model (CRTER), the Markov Random Field model (MRF), the divergence from randomness (DFR) multinomial model, and the Positional Language Model (PLM). For preprocessing, the Khoja and light stemming algorithms are used. Experiments are performed on the Arabic TREC-2001/2002 collection using the Terrier IR platform. The results show significant improvements from using proximity-based models for Arabic IR.
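
To make the idea of rewarding close query-term matches concrete, the following is a minimal Python sketch. It is not one of the models evaluated in the paper (CRTER, MRF, DFR, PLM), and it does not use Terrier. It assumes documents are already tokenized and stemmed, and the function names and the `1 / distance` bonus are hypothetical choices made only for illustration.

```python
from itertools import combinations


def term_positions(tokens, query_terms):
    """Map each query term to the token offsets at which it occurs."""
    positions = {term: [] for term in query_terms}
    for offset, token in enumerate(tokens):
        if token in positions:
            positions[token].append(offset)
    return positions


def min_pair_distance(tokens, query_terms):
    """Smallest token distance between occurrences of two distinct query terms.

    Returns None when fewer than two distinct query terms match the document.
    """
    positions = term_positions(tokens, set(query_terms))
    best = None
    for term_a, term_b in combinations(sorted(positions), 2):
        for i in positions[term_a]:
            for j in positions[term_b]:
                distance = abs(i - j)
                if best is None or distance < best:
                    best = distance
    return best


def proximity_bonus(tokens, query_terms):
    """Illustrative bonus (an assumption, not a published formula): 1 / distance."""
    distance = min_pair_distance(tokens, query_terms)
    return 0.0 if distance is None else 1.0 / distance


query = ["arabic", "retrieval"]
doc_near = "information retrieval for arabic text".split()
doc_far = "arabic morphology is rich and makes document retrieval hard".split()

print(proximity_bonus(doc_near, query))
print(proximity_bonus(doc_far, query))
```

Both toy documents contain the two query terms once each, so a pure bag-of-words match treats them alike on these terms, while the bonus is larger for the document in which the terms sit closer together. A real ranking function would combine such a signal with a term-weighting score rather than use it alone.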
Document type:
Other publication
Information Science and Technology (CIST), 2014 Third IEEE International Colloquium, 20-22 Oct, p.. 2014, 〈10.1109/CIST.2014.7016631〉

https://hal.archives-ouvertes.fr/hal-01109471
Contributor: Abdelkader El Mahdaouy
Submitted on: Monday, 26 January 2015 - 13:56:50
Last modified on: Thursday, 11 October 2018 - 08:48:04

Citation

Abdelkader El Mahdaouy, Eric Gaussier, Saïd El Alaoui Ouatik. Exploring term proximity statistic for Arabic information retrieval. Information Science and Technology (CIST), 2014 Third IEEE International Colloquium, 20-22 Oct, p.. 2014, 〈10.1109/CIST.2014.7016631〉. 〈hal-01109471〉
