Journal articles

Improving Arabic information retrieval using word embedding similarities

Abstract : Term mismatch is a common limitation of traditional information retrieval (IR) models, in which relevance scores are estimated from exact matching between documents and queries. Ideally, a good IR model should also consider distinct but semantically similar words in the matching process. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that the extended IR models significantly outperform their baseline bag-of-words counterparts, while the differences among the evaluated neural word embedding models are not statistically significant. Moreover, the overall comparison shows that our extensions significantly outperform the Arabic WordNet based semantic indexing approach and three recent WE-based IR language models.
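
To make the term-mismatch idea concrete, here is a minimal Python sketch. It is not the authors' formulation: the vectors are toy values invented for illustration (not trained embeddings), and the names `EMBEDDINGS`, `cosine`, `extended_tf` and the 0.8 threshold are assumptions. It only shows the general principle that a document term can contribute to a query term's frequency when the two are similar in embedding space, even when they do not match exactly.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def extended_tf(query_term, doc_terms, threshold=0.8):
    """Similarity-weighted term frequency (hypothetical helper).

    An exact match counts as 1. A different term counts as its cosine
    similarity to the query term, but only if that similarity reaches
    the (assumed) threshold.
    """
    query_vec = EMBEDDINGS.get(query_term)
    total = 0.0
    for term in doc_terms:
        if term == query_term:
            total += 1.0
        elif query_vec is not None and term in EMBEDDINGS:
            sim = cosine(query_vec, EMBEDDINGS[term])
            if sim >= threshold:
                total += sim
    return total


doc = ["automobile", "banana", "automobile"]
exact_tf = doc.count("car")        # exact matching finds no occurrence
soft_tf = extended_tf("car", doc)  # similar terms now contribute
```

With exact matching the query term "car" has frequency 0 in this document, while the similarity-weighted count is close to 2 because both occurrences of "automobile" contribute. A weighted frequency of this kind could then be fed into a probabilistic scoring function in place of the raw count; how the paper does this is not specified in the abstract.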
Document type : Journal articles

Contributor : Abdelkader El Mahdaouy
Submitted on : Monday, February 12, 2018 - 1:38:47 AM
Last modification on : Wednesday, January 12, 2022 - 9:58:02 AM


  • HAL Id : hal-01706531, version 1



Abdelkader El Mahdaouy, Said El Alaoui Ouatik, Eric Gaussier. Improving Arabic information retrieval using word embedding similarities. International Journal of Speech Technology, Springer Verlag, 2018, 21 (1), pp.121-136. ⟨hal-01706531⟩


