
Combining compound and single terms under language model framework

Abstract: Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis. To go beyond this assumption and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information into IR; such attempts have shown slight benefit at best. In language modeling approaches in particular, this extension is achieved through the use of bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight the relevant n-grams associated with a document. Experimental results on three TREC test collections show an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
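The contrast the abstract draws can be made concrete with a small sketch. Below is a toy query-likelihood scorer that interpolates a Jelinek-Mercer-smoothed unigram language model with a bigram component in which each bigram carries an individual relevance weight. Setting every weight to 1 recovers the uniform bigram model the abstract criticises; the paper's contribution is to select and weight only relevant n-grams, whereas this sketch simply takes the weights as an input dictionary. All names, the linear interpolation scheme, and the neutral factor of 1 for unweighted bigrams are illustrative assumptions, not the paper's actual formulation.

```python
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.7, alpha=0.6,
                     bigram_weights=None):
    """Toy score: interpolate a smoothed unigram LM with a bigram model
    whose bigrams carry individual relevance weights (a sketch, not the
    paper's exact model)."""
    bigram_weights = bigram_weights or {}
    d_uni, c_uni = Counter(doc), Counter(collection)
    d_bi = Counter(zip(doc, doc[1:]))
    c_bi = Counter(zip(collection, collection[1:]))

    # Unigram query likelihood with Jelinek-Mercer smoothing:
    # P(w|d) is mixed with the collection model P(w|C).
    uni = 1.0
    for w in query:
        uni *= lam * d_uni[w] / len(doc) + (1 - lam) * c_uni[w] / len(collection)

    # Weighted bigram likelihood: a bigram with weight 0 contributes a
    # neutral factor of 1, so only the selected, weighted bigrams matter.
    bi = 1.0
    for bg in zip(query, query[1:]):
        w = bigram_weights.get(bg, 0.0)
        p = (lam * d_bi[bg] / max(len(doc) - 1, 1)
             + (1 - lam) * c_bi[bg] / max(len(collection) - 1, 1))
        bi *= (1 - w) + w * p

    # Simple linear interpolation of the two components.
    return alpha * uni + (1 - alpha) * bi
```

With no bigram weights the scorer reduces to the smoothed unigram baseline; supplying a nonzero weight for a relevant bigram such as ("language", "model") sharpens the ranking between a document containing that compound and one that does not.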

Cited literature: 41 references.
Contributor: Open Archive Toulouse Archive Ouverte (oatao)
Submitted on: Friday, March 4, 2016 - 4:09:03 PM
Last modification on: Wednesday, June 9, 2021 - 10:00:27 AM
Long-term archiving on: Sunday, June 5, 2016 - 10:42:16 AM





Arezki Hammache, Mohand Boughanem, Rachid Ahmed-Ouamar. Combining compound and single terms under language model framework. Knowledge and Information Systems (KAIS), Springer, 2014, vol. 39 (n° 2), pp. 329-349. ⟨10.1007/s10115-013-0618-x⟩. ⟨hal-01282933⟩


