Combining compound and single terms under language model framework

Abstract: Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term-independence hypothesis. To go beyond this assumption and thereby capture the semantics of documents and queries more accurately, several works have incorporated phrases or other syntactic information into IR; such attempts have shown slight benefit at best. In language modeling approaches in particular, this extension is achieved through bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper we introduce a new approach to select and weight the relevant n-grams associated with a document. Experimental results on three TREC test collections show an improvement over three strong state-of-the-art baselines: the original unigram language model, the Markov Random Field model, and the positional language model.
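To make the language-modeling setting concrete, the sketch below implements a basic query-likelihood score that interpolates a unigram model with a bigram model under Jelinek-Mercer smoothing. This is only an illustration of the general framework the abstract refers to, not the authors' selection/weighting method; the parameters `lam` (document/collection smoothing) and `beta` (unigram/bigram mixing weight) are hypothetical names chosen here.

```python
import math
from collections import Counter

def lm_score(query, doc, collection, lam=0.8, beta=0.7):
    """Illustrative query-likelihood score mixing unigram and bigram
    language models (a sketch of the general framework, not the
    paper's n-gram selection/weighting approach).

    query, doc, collection: lists of tokens.
    lam:  Jelinek-Mercer weight on the document model vs. the collection model.
    beta: interpolation weight between unigram and bigram probabilities.
    """
    # Unigram and bigram counts for the document and the collection
    d_uni, c_uni = Counter(doc), Counter(collection)
    d_bi = Counter(zip(doc, doc[1:]))
    c_bi = Counter(zip(collection, collection[1:]))

    def smoothed(counts, total, key, bg_counts, bg_total):
        # Jelinek-Mercer smoothing: mix document and collection estimates
        p_doc = counts[key] / total if total else 0.0
        p_col = bg_counts[key] / bg_total if bg_total else 0.0
        return lam * p_doc + (1 - lam) * p_col

    log_p = 0.0
    for i, term in enumerate(query):
        p_u = smoothed(d_uni, len(doc), term, c_uni, len(collection))
        if i > 0:
            # Bigram probability for the adjacent query-term pair
            p_b = smoothed(d_bi, max(len(doc) - 1, 0), (query[i - 1], term),
                           c_bi, max(len(collection) - 1, 0))
            mixed = beta * p_u + (1 - beta) * p_b
        else:
            mixed = p_u
        # Floor to avoid log(0) for terms absent from the whole collection
        log_p += math.log(max(mixed, 1e-12))
    return log_p
```

Note that in this naive form every adjacent query-term bigram contributes with the same fixed weight `beta`, which is exactly the uniform treatment the abstract criticizes; the paper's contribution is to select and weight n-grams non-uniformly.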

https://hal.archives-ouvertes.fr/hal-01282933
Contributor: Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Friday, March 4, 2016 - 4:09:03 PM
Last modification on : Friday, January 10, 2020 - 9:09:21 PM
Long-term archiving on: Sunday, June 5, 2016 - 10:42:16 AM

File: hammache_14739.pdf (produced by the authors)

Citation

Arezki Hammache, Mohand Boughanem, Rachid Ahmed-Ouamar. Combining compound and single terms under language model framework. Knowledge and Information Systems (KAIS), Springer, 2014, vol. 39, no. 2, pp. 329-349. ⟨10.1007/s10115-013-0618-x⟩. ⟨hal-01282933⟩
