Journal article, Information Retrieval Journal, Year: 2011

Retrieval Constraints and Word Frequency Distributions - A Log-logistic Model for IR

Abstract

We first present in this paper an analytical view of heuristic retrieval constraints which yields simple tests to determine whether a retrieval function satisfies the constraints or not. We then review empirical findings on word frequency distributions and the central role played by burstiness in this context. This leads us to propose a formal definition of burstiness which can be used to characterize probability distributions with respect to this phenomenon. We then introduce the family of information-based IR models which naturally captures heuristic retrieval constraints when the underlying probability distribution is bursty and propose a new IR model within this family, based on the log-logistic distribution. The experiments we conduct on several collections illustrate the good behavior of the log-logistic IR model: it significantly outperforms the Jelinek-Mercer and Dirichlet prior language models on most collections we have used, with both short and long queries and for both the MAP and the precision at 10 documents. It also compares favorably to BM25 and has similar performance to classical DFR models such as InL2 and PL2.
Main file
Clinchant-InformationRetrievalJournal2011.pdf (625.2 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00742020, version 1 (15-10-2012)

Cite

Stéphane Clinchant, Éric Gaussier. Retrieval Constraints and Word Frequency Distributions - A Log-logistic Model for IR. Information Retrieval Journal, 2011, 14 (1), pp.5-25. ⟨10.1007/s10791-010-9143-7⟩. ⟨hal-00742020⟩