A Document Frequency Constraint for Pseudo-Relevance Feedback Models

Abstract: In this paper we study the behavior of several pseudo-relevance feedback (PRF) models and describe their main characteristics. This leads us to introduce a new heuristic constraint for PRF models, referred to as the Document Frequency (DF) constraint. We then analyze state-of-the-art PRF models from a theoretical point of view, according to whether they satisfy this constraint. The analysis reveals that the standard mixture model for PRF in the language modeling family does not satisfy the DF constraint. We then conduct a series of experiments to test whether the DF constraint is valid. To do so, we run tests with an oracle and with a simple family of tf-idf functions based on a parameter k that controls the convexity/concavity of the function. Both the oracle and the results obtained with this family of functions validate the DF constraint.
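
The abstract does not spell out the DF constraint or the tf-idf family, so the following is only an illustrative sketch under stated assumptions. It assumes the constraint says that, all else being equal, a word occurring in more feedback documents should score higher, and it assumes a family of the form idf(w) times the sum of tf(w, d)^k over the feedback documents. The function name `feedback_weight` and the toy counts are hypothetical and are not taken from the paper.

```python
def feedback_weight(tfs, idf, k):
    """Score a candidate expansion word from its counts in the feedback documents.

    tfs: term frequency of the word in each feedback document (0 if absent)
    idf: inverse document frequency of the word (assumed given)
    k:   exponent; t -> t**k is concave for k < 1 and convex for k > 1
    """
    return idf * sum(t ** k for t in tfs if t > 0)


# Two hypothetical words with the same idf and the same total count (6)
# over three feedback documents, but a different feedback document frequency.
spread = [2, 2, 2]        # occurs in 3 feedback documents
concentrated = [6, 0, 0]  # occurs in 1 feedback document
idf = 1.0

for k in (0.5, 1.0, 2.0):
    s = feedback_weight(spread, idf, k)
    c = feedback_weight(concentrated, idf, k)
    print(f"k={k}: spread={s:.3f} concentrated={c:.3f}")
```

Under these assumptions, a concave setting (k < 1) ranks the word spread over more feedback documents higher, k = 1 scores both words equally, and a convex setting (k > 1) favors the concentrated word. This is one way to see why a parameter controlling convexity/concavity can be used to probe such a constraint.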
Document type:
Conference paper
CORIA 2011 - COnférence en Recherche d'Information et Applications, Mar 2011, Avignon, France. pp.73-88, 2011

https://hal.archives-ouvertes.fr/hal-00744097
Contributor: Eric Gaussier
Submitted on: Monday, 22 October 2012 - 12:15:09
Last modified on: Tuesday, 28 October 2014 - 18:34:07
Document(s) archived on: Wednesday, 23 January 2013 - 03:36:55

File

Clinchant-Gaussier_CORIA11_web...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00744097, version 1

Citation

Stéphane Clinchant, Éric Gaussier. A Document Frequency Constraint for Pseudo-Relevance Feedback Models. CORIA 2011 - COnférence en Recherche d'Information et Applications, Mar 2011, Avignon, France. pp.73-88, 2011. <hal-00744097>
