Addressing Different Evaluation Environments for Information Retrieval through Pivot Systems
Abstract
Classical evaluations of Information Retrieval systems, under the Cranfield paradigm, compare several systems within a single evaluation environment, defined by its settings (document collection, topics, assessments and evaluation measures). In this paper, we propose a framework to compare systems across several evaluation environments. To achieve this goal, we investigate the use of pivot systems, which allow an indirect comparison of systems across evaluation environments by computing Result Deltas, i.e. the differences between their evaluation measure values. We detail the proposed pivot-based methodology, define the characteristics of a pivot, and present experiments to validate our proposal (and in particular the pivot characteristics). Using the 2018 and 2020 CLEF eHealth evaluation campaigns (Goeuriot et al., 2020), we create altered environments that differ in their topic sets. We explore the behaviour of the metrics and pivots by measuring the correlation between the result deltas, and by comparing the ranking of systems obtained through the pivots with the official ranking of the systems. Our experiments show that correlations can vary greatly according to the chosen pivot and metric. We show that some pivot/metric pairs achieve high correlation values across the altered environments, with a ranking of systems similar to the official one.
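The core computation described above can be sketched in a few lines. The snippet below is an illustrative toy example, not the paper's implementation: the system names, scores, and the single-metric setting are invented for demonstration. It computes Result Deltas (each system's score minus the pivot's score) in two hypothetical environments, then measures the agreement between the delta-based orderings with a simple Kendall's tau.

```python
from itertools import combinations

# Hypothetical scores (system -> evaluation measure value) in two
# evaluation environments; all names and values are illustrative.
env_a = {"pivot": 0.50, "sys1": 0.62, "sys2": 0.45, "sys3": 0.58}
env_b = {"pivot": 0.41, "sys1": 0.55, "sys2": 0.33, "sys3": 0.49}

def result_deltas(env, pivot="pivot"):
    """Result Delta: a system's score minus the pivot system's score."""
    return {s: v - env[pivot] for s, v in env.items() if s != pivot}

def kendall_tau(scores_x, scores_y):
    """Kendall's tau between the rankings induced by two score dicts."""
    systems = sorted(scores_x)
    concordant = discordant = 0
    for a, b in combinations(systems, 2):
        sign = (scores_x[a] - scores_x[b]) * (scores_y[a] - scores_y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs

deltas_a = result_deltas(env_a)
deltas_b = result_deltas(env_b)
# Agreement between the delta-based orderings of the two environments.
tau = kendall_tau(deltas_a, deltas_b)
```

A high tau would indicate that, relative to this pivot, the systems order consistently across the two environments, which is the kind of stability the pivot characteristics are meant to capture.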
Domains
Information Retrieval [cs.IR]
Origin: Files produced by the author(s)