Conference paper, Year: 2021

CoST: An annotated Data Collection for Complex Search

Abstract

While great progress has been made in the area of information access, open issues remain in designing intelligent systems that support task-based search. Despite the importance of task-based search, the information retrieval and information science communities still lack open-ended, annotated datasets that enable the evaluation of related facets of search tasks in downstream applications. Existing datasets are either sampled from large-scale logs but poorly annotated, or sampled from smaller-scale user studies that focus on ranked-list evaluation. In this work, we present CoST, a novel, richly annotated dataset for evaluating complex search tasks, collaboratively designed by researchers from computer science and cognitive psychology, and intended to answer a wide range of research questions on task-based search. CoST includes 5667 queries recorded in 630 task-based sessions from a user study involving 70 French native participants, each an expert in one of three domains (computer science, medicine, psychology). Each participant completed 15 tasks spanning 5 types of cognitive complexity (fact-finding, exploratory learning, decision-making, problem-solving, multicriteria-inferential). In addition to search data (e.g., queries and clicks), CoST provides task- and session-related data, task annotations, and query annotations. We illustrate possible uses of CoST through the evaluation of query classification models and an analysis of the effect of task complexity and domain on users' search behavior.
Main file
CoST_ An annotated Data Collection for Complex Search.pdf (318.42 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03885040, version 1 (06-12-2022)

Cite

Cheyenne Dosso, Jose G. Moreno, Aline Chevalier, Lynda Tamine. CoST: An annotated Data Collection for Complex Search. 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), ACM Special Interest Group on Hypertext, Hypermedia and Web; ACM Special Interest Group on Information Retrieval, Oct 2021, Queensland (Virtual Event), Australia. pp. 4455-4464, ⟨10.1145/3459637.3481998⟩. ⟨hal-03885040⟩