LIA at TREC 2012 Web Track: Unsupervised Search Concepts Identification from General Sources of Information

Abstract : In this paper, we report the experiments we conducted for our participation in the TREC 2012 Web Track. We experimented with a new system that models the latent concepts underlying a query. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. Our approach automatically estimates both the number of latent concepts and the number of feedback documents needed, without any prior training step. These concepts are incorporated into the ranking function with the aim of promoting documents that refer to many different query-related themes. We also explored the use of different types of information sources for modeling the latent concepts. For this purpose, we use four general sources of information of various natures (web, news, encyclopedic) from which the feedback documents are extracted.
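As a rough illustration of how latent concepts might enter a ranking function, the sketch below interpolates a standard query-likelihood score with a concept-based score. It is not the formula from the paper, which the abstract does not give: the function names, the linear interpolation with `lam`, the Dirichlet smoothing parameter `mu`, and the hand-written concepts (stand-ins for LDA topics estimated from feedback documents) are all assumptions made for the example.

```python
from collections import Counter


def smoothed_prob(word, doc_counts, doc_len, background, mu=10.0):
    """Dirichlet-smoothed estimate of P(word | document)."""
    return (doc_counts.get(word, 0) + mu * background.get(word, 1e-6)) / (doc_len + mu)


def concept_score(query, doc_tokens, concepts, background, lam=0.5, mu=10.0):
    """Hypothetical score: lam * P(Q|D) + (1 - lam) * sum_k weight_k * sum_w P(w|k) * P(w|D)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)

    # Query-likelihood part: product of smoothed term probabilities.
    query_part = 1.0
    for term in query:
        query_part *= smoothed_prob(term, counts, n, background, mu)

    # Concept part: each concept is (weight, {word: P(word | concept)}).
    concept_part = 0.0
    for weight, word_probs in concepts:
        concept_part += weight * sum(
            p * smoothed_prob(w, counts, n, background, mu)
            for w, p in word_probs.items()
        )

    return lam * query_part + (1.0 - lam) * concept_part


# Toy data (assumed): two hand-written "concepts" for the ambiguous query "jaguar".
query = ["jaguar"]
concepts = [
    (0.5, {"car": 0.6, "engine": 0.4}),
    (0.5, {"cat": 0.7, "jungle": 0.3}),
]
vocab = ["jaguar", "car", "engine", "cat", "jungle", "speed"]
background = {w: 1.0 / len(vocab) for w in vocab}

on_topic = ["jaguar", "car", "engine", "cat", "jungle", "jaguar"]
off_topic = ["jaguar", "jaguar", "speed", "speed", "speed", "speed"]

score_on = concept_score(query, on_topic, concepts, background)
score_off = concept_score(query, off_topic, concepts, background)
print(score_on > score_off)
```

Both toy documents contain the query term twice, so their query-likelihood parts are identical; the document that also uses concept vocabulary is ranked higher only because of the concept part. With `lam=1.0` the sketch reduces to plain query likelihood.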
Document type : Conference paper

https://hal.archives-ouvertes.fr/hal-01314983
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Thursday, May 12, 2016 - 2:39:07 PM
Last modification on : Tuesday, April 2, 2019 - 2:03:39 AM

Identifiers

  • HAL Id : hal-01314983, version 1

Citation

Romain Deveaud, Eric Sanjuan, Patrice Bellot. LIA at TREC 2012 Web Track: Unsupervised Search Concepts Identification from General Sources of Information. TREC 2012, Nov 2012, Gaithersburg, United States. ⟨hal-01314983⟩
