Conference papers

Indiscriminateness in representation spaces of terms and documents

Vincent Claveau 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract : Examining the properties of the representation spaces used for documents or words in Information Retrieval (IR), typically $\mathbb{R}^n$ with $n$ large, yields valuable insights that can help the retrieval process. Recently, several authors have studied the real dimensionality of datasets, called intrinsic dimensionality, in specific parts of these spaces [14]. They have shown that this dimensionality is chiefly tied to the notion of indiscriminateness among the neighbors of a query point in the vector space. In this paper, we propose to revisit this notion in the specific case of IR. More precisely, we show how to estimate indiscriminateness from IR similarities so that it can be used in the representation spaces of documents and words [18, 7]. We show that indiscriminateness can be used to characterize difficult queries; moreover, we show that this notion, applied to word embeddings, can help choose the terms to use for query expansion.

https://hal.archives-ouvertes.fr/hal-01859568
Contributor : Vincent Claveau
Submitted on : Wednesday, August 22, 2018 - 11:58:18 AM
Last modification on : Tuesday, February 25, 2020 - 8:08:12 AM
Document(s) archived on : Friday, November 23, 2018 - 2:46:34 PM

File

Claveau_ECIR2018.pdf
Files produced by the author(s)

Citation

Vincent Claveau. Indiscriminateness in representation spaces of terms and documents. ECIR 2018 - 40th European Conference in Information Retrieval, Mar 2018, Grenoble, France. pp.251-262, ⟨10.1007/978-3-319-76941-7_19⟩. ⟨hal-01859568⟩
