Conference papers

Direct vs. indirect evaluation of distributional thesauri

Vincent Claveau 1, Ewa Kijak 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract : With the success of word embedding methods in various Natural Language Processing tasks, all fields of distributional semantics have experienced renewed interest. Besides the well-known word2vec, recent studies have presented efficient techniques for building distributional thesauri; in particular, Claveau et al. (2014) have shown that Information Retrieval (IR) tools and concepts can be successfully used to build a thesaurus. In this paper, we address the problem of evaluating such thesauri or embedding models. Several evaluation scenarios are considered: direct evaluation through reference lexicons and specially crafted datasets, and indirect evaluation through third-party tasks, namely lexical substitution and Information Retrieval. For the latter task, we adopt the query expansion framework proposed by Claveau and Kijak (2016). Through several experiments, we first show that the recent techniques for building distributional thesauri outperform the word2vec approach, whatever the evaluation scenario. We also highlight the differences between the evaluation scenarios, which may lead to very different conclusions when comparing distributional models. Finally, we study the effect of some parameters of the distributional models on these evaluation scenarios.

https://hal.archives-ouvertes.fr/hal-01394739
Contributor : Vincent Claveau
Submitted on : Wednesday, November 9, 2016 - 4:43:34 PM
Last modification on : Tuesday, March 10, 2020 - 3:23:08 PM
Document(s) archived on : Wednesday, March 15, 2017 - 4:10:36 AM

File

Claveau_Kijak_IR_COLING2016.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01394739, version 1

Citation

Vincent Claveau, Ewa Kijak. Direct vs. indirect evaluation of distributional thesauri. International Conference on Computational Linguistics, COLING, Dec 2016, Osaka, Japan. ⟨hal-01394739⟩
