Conference papers

Direct vs. indirect evaluation of distributional thesauri

Vincent Claveau 1 Ewa Kijak 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique , IRISA-D6 - MEDIA ET INTERACTIONS
Abstract : With the success of word embedding methods in various Natural Language Processing tasks, all the fields of distributional semantics have experienced a renewed interest. Besides the famous word2vec, recent studies have presented efficient techniques to build distributional thesauri; in particular, Claveau et al. (2014) have already shown that Information Retrieval (IR) tools and concepts can be successfully used to build a thesaurus. In this paper, we address the problem of the evaluation of such thesauri or embedding models. Several evaluation scenarios are considered: direct evaluation through reference lexicons and specially crafted datasets, and indirect evaluation through third-party tasks, namely lexical substitution and Information Retrieval. For this latter task, we adopt the query expansion framework proposed by Claveau and Kijak (2016). Through several experiments, we first show that the recent techniques for building distributional thesauri outperform the word2vec approach, whatever the evaluation scenario. We also highlight the differences between the evaluation scenarios, which may lead to very different conclusions when comparing distributional models. Last, we study the effect of some parameters of the distributional models on these various evaluation scenarios.
Contributor : Vincent Claveau
Submitted on : Wednesday, November 9, 2016 - 4:43:34 PM
Last modification on : Wednesday, November 3, 2021 - 4:32:36 AM
Long-term archiving on : Wednesday, March 15, 2017 - 4:10:36 AM


Files produced by the author(s)


  • HAL Id : hal-01394739, version 1


Vincent Claveau, Ewa Kijak. Direct vs. indirect evaluation of distributional thesauri. International Conference on Computational Linguistics, COLING, Dec 2016, Osaka, Japan. ⟨hal-01394739⟩


