Direct vs. indirect evaluation of distributional thesauri

Vincent Claveau 1, Ewa Kijak 1
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract: With the success of word embedding methods in various Natural Language Processing tasks, all the fields of distributional semantics have experienced a renewed interest. Besides the famous word2vec, recent studies have presented efficient techniques to build distributional thesauri; in particular, Claveau et al. (2014) have already shown that Information Retrieval (IR) tools and concepts can be successfully used to build a thesaurus. In this paper, we address the problem of evaluating such thesauri or embedding models. Several evaluation scenarios are considered: direct evaluation through reference lexicons and specially crafted datasets, and indirect evaluation through third-party tasks, namely lexical substitution and Information Retrieval. For the latter task, we adopt the query expansion framework proposed by Claveau and Kijak (2016). Through several experiments, we first show that the recent techniques for building distributional thesauri outperform the word2vec approach, whatever the evaluation scenario. We also highlight the differences between the evaluation scenarios, which may lead to very different conclusions when comparing distributional models. Finally, we study the effect of some parameters of the distributional models on these various evaluation scenarios.
Document type:
Conference paper
International Conference on Computational Linguistics, COLING, Dec 2016, Osaka, Japan. Proceedings of the International Conference on Computational Linguistics, COLING, 2016

https://hal.archives-ouvertes.fr/hal-01394739
Contributor: Vincent Claveau
Submitted on: Wednesday, November 9, 2016 - 16:43:34
Last modified on: Wednesday, August 2, 2017 - 10:08:36
Document(s) archived on: Wednesday, March 15, 2017 - 04:10:36

File

Claveau_Kijak_IR_COLING2016.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01394739, version 1

Citation

Vincent Claveau, Ewa Kijak. Direct vs. indirect evaluation of distributional thesauri. International Conference on Computational Linguistics, COLING, Dec 2016, Osaka, Japan. Proceedings of the International Conference on Computational Linguistics, COLING, 2016. <hal-01394739>
