Improving distributional thesauri by exploring the graph of neighbors

Vincent Claveau 1 Ewa Kijak 1 Olivier Ferret 2
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
2 LVIC - Laboratoire Vision et Ingénierie des Contenus
DIASI - Département Intelligence Ambiante et Systèmes Interactifs : DRT/LIST/DIASI
Abstract : In this paper, we address the issue of building and improving a distributional thesaurus. We first show that existing tools from the information retrieval domain can be used directly to build a thesaurus with state-of-the-art performance. Second, we focus on improving the resulting thesaurus, seen as a graph of k-nearest neighbors. By exploiting the neighborhood information contained in this graph, we propose several contributions. 1) We show how the lists of neighbors can be globally improved by examining the reciprocity of the neighboring relation, that is, the fact that a word can be close to another and vice versa. 2) We also propose a method to associate a confidence score with any list of nearest neighbors (i.e., any entry of the thesaurus). 3) Last, we demonstrate how these confidence scores can be used to reorder the closest neighbors of a word. These contributions are validated through experiments and offer significant improvements over the state of the art.
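To make the notion of reciprocity in a k-nearest-neighbor graph concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes one simple reading of reciprocity: a neighbor is reciprocal when the target word also appears in that neighbor's own list. The function name `reciprocal_rerank` and the toy thesaurus are hypothetical and serve only as illustration.

```python
from typing import Dict, List


def reciprocal_rerank(knn: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Place reciprocal neighbors first, keeping the original order within each group.

    Assumption (not from the paper): a neighbor n of word w is "reciprocal"
    when w also appears in the neighbor list of n.
    """
    reranked = {}
    for word, neighbors in knn.items():
        # Neighbors whose own list contains the current word.
        mutual = [n for n in neighbors if word in knn.get(n, [])]
        # Remaining neighbors, including those with no entry of their own.
        one_way = [n for n in neighbors if word not in knn.get(n, [])]
        reranked[word] = mutual + one_way
    return reranked


# Hypothetical toy thesaurus: each entry lists neighbors by decreasing similarity.
toy = {
    "car": ["road", "automobile", "vehicle"],
    "automobile": ["car", "vehicle"],
    "vehicle": ["car", "truck"],
    "road": ["street", "highway"],
}

print(reciprocal_rerank(toy)["car"])
```

In this toy graph, "road" does not list "car" among its own neighbors, so it is moved behind "automobile" and "vehicle", which both do.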
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01027545
Contributor : Vincent Claveau
Submitted on : Tuesday, July 22, 2014 - 11:12:42 AM
Last modification on : Thursday, February 7, 2019 - 4:45:26 PM
Long-term archiving on : Tuesday, November 25, 2014 - 10:32:00 AM

File

Claveau_Kijak_Ferret_COLING14....
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01027545, version 1

Citation

Vincent Claveau, Ewa Kijak, Olivier Ferret. Improving distributional thesauri by exploring the graph of neighbors. International Conference on Computational Linguistics, COLING 2014, Aug 2014, Dublin, Ireland. 12 p. ⟨hal-01027545⟩
