On metric embedding for boosting semantic similarity computations - Archive ouverte HAL
Conference paper, 2015

On metric embedding for boosting semantic similarity computations

Julien Subercaze
Christophe Gravier
Frédérique Laforest

Abstract

Computing pairwise word semantic similarity is widely used and serves as a building block for many tasks in NLP. In this paper, we explore embedding the shortest-path metric of a knowledge base (WordNet) into the Hamming hypercube in order to improve computation performance. We show that, although an isometric embedding is intractable, good non-isometric embeddings are achievable. We report a speedup of three orders of magnitude for the task of computing Leacock and Chodorow (LCH) similarity, while preserving strong correlations with the exact values (r = .819, ρ = .826).
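To illustrate the idea behind the speedup (this is a hedged sketch, not the paper's embedding algorithm): LCH similarity requires a shortest-path query in the WordNet taxonomy, whereas once words are embedded as fixed-width bit signatures, similarity reduces to an XOR and a popcount. The signatures below are made up for illustration.

```python
import math

def lch_similarity(path_length, taxonomy_depth=20):
    """Leacock-Chodorow similarity: -log(len / (2 * D)),
    where D is the taxonomy depth (20 for WordNet nouns)."""
    return -math.log(path_length / (2.0 * taxonomy_depth))

def hamming(a, b):
    """Hamming distance between two integer bit signatures:
    a single XOR plus a popcount, i.e. a few CPU operations,
    versus a graph shortest-path query for the exact metric."""
    return bin(a ^ b).count("1")

# Suppose a (hypothetical) embedding assigned these 16-bit signatures.
sig_cat = 0b1011001110001101
sig_dog = 0b1011001010011101
d = hamming(sig_cat, sig_dog)  # -> 2
```

A non-isometric embedding means `d` only approximates the true shortest-path length, which is why the paper reports correlation (r, ρ) with exact LCH rather than equality.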
Main file: aclfinal.pdf (798.98 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01166163 , version 1 (22-06-2015)

Identifiers

  • HAL Id : hal-01166163 , version 1

Cite

Julien Subercaze, Christophe Gravier, Frédérique Laforest. On metric embedding for boosting semantic similarity computations. Association for Computational Linguistics, Jul 2015, Beijing, China. ⟨hal-01166163⟩
227 views
573 downloads
