A survey on training and evaluation of word embeddings

François Torregrossa; Robin Allesiardo; Vincent Claveau; Nihel Kooli; Guillaume Gravier

doi:10.1007/s41060-021-00242-8

Journal article, International Journal of Data Science and Analytics, Year: 2021


François Torregrossa

Role: Author
PersonId: 1075840

Solocal

Creating and exploiting explicit links between multimedia fragments

Robin Allesiardo

Role: Author
PersonId: 4381
IdHAL: robin-allesiardo
IdRef: 197869483

Solocal

Vincent Claveau

Role: Author
PersonId: 5270
IdHAL: vincent-claveau
ORCID: 0000-0002-3459-0550
IdRef: 075988216

Creating and exploiting explicit links between multimedia fragments

Nihel Kooli

Role: Author

Solocal

Guillaume Gravier

Role: Author
PersonId: 1046
IdHAL: guig
ORCID: 0000-0002-2266-5682
IdRef: 110355415

Creating and exploiting explicit links between multimedia fragments

Abstract

Word embeddings have proven effective for many Natural Language Processing tasks by providing word representations that integrate prior knowledge. In this article, we focus on the algorithms and models used to compute those representations and on the methods used to evaluate them. Many new techniques were developed in a short amount of time, and there is no unified terminology to emphasise the strengths and weaknesses of those methods. Based on the state of the art, we propose a thorough terminology to help classify these models and their evaluations. We also compare those algorithms and methods, highlight open problems and research paths, and compile popular evaluation metrics and datasets. This survey gives: 1) an exhaustive description and terminology of currently investigated word embeddings, 2) a clear segmentation of evaluation methods and their associated datasets, and 3) high-level properties that indicate the pros and cons of each solution.

Keywords

Word Embeddings; Word Embedding Evaluation; Survey; Contextualised Embeddings; Non-Euclidean Embeddings

Domains

Document and Text Processing; Computer Science [cs]; Artificial Intelligence [cs.AI]

Main file

ijdsa_format_final_postprint.pdf (584.78 KB)

Origin: Files produced by the author(s)

Contributor: François Torregrossa

https://hal.science/hal-03148517

Submitted on: Monday, 22 February 2021, 12:11:02

Last modified on: Friday, 24 March 2023, 14:53:20

Long-term archiving on: Sunday, 23 May 2021, 18:33:42

Dates and versions

hal-03148517, version 1 (22-02-2021)

Identifiers

HAL Id: hal-03148517, version 1
DOI: 10.1007/s41060-021-00242-8

Cite

François Torregrossa, Robin Allesiardo, Vincent Claveau, Nihel Kooli, Guillaume Gravier. A survey on training and evaluation of word embeddings. International Journal of Data Science and Analytics, 2021, 11 (2), pp.85-103. ⟨10.1007/s41060-021-00242-8⟩. ⟨hal-03148517⟩

Collections

UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA CENTRALESUPELEC INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM CYBERSCHOOL

340 views

747 downloads
