Journal articles

A survey on training and evaluation of word embeddings

Abstract: Word embeddings have proven to be effective for many Natural Language Processing tasks by providing word representations integrating prior knowledge. In this article, we focus on the algorithms and models used to compute those representations and on their methods of evaluation. Many new techniques were developed in a short amount of time and there is no unified terminology to emphasise strengths and weaknesses of those methods. Based on the state of the art, we propose a thorough terminology to help with the classification of these various models and their evaluations. We also provide comparisons of those algorithms and methods, highlighting open problems and research paths, as well as a compilation of popular evaluation metrics and datasets. This survey gives: 1) an exhaustive description and terminology of currently investigated word embeddings, 2) a clear segmentation of evaluation methods and their associated datasets, and 3) high-level properties to indicate pros and cons of each solution.
Contributor: François Torregrossa
Submitted on: Monday, February 22, 2021 - 12:11:02 PM
Last modification on: Tuesday, October 19, 2021 - 11:04:41 AM
Long-term archiving on: Sunday, May 23, 2021 - 6:33:42 PM





François Torregrossa, Robin Allesiardo, Vincent Claveau, Nihel Kooli, Guillaume Gravier. A survey on training and evaluation of word embeddings. International Journal of Data Science and Analytics, Springer Verlag, 2021, 11 (2), pp.85-103. ⟨10.1007/s41060-021-00242-8⟩. ⟨hal-03148517⟩


