MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Abstract : We present MultiVec, a new toolkit for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes the word2vec features of Mikolov et al. [2013b], the paragraph vector model of Le and Mikolov [2014] (batch and online), and the model of Luong et al. [2015] for bilingual distributed representations. MultiVec also includes different distance measures between words and between sequences of words. The toolkit is written in C++ and is designed to be fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and crosslingual document classification.
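
As an illustration of what a distance measure between words and between sequences of words can look like, here is a minimal Python sketch. It is not MultiVec's API or implementation: the function names, the toy three-dimensional vectors, and the choice to represent a sequence as the average of its word vectors are all assumptions made for this example (the toolkit itself is written in C++ and also offers paragraph vectors for sequences).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_vector(words, embeddings):
    """Represent a word sequence as the mean of its known word vectors.

    Assumption for this sketch only: out-of-vocabulary words are skipped.
    """
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        raise ValueError("no word of the sequence is in the vocabulary")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy embeddings; real vectors would come from a trained model.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # Word-level similarity.
    print(cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"]))
    # Sequence-level similarity using averaged word vectors.
    seq_a = average_vector(["cat", "dog"], toy_embeddings)
    seq_b = average_vector(["dog", "car"], toy_embeddings)
    print(cosine_similarity(seq_a, seq_b))
```

Averaging is only one simple way to obtain a sequence vector; it ignores word order, which is one reason learned sequence representations such as paragraph vectors are of interest.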
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01335930
Contributor : Christophe Servan
Submitted on : Wednesday, June 22, 2016 - 2:56:30 PM
Last modification on : Thursday, April 4, 2019 - 10:18:05 AM

File

Berard_and_al-MultiVec_a_Multi...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01335930, version 1

Citation

Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier. MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP. The 10th edition of the Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia. ⟨hal-01335930⟩
