Conference paper, Year: 2016

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Abstract

We present MultiVec, a new toolkit for computing continuous representations for text at different levels of granularity (words or sequences of words). MultiVec includes Mikolov et al. [2013b]'s word2vec features, Le and Mikolov [2014]'s paragraph vector (batch and online) and Luong et al. [2015]'s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is designed to be fast (of the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.
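To make the idea of a distance measure between words and between sequences of words concrete, here is a minimal sketch in Python. It is not MultiVec's API or implementation: the function names `cosine_similarity` and `mean_vector`, the toy two-dimensional embeddings, and the choice to represent a sequence by the average of its word vectors are all assumptions made for illustration. Cosine similarity is used because it is a common measure for embedding vectors; the abstract does not say which measures the toolkit provides.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]


# Hypothetical toy embeddings, invented for this example only.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [0.0, 1.0],
}

# Word-level comparison.
word_score = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])

# Sequence-level comparison: each sequence is reduced to the mean of its
# word vectors (an assumption of this sketch), then compared the same way.
seq_a = mean_vector([toy_embeddings["cat"], toy_embeddings["dog"]])
seq_b = mean_vector([toy_embeddings["dog"], toy_embeddings["car"]])
sequence_score = cosine_similarity(seq_a, seq_b)
```

Averaging word vectors is only one simple way to obtain a sequence representation; a learned paragraph vector, as cited in the abstract, would replace `mean_vector` while the comparison step stays the same.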
Main file
Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation-LREC2016.pdf (316.04 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01335930, version 1 (22-06-2016)

Identifiers

  • HAL Id: hal-01335930, version 1

Cite

Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier. MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP. The 10th edition of the Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia. ⟨hal-01335930⟩