MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Abstract: We present MultiVec, a new toolkit for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes the word2vec features of Mikolov et al. [2013b], the paragraph vector model of Le and Mikolov [2014] (batch and online), and the model of Luong et al. [2015] for bilingual distributed representations. MultiVec also includes different distance measures between words and between sequences of words. The toolkit is written in C++ and aims to be fast (within the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and crosslingual document classification.
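
To make the idea of a distance measure between words and between sequences of words concrete, here is a minimal sketch in Python. It is not MultiVec's API or implementation: the function names, the toy vectors, and the choice of cosine similarity over averaged word vectors are all assumptions made for illustration only. The abstract does not say which measures the toolkit provides.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def average_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]

def sequence_vector(tokens, embeddings):
    """Represent a word sequence as the mean of its known word vectors.

    Averaging is an assumption chosen for simplicity; a paragraph-vector
    model would learn the sequence representation instead.
    """
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        raise ValueError("no token of the sequence has a vector")
    return average_vector(known)

# Hypothetical 3-dimensional embeddings, invented for this example.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film": [0.2, 0.8, 0.3],
}

word_level = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sequence_level = cosine_similarity(
    sequence_vector(["good", "movie"], toy_embeddings),
    sequence_vector(["great", "film"], toy_embeddings),
)
print(word_level, sequence_level)
```

The same similarity function serves both granularity levels; only the way the vector is obtained changes, which is the design point the sketch is meant to show.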
Document type:
Conference paper
The 10th edition of the Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia. 2016, 〈http://lrec2016.lrec-conf.org/en/〉

Cited literature [12 references]

https://hal.archives-ouvertes.fr/hal-01335930
Contributor: Christophe Servan
Submitted on: Wednesday, 22 June 2016 - 14:56:30
Last modified on: Thursday, 11 October 2018 - 08:48:03

File

Berard_and_al-MultiVec_a_Multi...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01335930, version 1

Citation

Alexandre Bérard, Christophe Servan, Olivier Pietquin, Laurent Besacier. MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP. The 10th edition of the Language Resources and Evaluation Conference (LREC), May 2016, Portoroz, Slovenia. 2016, 〈http://lrec2016.lrec-conf.org/en/〉. 〈hal-01335930〉

Metrics

Record views: 744

File downloads: 346