Using Word Embedding for Cross-Language Plagiarism Detection

Abstract: This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. The main contributions are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
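As a rough illustration of the general idea behind embedding-based cross-language similarity, here is a minimal sketch. It is not the authors' method: it assumes that English and French words have already been mapped into one shared vector space, it represents each sentence by the average of its word vectors, and it compares sentences by cosine similarity. The table `TOY_EMBEDDINGS`, its values, and the function names are all hypothetical and invented for this example. The `f1_score` helper shows the standard F1 formula, the metric reported in the abstract.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a shared
# English-French embedding space. The values are invented for illustration.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "chat": [0.88, 0.12, 0.02],
    "sleeps": [0.1, 0.9, 0.1],
    "dort": [0.12, 0.85, 0.1],
    "car": [0.0, 0.1, 0.95],
    "voiture": [0.02, 0.08, 0.9],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that appear in the table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known token, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_language_similarity(src_tokens, tgt_tokens, embeddings=TOY_EMBEDDINGS):
    """Score a sentence pair; returns 0.0 when a side has no known token."""
    u = sentence_vector(src_tokens, embeddings)
    v = sentence_vector(tgt_tokens, embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


if __name__ == "__main__":
    related = cross_language_similarity(["cat", "sleeps"], ["chat", "dort"])
    unrelated = cross_language_similarity(["cat", "sleeps"], ["voiture"])
    print(f"related pair:   {related:.3f}")
    print(f"unrelated pair: {unrelated:.3f}")
```

In a detection setting, a pair would typically be flagged when its score exceeds a threshold tuned on held-out data; that threshold is left out here because the abstract does not specify one.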
Document type:
Conference paper
EACL 2017, Apr 2017, Valencia, Spain. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, pp. 415-421

Cited literature: 15 references

https://hal.archives-ouvertes.fr/hal-01502146
Contributor: Laurent Besacier
Submitted on: Wednesday, April 5, 2017 - 10:21:40
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Thursday, July 6, 2017 - 12:58:12

File

EACLshort066.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01502146, version 1
  • arXiv: 1702.03082

Citation

Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab. Using Word Embedding for Cross-Language Plagiarism Detection. EACL 2017, Apr 2017, Valencia, Spain. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, pp. 415-421. ⟨hal-01502146⟩

Metrics

Record views: 416

File downloads: 250