Using Word Embedding for Cross-Language Plagiarism Detection

Abstract : This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. Its main contributions are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01502146
Contributor : Laurent Besacier
Submitted on : Wednesday, April 5, 2017 - 10:21:40 AM
Last modification on : Thursday, April 4, 2019 - 10:18:05 AM
Document(s) archived on : Thursday, July 6, 2017 - 12:58:12 PM

File

EACLshort066.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01502146, version 1
  • ARXIV : 1702.03082

Citation

Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab. Using Word Embedding for Cross-Language Plagiarism Detection. EACL 2017, Apr 2017, Valencia, Spain. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, pp. 415-421. ⟨hal-01502146⟩
