
Using Word Embedding for Cross-Language Plagiarism Detection

Abstract : This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. The main contributions are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
Document type : Conference papers

Cited literature [15 references]
Contributor : Laurent Besacier
Submitted on : Wednesday, April 5, 2017 - 10:21:40 AM
Last modification on : Wednesday, October 7, 2020 - 3:02:42 AM
Long-term archiving on : Thursday, July 6, 2017 - 12:58:12 PM


Publisher files allowed on an open archive


  • HAL Id : hal-01502146, version 1
  • ARXIV : 1702.03082


Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab. Using Word Embedding for Cross-Language Plagiarism Detection. EACL 2017, Apr 2017, Valencia, Spain. pp. 415-421. ⟨hal-01502146⟩


