Using Word Embedding for Cross-Language Plagiarism Detection

Abstract : This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
Document type :
Conference papers

Cited literature [15 references]

https://hal.archives-ouvertes.fr/hal-01502146
Contributor : Laurent Besacier
Submitted on : Wednesday, April 5, 2017 - 10:21:40 AM
Last modification on : Wednesday, October 7, 2020 - 3:02:42 AM
Long-term archiving on : Thursday, July 6, 2017 - 12:58:12 PM

File

EACLshort066.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-01502146, version 1
  • ARXIV : 1702.03082

Citation

Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab. Using Word Embedding for Cross-Language Plagiarism Detection. EACL 2017, Apr 2017, Valencia, Spain. pp. 415-421. ⟨hal-01502146⟩
