Using Word Embedding for Cross-Language Plagiarism Detection

This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. Its main contributions are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
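As a rough illustration of the general idea behind embedding-based cross-language similarity, the sketch below averages word vectors from a shared English-French space and compares two sentences with cosine similarity. It is a generic baseline written under stated assumptions, not one of the methods introduced in the paper: the `TOY_EMBEDDINGS` table, its values, and the function names are all hypothetical and chosen only for this example. The `f1_score` helper shows how an F1 figure such as the one reported above is derived from true-positive, false-positive, and false-negative counts.

```python
import math

# Hypothetical toy vectors in a shared English-French space.
# The values are made up for illustration; they do not come from the paper.
TOY_EMBEDDINGS = {
    "cat": [0.90, 0.10, 0.00],
    "chat": [0.88, 0.12, 0.02],
    "sleeps": [0.10, 0.90, 0.10],
    "dort": [0.12, 0.85, 0.10],
    "bank": [0.00, 0.10, 0.95],
    "banque": [0.02, 0.10, 0.90],
}


def sentence_vector(tokens, table):
    """Average the vectors of the known tokens; return None if none is known."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(en_tokens, fr_tokens, table=TOY_EMBEDDINGS):
    """Cross-language similarity of two tokenized sentences (0.0 if unknown)."""
    u = sentence_vector(en_tokens, table)
    v = sentence_vector(fr_tokens, table)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In such a setup, a pair like `["cat", "sleeps"]` and `["chat", "dort"]` would score higher than an unrelated pair, and a decision threshold on the score would turn the similarity into a plagiarism/non-plagiarism label that can then be evaluated with F1.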

Domains

Computation and Language [cs.CL]

Main file

EACLshort066.pdf (156.35 KB)

Origin: Publisher files allowed on an open archive

Contributor: Laurent Besacier

https://hal.science/hal-01502146

Submitted on: Wednesday, April 5, 2017, 10:21:40

Last modified on: Monday, April 15, 2024, 11:25:23

Long-term archiving on: Thursday, July 6, 2017, 12:58:12

Dates and versions

hal-01502146, version 1 (05-04-2017)

Identifiers

HAL Id: hal-01502146, version 1
arXiv: 1702.03082

Cite

Jérémy Ferrero, Frédéric Agnès, Laurent Besacier, Didier Schwab. Using Word Embedding for Cross-Language Plagiarism Detection. EACL 2017, Apr 2017, Valencia, Spain. pp. 415-421. ⟨hal-01502146⟩


Collections

UGA CNRS LIG LIG_TDCGE_GETALP POLYTECH-GRENOBLE LIG_SIDCH
