Conference paper. Year: 2015

Information Theoretical and Statistical Features for Intrinsic Plagiarism Detection

Abstract

In this paper we present information theoretical and statistical features, including function word skip n-grams, for intrinsic plagiarism detection. We train a binary classifier with different feature sets and compare their performance. We propose a set of 36 features for classifying plagiarized and non-plagiarized text in suspicious documents. Our experiments show that the entropy, relative entropy and correlation coefficient of function word skip n-gram frequency profiles are very effective features. The proposed feature set achieves an F-score of 85.10%.
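The three features named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function word list, the skip window (`max_skip`), the add-one smoothing (`alpha`), and the choice to compare a text segment against the whole document are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative function word list, not the one used in the paper.
FUNCTION_WORDS = {"the", "of", "and", "a", "in", "to", "is", "that", "it", "for"}


def skip_bigrams(text, max_skip=2):
    """Ordered pairs of function words with at most `max_skip` function words between them."""
    fw = [w for w in re.findall(r"[a-z']+", text.lower()) if w in FUNCTION_WORDS]
    return [
        (fw[i], fw[j])
        for i in range(len(fw))
        for j in range(i + 1, min(i + 2 + max_skip, len(fw)))
    ]


def profile(grams, vocab, alpha=1.0):
    """Relative frequency of each n-gram in `vocab`, with additive smoothing (assumed)."""
    counts = Counter(grams)
    total = len(grams) + alpha * len(vocab)
    return [(counts[g] + alpha) / total for g in vocab]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be non-zero where p is."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def pearson(p, q):
    """Pearson correlation coefficient; returns 0.0 when it is undefined."""
    n = len(p)
    if n == 0:
        return 0.0
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q))
    sp = math.sqrt(sum((x - mp) ** 2 for x in p))
    sq = math.sqrt(sum((y - mq) ** 2 for y in q))
    return cov / (sp * sq) if sp and sq else 0.0


def features(segment, document, max_skip=2):
    """Compare a segment's skip-bigram profile with that of the whole document."""
    seg = skip_bigrams(segment, max_skip)
    doc = skip_bigrams(document, max_skip)
    vocab = sorted(set(seg) | set(doc))
    p, q = profile(seg, vocab), profile(doc, vocab)
    return {
        "entropy": entropy(p),
        "relative_entropy": relative_entropy(p, q),
        "correlation": pearson(p, q),
    }
```

Calling `features(segment, document)` for each segment of a suspicious document would yield one small feature vector per segment, which could then be passed to any binary classifier. The abstract mentions 36 features in total; this sketch covers only the three it singles out.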
No file deposited

Identifiers

  • HAL Id: hal-01617333, version 1

Cite

Rashedur Rahman. Information Theoretical and Statistical Features for Intrinsic Plagiarism Detection. 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Sep 2015, Prague, Czech Republic. ⟨hal-01617333⟩