Viewing functions as token sequences to highlight similarities in source code

Abstract: The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. The latter can be a challenging problem, since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique based not only on token sequence matching but also on the factorization of the function call graph. The factorization process merges shared chunks (factors) of code to cope, in particular, with inlining and outlining. The resulting call graph offers a view of the similarities together with their nesting relations, and it can be used to infer metrics that quantify similarity at the function level.
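As a rough illustration of the token-sequence view mentioned in the abstract, the sketch below abstracts identifiers and numeric literals so that renamed variables still match, then looks for shared token runs between two functions. It is not the authors' algorithm: the regex tokenizer, the small keyword set, the use of Python's difflib.SequenceMatcher, the min_len threshold, and the names tokenize, shared_factors and similarity are all assumptions made for this example, and the call graph factorization step is not covered.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny keyword set for a C-like language.
KEYWORDS = {"int", "return", "if", "else", "for", "while"}
# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Turn source text into tokens, abstracting identifiers and numbers."""
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append("ID")   # every identifier maps to the same token
        elif lexeme[0].isdigit():
            tokens.append("NUM")  # every integer literal maps to the same token
        else:
            tokens.append(lexeme)
    return tokens


def shared_factors(a, b, min_len=4):
    """Return the matching token runs of at least min_len tokens."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]


def similarity(a, b, min_len=4):
    """Fraction of the tokens of a covered by runs shared with b."""
    covered = sum(len(run) for run in shared_factors(a, b, min_len))
    return covered / max(len(a), 1)


if __name__ == "__main__":
    f1 = "int sum(int n) { int s = 0; for (int i = 0; i < n; i++) s += i; return s; }"
    f2 = "int total(int k) { int acc = 0; for (int j = 0; j < k; j++) acc += j; return acc; }"
    print(similarity(tokenize(f1), tokenize(f2)))
```

Because identifiers are collapsed to a single token, the two functions in the usage example yield identical token sequences even though every name differs. The min_len threshold is there to ignore very short coincidental matches such as a lone identifier.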

https://hal.archives-ouvertes.fr/hal-00780290
Contributor: Michel Chilowicz
Submitted on: Wednesday, 23 January 2013 - 16:31:10
Last modified on: Thursday, 11 January 2018 - 06:20:22
Document(s) archived on: Saturday, 1 April 2017 - 08:54:52

File

article.pdf
Files produced by the author(s)

Citation

Michel Chilowicz, Étienne Duris, Gilles Roussel. Viewing functions as token sequences to highlight similarities in source code. Science of Computer Programming, Elsevier, 2013, 78 (10), pp.1871-1891. 〈10.1016/j.scico.2012.11.008〉. 〈hal-00780290〉

Metrics

Record views: 242
File downloads: 145