Sequence Covering Similarity for Symbolic Sequence Comparison

Pierre-François Marteau 1
1 EXPRESSION - Expressiveness in Human Centered Data/Media
UBS - Université de Bretagne Sud, IRISA-D6 - MEDIA ET INTERACTIONS
Abstract : This paper introduces the sequence covering similarity, which we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a pairwise distance to compare two symbolic sequences. We show that this covering distance is a semimetric. A few examples show how this string metric, computable in $O(n \cdot \log n)$, compares with the Levenshtein distance, which is in $O(n^2)$. A final example presents its application to plagiarism detection.
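For readers unfamiliar with the $O(n^2)$ baseline mentioned in the abstract, the following is a minimal sketch of the classic dynamic-programming Levenshtein distance. It is not taken from the paper and does not implement the covering similarity; the function name `levenshtein` is an illustrative choice. It only shows where the quadratic cost comes from: one table cell per pair of positions in the two strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b in O(len(a) * len(b)) time.

    Illustrative baseline only; this is not the covering distance
    introduced in the paper.
    """
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

The nested loops visit every pair of positions once, which gives the quadratic running time that the abstract contrasts with the $O(n \cdot \log n)$ covering distance.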
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-01689286
Contributor : Pierre-François Marteau
Submitted on : Thursday, March 8, 2018 - 3:42:48 PM
Last modification on : Friday, April 19, 2019 - 4:55:10 PM
Long-term archiving on : Saturday, June 9, 2018 - 2:17:05 PM

Files

CoveringSimilarity-v2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01689286, version 3
  • ARXIV : 1801.07013

Citation

Pierre-François Marteau. Sequence Covering Similarity for Symbolic Sequence Comparison. 2018. ⟨hal-01689286v3⟩
