HAL: hal-00721211, version 1

Sketch *-metric: Comparing Data Streams via Sketching
Emmanuelle Anceaume 1, 2, Yann Busnel 3
For the LINA-GDD; IRISA-CIDER; IRISA-CIDRE collaboration(s)
(2012-07)

In this paper, we consider the problem of estimating the distance between any two large data streams under small-space constraints. This problem is of utmost importance in data-intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately in order to quickly detect any deviation from nominal behavior. We present a new metric, the Sketch ⋆-metric, which defines a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, it preserves the axioms of that measure on the sketch (such as non-negativity, identity, symmetry, and the triangle inequality, as well as specific properties of the f-divergence). Extensive experiments conducted on both synthetic traces and real data validate the robustness and accuracy of the Sketch ⋆-metric.
1:  CIDER (IRISA)
Université de Rennes 1 – Institut National des Sciences Appliquées (INSA) - Rennes – CNRS : UMR6074
2:  CIDRE (INRIA - SUPELEC)
INRIA – SUPELEC
3:  Laboratoire d'Informatique de Nantes Atlantique (LINA)
CNRS : UMR6241 – Université de Nantes – École Nationale Supérieure des Mines - Nantes
GDD
Computer Science/Data Structures and Algorithms

Computer Science/Information Theory and Coding

Mathematics/Information Theory

Computer Science/Discrete Mathematics
Data stream – metric – randomized approximation algorithm