Lightweight Metric Computation for Distributed Massive Data Streams

Emmanuelle Anceaume 1 Yann Busnel 2, 3
1 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition
CentraleSupélec, Inria Rennes – Bretagne Atlantique, IRISA_D1 - SYSTÈMES LARGE ÉCHELLE
3 DIONYSOS - Dependability Interoperability and perfOrmance aNalYsiS Of networkS
Inria Rennes – Bretagne Atlantique, IRISA_D2 - RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES
Abstract: The real-time analysis of massive data streams is of utmost importance in data-intensive applications that need to detect, as quickly and as efficiently as possible (in terms of computation and memory space), any correlation between their inputs or any deviation from some expected nominal behavior. The IoT infrastructure can be used to monitor any events or changes in structural conditions that can compromise safety and increase risk. It is thus a recurrent and crucial issue to determine whether huge data streams, received at monitored devices, are correlated or not, as such correlation may reveal the presence of attacks. We propose a metric, called codeviation, that makes it possible to evaluate the correlation between distributed massive streams. This metric is inspired by classical metrics in statistics and probability theory, and as such makes it possible to understand how observed quantities change together, and in what proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the domain of values from which data items are drawn. We then generalize our approach by presenting a new metric, the Sketch-metric, which allows us to define a distance between updatable summaries of large data streams. An important feature of the Sketch-metric is that, given a measure on the entire initial data streams, the Sketch-metric preserves the axioms of that measure on the sketch. We finally present results obtained during extensive experiments conducted on both synthetic traces and real data sets, which allow us to validate the robustness and accuracy of our metrics.
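The abstract does not state the formula for the codeviation. As a rough illustration only, the sketch below assumes it behaves like a covariance taken over the per-item frequency vectors of two streams; the function names `frequency_vector` and `codeviation` and the explicit `domain` argument are hypothetical choices made for this example, not the authors' definitions. This exact version stores one counter per domain value, which is precisely the memory cost that an estimate in the data stream model is meant to avoid.

```python
from collections import Counter


def frequency_vector(stream, domain):
    """Count how often each value of `domain` occurs in `stream`."""
    counts = Counter(stream)
    return [counts[item] for item in domain]


def codeviation(stream_x, stream_y, domain):
    """Covariance-style measure between two streams (assumed form).

    Exact computation over full frequency vectors: memory grows linearly
    with the size of the domain, unlike a sketch-based estimate.
    """
    x = frequency_vector(stream_x, domain)
    y = frequency_vector(stream_y, domain)
    n = len(domain)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Average product of the deviations from the mean frequencies.
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


if __name__ == "__main__":
    domain = ["a", "b", "c", "d"]
    print(codeviation(list("aaab"), list("aabc"), domain))
```

Under this assumed definition, a positive value indicates that items frequent in one stream tend to be frequent in the other, and a value near zero suggests no linear relationship between the two frequency vectors.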
Cited literature: 47 references

https://hal.archives-ouvertes.fr/hal-01634353
Contributor: Emmanuelle Anceaume
Submitted on: Tuesday, November 14, 2017 - 8:42:10 AM
Last modification on: Tuesday, April 2, 2019 - 2:27:13 AM
Long-term archiving on: Thursday, February 15, 2018 - 12:22:49 PM

File

ab-tldks2017.pdf
Files produced by the author(s)

Citation

Emmanuelle Anceaume, Yann Busnel. Lightweight Metric Computation for Distributed Massive Data Streams. Transactions on Large-Scale Data- and Knowledge-Centered Systems, Springer Berlin / Heidelberg, 2017, 10430 (33), pp. 1-39. ⟨10.1007/978-3-662-55696-2_1⟩. ⟨hal-01634353⟩
