Deviation Estimation between Distributed Data Streams

Emmanuelle Anceaume (1, 2), Yann Busnel (3)
1 CIDER, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
2 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition, CentraleSupélec, Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
3 GDD - Gestion de Données Distribuées [Nantes], LINA - Laboratoire d'Informatique de Nantes Atlantique
Abstract: The analysis of massive data streams is fundamental in many monitoring applications. In particular, for network operators, it is a recurrent and crucial issue to determine whether the huge data streams received at their monitored devices are correlated, as correlation may reveal the presence of malicious activities in the network system. We propose a metric, called codeviation, that makes it possible to evaluate the correlation between distributed streams. This metric is inspired by a classical metric in statistics and probability theory, and as such allows us to understand how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the size of the domain from which data items are drawn. We give upper and lower bounds on the quality of the codeviation, and provide both local and distributed algorithms that additively approximate the codeviation among n data streams by using O((1/ε) ln(1/δ) (log N + (n+1) log m)) bits of space, where N is the size of the domain from which data items are drawn, and m is the maximal stream length. To the best of our knowledge, such a metric has never been proposed so far.
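To make the idea concrete, here is a minimal sketch of an exact, non-streaming baseline. It rests on an assumption the abstract does not state: that the "classical metric" is the covariance, applied to the item-frequency vectors of two streams. The function name `codeviation` and its parameters are illustrative only and do not reproduce the paper's algorithm, which approximates such a quantity in small space rather than storing full frequency vectors.

```python
from collections import Counter


def codeviation(stream_x, stream_y, domain):
    """Covariance-style deviation between two streams (assumed form).

    Assumption: cod(X, Y) = (1/N) * sum_u x_u * y_u
                            - (1/N^2) * (sum_u x_u) * (sum_u y_u),
    where x_u and y_u count the occurrences of item u in each stream
    and N is the size of the item domain.
    """
    items = list(domain)
    n = len(items)
    fx = Counter(stream_x)  # exact frequencies: linear memory, unlike a sketch
    fy = Counter(stream_y)
    sum_prod = sum(fx[u] * fy[u] for u in items)
    sum_x = sum(fx[u] for u in items)
    sum_y = sum(fy[u] for u in items)
    return sum_prod / n - (sum_x * sum_y) / (n * n)
```

Under this assumed definition, two streams that concentrate on the same items yield a positive value, while uniform streams yield zero. The exact version needs memory proportional to the domain size, which is the cost the data stream model is meant to avoid.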

https://hal.archives-ouvertes.fr/hal-00998702
Contributor: Yann Busnel
Submitted on: Monday, June 2, 2014 - 3:17:37 PM
Last modification on: Friday, November 16, 2018 - 1:39:10 AM
Long-term archiving on: Tuesday, September 2, 2014 - 12:25:51 PM

File

PID3096993.pdf
Files produced by the author(s)

Citation

Emmanuelle Anceaume, Yann Busnel. Deviation Estimation between Distributed Data Streams. 10th European Dependable Computing Conference (EDCC 2014), May 2014, Newcastle, United Kingdom. pp.35-45, ⟨10.1109/EDCC.2014.27⟩. ⟨hal-00998702⟩
