An Information Divergence Estimation over Data Streams

Emmanuelle Anceaume 1, 2, Yann Busnel 3
1 CIDER
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
2 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition
CentraleSupélec, Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
3 GDD - Gestion de Données Distribuées [Nantes]
LINA - Laboratoire d'Informatique de Nantes Atlantique
Abstract : In this paper, we consider large-scale distributed systems in which each node must quickly process a huge amount of data received as a stream that may have been tampered with by an adversary. In this setting, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we proposed in prior work AnKLe, a one-pass algorithm for estimating the Kullback-Leibler (KL) divergence of an observed stream with respect to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe remains accurate in adversarial settings where the quality of other methods decreases dramatically. In the present paper, with n denoting the number of distinct data items in a stream, we show that AnKLe is an (ε,δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε^2) bits in "most" cases, and Õ(1/ε + (n − ε^(−1))/ε^2) otherwise. To the best of our knowledge, an approximation algorithm for estimating the KL divergence has never been analyzed before.
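For orientation, the quantity being estimated can be sketched as an exact, non-streaming baseline. This is not AnKLe: the sketch below keeps one counter per distinct item, which is precisely the linear-in-n memory cost that a one-pass approximation algorithm aims to avoid. The function name `kl_divergence_from_stream`, the use of base-2 logarithms, and the direction of the divergence (observed relative to expected) are assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence_from_stream(stream, expected):
    """Exact KL divergence D(observed || expected), in bits.

    `stream` is any iterable of hashable items; `expected` maps each item
    to its expected probability. Hypothetical helper, not the AnKLe estimator.
    """
    counts = Counter(stream)          # one counter per distinct item: O(n) space
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty stream")

    divergence = 0.0
    for item, count in counts.items():
        observed_p = count / total
        expected_p = expected.get(item, 0.0)
        if expected_p == 0.0:
            # An observed item with zero expected probability makes D infinite.
            return math.inf
        divergence += observed_p * math.log2(observed_p / expected_p)
    return divergence


uniform = {"a": 0.5, "b": 0.5}
balanced = kl_divergence_from_stream(["a", "b", "a", "b"], uniform)
skewed = kl_divergence_from_stream(["a", "a", "a", "a"], uniform)
```

Under these assumptions, a stream that matches the expected distribution yields a divergence of zero, while a stream that an adversary has biased toward a single item yields a strictly positive value.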

https://hal.archives-ouvertes.fr/hal-00725097
Contributor : Yann Busnel
Submitted on : Thursday, August 23, 2012 - 10:49:08 PM
Last modification on : Friday, November 16, 2018 - 1:39:08 AM
Long-term archiving on : Saturday, November 24, 2012 - 2:45:15 AM

File

papier.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00725097, version 1

Citation

Emmanuelle Anceaume, Yann Busnel. An Information Divergence Estimation over Data Streams. 11th IEEE International Symposium on Network Computing and Applications (IEEE NCA12), Aug 2012, Cambridge, MA, United States. pp.Number 72. ⟨hal-00725097⟩
