Sliding HyperLogLog: Estimating cardinality in a data stream

Abstract : In this paper, a new algorithm for estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about 1.04/\sqrt{m}, where m is the number of registers (the same as in the HyperLogLog algorithm). As the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that this additional memory is at most 5m ln(n/m) bytes, where n is the actual number of flows in the sliding window. For instance, with an additional memory of only 35 kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
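
The two formulas quoted in the abstract can be checked numerically. The short sketch below only evaluates them; it does not implement the algorithm. The values m = 1024 and n = 1,000,000 are assumptions chosen for illustration, since the abstract does not state which parameters produce its 35 kB and 3% figures, and the function names are hypothetical.

```python
import math


def standard_error(m: int) -> float:
    """Approximate relative standard error 1.04 / sqrt(m) for m registers."""
    return 1.04 / math.sqrt(m)


def extra_memory_bound_bytes(m: int, n: int) -> float:
    """Upper bound 5 * m * ln(n / m), in bytes, on the additional memory.

    n is the number of distinct flows in the sliding window. The bound is
    only meaningful when n > m (otherwise the logarithm is not positive).
    """
    return 5 * m * math.log(n / m)


# Assumed illustration values, not taken from the abstract.
m = 1024
n = 1_000_000

print(f"standard error : {standard_error(m):.2%}")
print(f"memory bound   : {extra_memory_bound_bytes(m, n) / 1000:.1f} kB")
```

Under these assumed values the standard error is 1.04/32 = 3.25% and the bound comes to roughly 35 kB, which is of the same order as the figures given in the abstract. Because the bound grows only logarithmically in n, raising n to several million increases it by a few kilobytes.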
Document type :
Preprints, Working Papers, ...
https://hal.archives-ouvertes.fr/hal-00465313
Contributor : Yousra Chabchoub
Submitted on : Friday, March 26, 2010 - 10:55:41 AM
Last modification on : Friday, May 25, 2018 - 12:02:03 PM
Long-term archiving on : Tuesday, September 28, 2010 - 11:46:31 AM

File

sliding_HyperLogLog.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00465313, version 1

Citation

Yousra Chabchoub, Georges Hébrail. Sliding HyperLogLog: Estimating cardinality in a data stream. 2010. ⟨hal-00465313⟩
