Conference papers

On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model

Abstract: This paper presents a new algorithm that detects, on the fly, the k most frequent items in the sliding window model. The algorithm is distributed among the nodes of the system. It is inspired by a recent approach that associates with each item a stochastic value correlated with its frequency, instead of trying to estimate its number of occurrences. This stochastic value is the number of consecutive heads obtained when flipping a coin until the first tail occurs. The original approach retained only the maximum number of consecutive heads obtained by an item, since a frequently occurring item has a higher probability of reaching a high value. While effective for very skewed data distributions, this correlation is not tight enough to robustly distinguish items with comparable frequencies. To address this issue, we propose to combine the stochastic approach with a deterministic counting of items. Specifically, instead of keeping the maximum number of consecutive heads obtained by an item, we count the number of times the coin flipping process of an item has exceeded a given threshold. This threshold is defined by combining theoretical results on the leader election and coupon collector problems. Results on simulated data show that the detection of the top-k items is highly effective across a large range of distributions.
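The counting idea described in the abstract can be illustrated with a minimal, single-node sketch. It is not the authors' algorithm: the distributed aspect is omitted, the sliding window is kept exactly with a queue, and the threshold is passed in by the caller because the abstract does not give its formula. The names `consecutive_heads`, `top_k_by_exceedance`, `threshold` and `window` are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter, deque


def consecutive_heads(rng):
    """Flip a fair coin until the first tail; return the number of heads."""
    heads = 0
    while rng.random() < 0.5:
        heads += 1
    return heads


def top_k_by_exceedance(stream, k, threshold, window, seed=0):
    """Return up to k items ranked by how often their coin-flip run
    exceeded `threshold` among the last `window` arrivals.

    Assumption: `threshold` is supplied by the caller; the paper derives
    it from leader election and coupon collector results, which are not
    reproduced here.
    """
    rng = random.Random(seed)
    recent = deque()    # one (exceeded, item) pair per arrival in the window
    counts = Counter()  # per-item number of exceedances inside the window

    for item in stream:
        exceeded = consecutive_heads(rng) > threshold
        recent.append((exceeded, item))
        if exceeded:
            counts[item] += 1

        # Slide the window: forget the oldest arrival once it is full.
        if len(recent) > window:
            old_exceeded, old_item = recent.popleft()
            if old_exceeded:
                counts[old_item] -= 1
                if counts[old_item] == 0:
                    del counts[old_item]

    return [item for item, _ in counts.most_common(k)]
```

With `threshold=-1` every arrival exceeds the threshold, so the sketch degenerates to exact counting over the window. Larger thresholds count only a random fraction of arrivals, still in proportion to each item's frequency.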

Contributor: Emmanuelle Anceaume
Submitted on: Monday, October 8, 2018 - 9:49:04 AM
Last modification on: Wednesday, November 3, 2021 - 8:14:33 AM
Long-term archiving on: Wednesday, January 9, 2019 - 1:06:08 PM


Emmanuelle Anceaume, Yann Busnel, Vasile Cazacu. On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model. NCA 2018 - 17th IEEE International Symposium on Network Computing and Applications, IEEE, Nov 2018, Boston, United States. pp.1-8, ⟨10.1109/NCA.2018.8548097⟩. ⟨hal-01888298⟩


