CoMMEDIA: Separating Scaramouche from Harlequin to Accurately Estimate Items Frequency in Distributed Data Streams - HAL open archive
Report. Year: 2013

Abstract

In this paper, we investigate the problem of estimating how many times data items recur in very large distributed data streams. We present an alternative to the well-known Count-Min Sketch that reduces the impact of collisions on the accuracy of the estimation: for each concerned item, we decrease the over-estimation that results from these collisions. Our sketch, called CoMMEDIETTA, keeps track of the most frequent items of the stream and removes their weight from that of the items with which they collide. By doing so, we significantly improve upon the Count-Min Sketch and obtain a randomized (ε, δ)-approximation algorithm. We then propose to judiciously distribute this local sketch to estimate the global frequency of any item that may recur in multiple streams. This distributed sketch, called CoMMEDIA (for Count-Min Sketch-based Estimation of Data Items Arrival frequency), organizes the nodes of the system into a distributed hash table (DHT) such that each node maintains a small local sketch over a reduced subset of items. By doing so, we guarantee a significantly more accurate estimation of item frequencies. Simulations on both synthetic and real traces confirm the accuracy of CoMMEDIA.
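The abstract describes the mechanism only in words. As a minimal sketch under stated assumptions, and not the authors' CoMMEDIETTA or CoMMEDIA algorithms, the Python code below subtracts the weight of tracked frequent items from the Count-Min cells they share with a queried item, then routes items to per-node sketches. Two choices are assumptions of this illustration: frequent items are tracked with a Misra-Gries summary (a swapped-in technique, not taken from the paper), and the DHT is replaced by a toy consistent-hashing ring. The names `CorrectedCountMin`, `ToyRing`, `width`, `depth` and `k` are hypothetical.

```python
import bisect
import hashlib
import random

P = (1 << 61) - 1  # Mersenne prime modulus for the (a*x + b) mod P hash family


def _key(item) -> int:
    """Stable 64-bit integer key for an item (unaffected by PYTHONHASHSEED)."""
    digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class CorrectedCountMin:
    """Count-Min Sketch plus a Misra-Gries summary of frequent items.

    Hypothetical simplification of the idea in the abstract: when an item
    is queried, the weight of tracked frequent items that share its cell
    is subtracted before taking the minimum over rows.
    """

    def __init__(self, width: int, depth: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]
        self.k = k
        self.heavy = {}  # Misra-Gries counters, each <= the item's true count

    def _cols(self, item):
        x = _key(item)
        return [((a * x + b) % P) % self.width for a, b in self.hashes]

    def update(self, item) -> None:
        for row, col in enumerate(self._cols(item)):
            self.table[row][col] += 1
        # Misra-Gries step with at most k counters.
        if item in self.heavy:
            self.heavy[item] += 1
        elif len(self.heavy) < self.k:
            self.heavy[item] = 1
        else:
            for y in list(self.heavy):
                self.heavy[y] -= 1
                if self.heavy[y] == 0:
                    del self.heavy[y]

    def cms_estimate(self, item) -> int:
        """Plain Count-Min estimate: minimum counter over all rows."""
        return min(self.table[r][c] for r, c in enumerate(self._cols(item)))

    def estimate(self, item) -> int:
        """Count-Min estimate after removing colliding frequent items' weight."""
        heavy_cols = {y: self._cols(y) for y in self.heavy if y != item}
        per_row = []
        for row, col in enumerate(self._cols(item)):
            collide = sum(self.heavy[y] for y, cols in heavy_cols.items()
                          if cols[row] == col)
            per_row.append(self.table[row][col] - collide)
        return min(per_row)


class ToyRing:
    """Hypothetical stand-in for the DHT: consistent hashing on a ring,
    where each node keeps a small local sketch for the items it owns."""

    def __init__(self, node_ids, width: int, depth: int, k: int):
        self.ring = sorted((_key(n), n) for n in node_ids)
        self.positions = [pos for pos, _ in self.ring]
        self.sketches = {n: CorrectedCountMin(width, depth, k, seed=i)
                         for i, (_, n) in enumerate(self.ring)}

    def owner(self, item):
        """First node clockwise from the item's key (wrapping around)."""
        i = bisect.bisect_left(self.positions, _key(item)) % len(self.ring)
        return self.ring[i][1]

    def update(self, item) -> None:
        # Any stream observing `item` forwards the update to its owner node.
        self.sketches[self.owner(item)].update(item)

    def estimate(self, item) -> int:
        return self.sketches[self.owner(item)].estimate(item)
```

Misra-Gries is used here because its counters never exceed true counts. A cell always holds at least the true count of the queried item plus the true counts of every item colliding with it, so subtracting these under-estimated weights keeps the corrected value an over-estimate, and it is never larger than the plain Count-Min value. The paper's actual (ε, δ) guarantees and sketch sizing are not reproduced by this toy.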
Main file
opodis.pdf (762.34 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00847764, version 1 (24-07-2013)

Identifiers

  • HAL Id: hal-00847764, version 1

Cite

Emmanuelle Anceaume, Yann Busnel. CoMMEDIA: Separating Scaramouche from Harlequin to Accurately Estimate Items Frequency in Distributed Data Streams. 2013. ⟨hal-00847764⟩
409 views
141 downloads
