Efficient Key Grouping for Near-Optimal Load Balancing in Stream Processing Systems

Nicoló Rivetti 1, Leonardo Querzoni 2, Emmanuelle Anceaume 3, Yann Busnel 4,5, Bruno Sericola 5
1 GDD - Gestion de Données Distribuées [Nantes]
LINA - Laboratoire d'Informatique de Nantes Atlantique
3 CIDRE - Confidentialité, Intégrité, Disponibilité et Répartition
CentraleSupélec, Inria Rennes – Bretagne Atlantique, IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
5 DIONYSOS - Dependability Interoperability and perfOrmance aNalYsiS Of networkS
Inria Rennes – Bretagne Atlantique, IRISA-D2 - RÉSEAUX, TÉLÉCOMMUNICATION ET SERVICES
Abstract : Key grouping is a technique used by stream processing frameworks to simplify the development of parallel stateful operators. Through key grouping, a stream of tuples is partitioned into several disjoint sub-streams depending on the values contained in the tuples themselves. Each operator instance that is the target of a sub-stream is guaranteed to receive all the tuples containing a specific key value. Key grouping is commonly implemented with hash functions, which, however, are known to cause load imbalances on the target operator instances when the input data stream has a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distribution. DKG starts from the simple observation that, with such inputs, the load balance is strongly driven by the most frequent values; it identifies these values and explicitly maps them to sub-streams, together with groups of less frequent items, to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the mapping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.
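
The contrast drawn in the abstract, hash-based key grouping versus an explicit mapping of the most frequent keys, can be illustrated with a small Python sketch. This is not the DKG algorithm from the paper, whose details are not given in this record. It is a simplified illustration under stated assumptions: key frequencies are counted exactly, a key is treated as "frequent" when its relative frequency reaches a hypothetical threshold `theta`, less frequent keys are grouped into hash buckets (`buckets_per_instance` is also hypothetical), and frequent keys and buckets are placed greedily on the least-loaded instance. All function and parameter names are invented for this example.

```python
import zlib
from collections import Counter


def stable_hash(key: str) -> int:
    # Deterministic across runs (Python's built-in hash() is salted for str).
    return zlib.crc32(key.encode("utf-8"))


def hash_grouping(k):
    """Baseline: route every key to instance stable_hash(key) mod k."""
    return lambda key: stable_hash(key) % k


def frequency_aware_grouping(keys, k, theta=0.05, buckets_per_instance=4):
    """Illustrative only (assumed scheme, not the paper's algorithm):
    map frequent keys explicitly, group the rest into hash buckets,
    then place both greedily on the currently least-loaded instance."""
    freq = Counter(keys)
    total = len(keys)
    # Assumption: "frequent" means relative frequency >= theta.
    heavy = {key: c for key, c in freq.items() if c / total >= theta}

    # Group the less frequent keys into buckets and sum their loads.
    n_buckets = k * buckets_per_instance
    bucket_load = [0] * n_buckets
    for key, c in freq.items():
        if key not in heavy:
            bucket_load[stable_hash(key) % n_buckets] += c

    # Items to place: (load, kind, identifier), heaviest first.
    items = [(c, "key", key) for key, c in heavy.items()]
    items += [(load, "bucket", b) for b, load in enumerate(bucket_load)]
    items.sort(key=lambda item: item[0], reverse=True)

    load = [0] * k
    key_map, bucket_map = {}, {}
    for weight, kind, ident in items:
        target = min(range(k), key=load.__getitem__)  # least-loaded instance
        load[target] += weight
        (key_map if kind == "key" else bucket_map)[ident] = target

    def route(key):
        # Every tuple with the same key always reaches the same instance.
        if key in key_map:
            return key_map[key]
        return bucket_map[stable_hash(key) % n_buckets]

    return route


def instance_loads(keys, route, k):
    """Count how many tuples each of the k instances receives."""
    loads = [0] * k
    for key in keys:
        loads[route(key)] += 1
    return loads


if __name__ == "__main__":
    # Skewed toy stream: two frequent keys plus 300 keys seen once each.
    stream = ["hot"] * 500 + ["warm"] * 200 + [f"k{i}" for i in range(300)]
    k = 4
    print("hash grouping:   ", instance_loads(stream, hash_grouping(k), k))
    print("frequency-aware: ",
          instance_loads(stream, frequency_aware_grouping(stream, k), k))
```

In this toy stream the most frequent key alone accounts for half of the tuples, so no key grouping can bring the maximum instance load below that key's count. The sketch only shows that placing frequent keys explicitly avoids stacking additional load on the instance that already holds the hottest key, which plain hashing cannot guarantee.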

https://hal.archives-ouvertes.fr/hal-01194518
Contributor : Emmanuelle Anceaume
Submitted on : Wednesday, September 9, 2015 - 12:08:05 PM
Last modification on : Friday, November 16, 2018 - 1:39:09 AM
Long-term archiving on : Monday, December 28, 2015 - 10:49:28 PM

File

main.pdf
Files produced by the author(s)

Citation

Nicoló Rivetti, Leonardo Querzoni, Emmanuelle Anceaume, Yann Busnel, Bruno Sericola. Efficient Key Grouping for Near-Optimal Load Balancing in Stream Processing Systems . The 9th ACM International Conference on Distributed Event-Based Systems (DEBS), Jun 2015, Oslo, Norway. ⟨10.1145/2675743.2771827⟩. ⟨hal-01194518⟩
