A Scalable and Skew-insensitive Algorithm for Join Operations using Map/Reduce Model

Abstract: For over a decade, Map/Reduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model ensures scalability, reliability and availability with reasonable query processing time. However, these large-scale systems still face some challenges: data skew, task imbalance, high disk I/O and redistribution costs can have disastrous effects on performance. In this paper, we introduce the MRFA-Join algorithm: a new Frequency Adaptive algorithm based on the Map/Reduce programming model and distributed histograms for join processing on large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. Performance was evaluated experimentally on the Grid'5000 infrastructure.
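As a rough illustration of why join-key frequency histograms are useful when data is skewed, the sketch below counts key frequencies in two toy relations and derives how many output tuples each key contributes to an equi-join. This is a generic, single-machine sketch and not the MRFA-Join algorithm itself; the function names (`key_histogram`, `join_size_per_key`) and the sample data are hypothetical.

```python
from collections import Counter

def key_histogram(records):
    """Count how often each join key occurs in a list of (key, value) pairs."""
    return Counter(key for key, _ in records)

def join_size_per_key(hist_r, hist_s):
    """For an equi-join, a key k present in both inputs contributes
    hist_r[k] * hist_s[k] output tuples; other keys contribute none."""
    return {k: hist_r[k] * hist_s[k] for k in hist_r.keys() & hist_s.keys()}

# Toy relations (illustrative data only, not taken from the report).
R = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5)]
S = [("a", 10), ("a", 11), ("b", 12), ("d", 13)]

sizes = join_size_per_key(key_histogram(R), key_histogram(S))
total = sum(sizes.values())
```

Here key `a` alone accounts for 6 of the 7 join results, so a plain hash partitioning on the key would send most of the work to a single reducer. Knowing the per-key frequencies in advance is what makes it possible to treat such keys differently.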
Document type:
Report
2014

Cited literature [18 references]

https://hal.archives-ouvertes.fr/hal-00947730
Contributor: Mostafa Bamha
Submitted on: Tuesday, March 4, 2014 - 11:03:30
Last modified on: Thursday, February 7, 2019 - 14:35:20
Document(s) archived on: Wednesday, June 4, 2014 - 11:01:33

File

RR-2014-01.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00947730, version 2

Citation

Mostafa Bamha, Frédéric Loulergue. A Scalable and Skew-insensitive Algorithm for Join Operations using Map/Reduce Model. 2014. 〈hal-00947730v2〉

Metrics

Record views

213

File downloads

158