A Lightweight Continuous Jobs Mechanism for MapReduce Frameworks

Trong-Tuan Vu 1, Fabrice Huet 2
1 DOLPHIN - Parallel Cooperative Multi-criteria Optimization
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
2 OASIS - Active objects, semantics, Internet and security
CRISAM - Inria Sophia Antipolis - Méditerranée, COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract: MapReduce is a programming model that allows vast amounts of data to be processed in parallel on a large number of machines. It is particularly well suited to static or slowly changing data sets, since the execution time of a job is usually high. In practice, however, data centers collect data at high rates, which makes it very difficult to keep results up to date. To address this challenge, we propose in this paper a generic mechanism for dealing with dynamic data in MapReduce frameworks. Long-standing MapReduce jobs, called continuous jobs, are automatically re-executed to process new incoming data at minimal cost. We present a simple and clean API that integrates well with the standard MapReduce model. Furthermore, we describe cHadoop, an implementation of our approach based on Hadoop that requires no modifications to the source code of the original framework. cHadoop can therefore be ported quickly to any new version of Hadoop. We evaluate our proposal with two standard MapReduce applications (WordCount and WordCount-N-Count) and one real-world application (RDF Query) on real datasets. Our evaluations on clusters ranging from 5 to 40 nodes demonstrate the benefit of our approach in terms of execution time and ease of use.
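The idea of a continuous job described in the abstract can be illustrated with a minimal Python sketch. This is not the cHadoop API, which this record does not show. The class ContinuousWordCount, its on_new_data method, and the helper functions are hypothetical names chosen for illustration, and the sketch assumes a WordCount-style job whose partial results can simply be summed.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce phase: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


class ContinuousWordCount:
    """Hypothetical continuous job (not the cHadoop API).

    Keeps the result of earlier executions and, when new data arrives,
    processes only that new data before merging it into the stored result.
    """

    def __init__(self) -> None:
        self.result = Counter()

    def on_new_data(self, lines: Iterable[str]) -> Counter:
        # Run map and reduce over the new batch only, not the full history.
        delta = reduce_counts(pair for line in lines for pair in map_words(line))
        # Merge the partial result into the carried-over state.
        self.result.update(delta)
        return self.result


job = ContinuousWordCount()
job.on_new_data(["hadoop map reduce", "map reduce"])
job.on_new_data(["continuous map"])
```

After the second batch, job.result holds the same counts a full re-run over all three lines would produce, while only the new line was mapped and reduced. Jobs whose reduce step is not a simple sum would need a different way of carrying state between executions.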
Document type:
Conference paper
13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Jun 2013, Netherlands. pp.269-276, 2013

https://hal.archives-ouvertes.fr/hal-00916103
Contributor: Fabrice Huet
Submitted on: Thursday, December 12, 2013 - 15:56:38
Last modified on: Saturday, January 16, 2016 - 01:10:23
Document(s) archived on: Friday, March 14, 2014 - 09:41:03

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00916103, version 1

Citation

Trong-Tuan Vu, Fabrice Huet. A Lightweight Continuous Jobs Mechanism for MapReduce Frameworks. 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Jun 2013, Netherlands. pp.269-276, 2013. <hal-00916103>
