Performance and energy efficiency of big data applications in cloud environments: A Hadoop case study

Eugen Feller 1 Lavanya Ramakrishnan 2 Christine Morin 3
1 MYRIADS - Design and Implementation of Autonomous Distributed Systems
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
2 Advanced Computing for Science
ACS - Advanced Computing for Science Department
3 PARIS - Programming distributed parallel systems for large scale numerical simulation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, ENS Cachan - École normale supérieure - Cachan, Inria Rennes – Bretagne Atlantique
Abstract: The exponential growth of scientific and business data has driven the evolution of cloud computing environments and the MapReduce parallel programming model. Cloud computing focuses on increased utilization and power savings through consolidation, while MapReduce enables large-scale data analysis. Hadoop, an open source implementation of MapReduce, has gained popularity in the last few years. In this paper, we evaluate Hadoop performance in the traditional model of collocated data and compute services and consider the impact of separating the services. Separating data and compute services provides more flexibility in environments where data locality might not have a considerable impact, such as virtualized environments and clusters with advanced networks. We also conduct an energy efficiency evaluation of Hadoop on physical and virtual clusters in different configurations. Our extensive evaluation shows that: (1) coexisting virtual machines on servers decrease the disk throughput; (2) performance on physical clusters is significantly better than on virtual clusters; (3) performance degradation due to separation of the services depends on the data-to-compute ratio; (4) application completion progress correlates with power consumption, and power consumption is heavily application specific. Finally, we discuss the implications of using cloud environments for big data analyses.
Document type: Journal article
Journal of Parallel and Distributed Computing, Elsevier, 2015, 79-80, pp.80-89. 〈10.1016/j.jpdc.2015.01.001〉

Cited literature: 22 references

https://hal.archives-ouvertes.fr/hal-01271141
Contributor: Christine Morin
Submitted on: Thursday, February 11, 2016 - 09:24:37
Last modified on: Thursday, February 7, 2019 - 15:03:50
Document(s) archived on: Saturday, November 12, 2016 - 14:16:43

File

main (1).pdf
Files produced by the author(s)

Citation

Eugen Feller, Lavanya Ramakrishnan, Christine Morin. Performance and energy efficiency of big data applications in cloud environments: A Hadoop case study. Journal of Parallel and Distributed Computing, Elsevier, 2015, 79-80, pp.80-89. 〈10.1016/j.jpdc.2015.01.001〉. 〈hal-01271141〉

Metrics

Record views: 1026
File downloads: 831