MRA++: Scheduling and data placement on MapReduce for heterogeneous environments

Julio Anjos; Ivan Carrera Izurieta; Wagner Kolberg; Andre Luis Tibola; Luciana Arantes; Claudio Geyer

doi:10.1016/j.future.2014.09.001

Journal article, Future Generation Computer Systems. Year: 2015


Julio Anjos

Role: Author

Instituto de Informática, Programa de Pós-Graduação em Computação [Porto Alegre]

Ivan Carrera Izurieta

Role: Author

Instituto de Informática, Programa de Pós-Graduação em Computação [Porto Alegre]

Wagner Kolberg

Role: Author

Instituto de Informática, Programa de Pós-Graduação em Computação [Porto Alegre]

Andre Luis Tibola

Role: Author

Instituto de Informática, Programa de Pós-Graduação em Computação [Porto Alegre]

Luciana Arantes

Role: Author
PersonId: 2197
IdHAL: luciana-arantes
ORCID: 0000-0002-0938-2004
IdRef: 195040953

Large-Scale Distributed Systems and Applications

Claudio Geyer

Role: Author

Instituto de Informática, Programa de Pós-Graduação em Computação [Porto Alegre]

Abstract

MapReduce has emerged as a popular programming model in the field of data-intensive computing. This is due to its simple design, which provides ease of use for programmers, and to framework implementations such as Hadoop, which have been adopted by large business and technology companies. In this paper we improve the Hadoop MapReduce framework by introducing algorithms suited to heterogeneous environments, with the goal of performing data-intensive computing efficiently in such environments. These adaptations are needed because Hadoop, following the framework design proposed by Google, is optimized to run in large homogeneous clusters. Hence we propose MRA++, a new MapReduce framework design that considers the heterogeneity of nodes during data distribution, task scheduling and job control. MRA++ establishes a training task to gather information prior to the data distribution. Although this training task adds a delay to the setup phase, we show that the delay is offset by the effectiveness of the mechanisms and algorithms, which achieve performance gains of more than 70% in 10 Mbps networks.
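
The abstract states that information from a training task is gathered before data distribution, but this record does not describe the algorithm itself. As an illustration only, the sketch below assumes one simple policy: each node receives a share of the input blocks proportional to the throughput it showed during training. The function name `proportional_split`, the throughput figures, and the rounding rule are all hypothetical and are not taken from the paper.

```python
def proportional_split(total_blocks, measured_throughput):
    """Split input blocks across nodes in proportion to measured throughput.

    Assumption (not from the paper): `measured_throughput` maps a node name
    to a positive number, e.g. MB/s observed during a training task.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not measured_throughput:
        raise ValueError("at least one node is required")
    if any(v <= 0 for v in measured_throughput.values()):
        raise ValueError("throughput values must be positive")

    total = sum(measured_throughput.values())

    # Ideal (fractional) share of blocks for each node.
    exact = {n: total_blocks * v / total for n, v in measured_throughput.items()}

    # Start from the integer part of each ideal share.
    shares = {n: int(x) for n, x in exact.items()}

    # Hand out the remaining blocks to the nodes with the largest
    # fractional remainders (largest-remainder rounding).
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1

    return shares


if __name__ == "__main__":
    # Hypothetical cluster: one node three times faster than the other.
    print(proportional_split(100, {"fast": 3.0, "slow": 1.0}))
```

Under this assumed policy a faster node holds more data locally, so it can keep working on local blocks instead of idling or fetching remote ones. Whether MRA++ uses this exact rule cannot be determined from the abstract.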

Subject areas

Computer Science [cs]


https://hal.science/hal-01197424

Submitted on: Friday, 11 September 2015, 16:20:17

Last modified on: Tuesday, 11 April 2023, 15:16:28

Dates and versions

hal-01197424, version 1 (11-09-2015)

Identifiers

HAL Id: hal-01197424, version 1
DOI: 10.1016/j.future.2014.09.001

Cite

Julio Anjos, Ivan Carrera Izurieta, Wagner Kolberg, Andre Luis Tibola, Luciana Arantes, et al. MRA++: Scheduling and data placement on MapReduce for heterogeneous environments. Future Generation Computer Systems, 2015, 42, pp.22-35. ⟨10.1016/j.future.2014.09.001⟩. ⟨hal-01197424⟩


Collections

UPMC CNRS INRIA LIP6 INRIA2 SORBONNE-UNIVERSITE SU-SCIENCES

