On the Performance of Spark on HPC Systems: Towards a Complete Picture

Orcun Yildiz; Shadi Ibrahim

doi:10.1007/978-3-319-69953-0_5

Conference paper. Year: 2018

Orcun Yildiz

Role: Author
PersonId : 984193

Mathematics and Computer Science Division [ANL]

Shadi Ibrahim

Role: Author
PersonId : 13360
IdHAL : shadi-ibrahim

Software Stack for Massively Geo-Distributed Infrastructures

Département Automatique, Productique et Informatique

Abstract

Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) are increasingly used by companies and research labs to facilitate large-scale data analysis. However, with the growing needs of users and the growing size of data, commodity-based infrastructure will strain under the heavy weight of Big Data. HPC systems, on the other hand, offer a rich set of opportunities for Big Data processing. As first steps toward Big Data processing on HPC systems, several research efforts have been devoted to understanding the performance of Big Data applications on these systems. Yet the HPC-specific performance considerations have not been fully investigated. In this work, we conduct an experimental campaign to provide a clearer understanding of the performance of Spark, the de facto in-memory data processing framework, on HPC systems. We ran Spark using representative Big Data workloads on the Grid’5000 testbed to evaluate how latency, contention, and the file system’s configuration can influence application performance. We discuss the implications of our findings and draw attention to new ways (e.g., burst buffers) to improve the performance of Spark on HPC systems.

Domains

Computer Science: Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

Yildiz-Ibrahim2018_Chapter_OnThePerformanceOfSparkOnHPCSy.pdf (737.34 KB)

Origin: Publication funded by an institution

https://hal.science/hal-01742016

Submitted on: Wednesday, April 21, 2021, 16:15:38

Last modified on: Friday, March 24, 2023, 14:53:21

Long-term archiving on: Thursday, July 22, 2021, 18:57:20

Dates and versions

hal-01742016, version 1 (21-04-2021)

Identifiers

HAL Id: hal-01742016, version 1
DOI: 10.1007/978-3-319-69953-0_5

Cite

Orcun Yildiz, Shadi Ibrahim. On the Performance of Spark on HPC Systems: Towards a Complete Picture. SCA 2018 - SupercomputingAsia, Mar 2018, Singapore, Singapore. pp.70-89, ⟨10.1007/978-3-319-69953-0_5⟩. ⟨hal-01742016⟩

Collections

UNIV-NANTES INSTITUT-TELECOM CNRS INRIA EC-NANTES GRID5000 UNAM INRIA2 LS2N LS2N-STACK IMT-ATLANTIQUE SILECS ANR NANTES-UNIVERSITE

432 views

52 downloads
