On the Performance of Spark on HPC Systems: Towards a Complete Picture

Orcun Yildiz 1, Shadi Ibrahim 2
2 STACK - Software Stack for Massively Geo-Distributed Infrastructures
Inria Rennes – Bretagne Atlantique, LS2N - Laboratoire des Sciences du Numérique de Nantes
Abstract: Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been increasingly used by many companies and research labs to facilitate large-scale data analysis. However, with the growing needs of users and size of data, commodity-based infrastructure will strain under the heavy weight of Big Data. On the other hand, HPC systems offer a rich set of opportunities for Big Data processing. As first steps toward Big Data processing on HPC systems, several research efforts have been devoted to understanding the performance of Big Data applications on these systems. Yet the HPC-specific performance considerations have not been fully investigated. In this work, we conduct an experimental campaign to provide a clearer understanding of the performance of Spark, the de facto in-memory data processing framework, on HPC systems. We ran Spark using representative Big Data workloads on the Grid’5000 testbed to evaluate how latency, contention, and file system configuration can influence application performance. We discuss the implications of our findings and draw attention to new ways (e.g., burst buffers) to improve the performance of Spark on HPC systems.
Document type: Conference paper
SupercomputingAsia 2018 (SCA18), Mar 2018, Singapore, Singapore
https://hal.archives-ouvertes.fr/hal-01742016
Contributor: Shadi Ibrahim
Submitted on: Friday, March 23, 2018 - 16:11:09
Last modified on: Thursday, April 19, 2018 - 11:46:06

Identifiers

  • HAL Id : hal-01742016, version 1

Citation

Orcun Yildiz, Shadi Ibrahim. On the Performance of Spark on HPC Systems: Towards a Complete Picture. SupercomputingAsia 2018 (SCA18), Mar 2018, Singapore, Singapore. 〈hal-01742016〉
