Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics

Abstract : Executing Big Data workloads on High Performance Computing (HPC) infrastructures has become an attractive way to improve their performance. However, collocating HPC and Big Data workloads is not an easy task, mainly because their core concepts differ. This paper focuses on the challenges of scheduling both Big Data and HPC workloads on the same computing platform. In classic HPC workloads, the rigidity of jobs tends to create holes in the schedule: we can use those idle resources as a dynamic pool for Big Data workloads. We propose a new idea based on the configuration of the Resource and Job Management System (RJMS), which lets the HPC and Big Data systems communicate through a simple prolog/epilog mechanism. It leverages the built-in resilience of Big Data frameworks while minimizing the disturbance to HPC workloads. We present the first study of this approach, using the production RJMS middleware OAR and Hadoop YARN from the HPC and Big Data ecosystems, respectively. Our new technique is evaluated with real experiments on the Grid5000 platform. The experiments validate our assumptions and show promising results. The system is capable of running an HPC workload with 70% cluster utilization, with a Big Data workload that fills the schedule holes to reach a full 100% utilization. We observe a penalty on the mean waiting time for HPC jobs of less than 17% and a Big Data effectiveness of more than 68% on average.
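
The prolog/epilog idea in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: it does not call OAR or YARN, and the class `SharedCluster`, its methods `prolog` and `epilog`, and the node names are all hypothetical. It assumes that idle nodes are lent to a Big Data pool, that the prolog of an HPC job reclaims the nodes it needs (preempting any Big Data work on them), and that the epilog returns them to the pool.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Toy model: each node is either held by an HPC job or lent to Big Data.

    Hypothetical illustration only; no OAR or YARN API is used here.
    """
    nodes: set
    bigdata_pool: set = field(default_factory=set)
    hpc_jobs: dict = field(default_factory=dict)

    def __post_init__(self):
        # All nodes start idle, so all of them are lent to the Big Data pool.
        self.bigdata_pool = set(self.nodes)

    def prolog(self, job_id, wanted):
        """Run before an HPC job starts: reclaim its nodes from the pool.

        Returns the nodes that the Big Data side had to give up.
        """
        wanted = set(wanted)
        busy = set().union(*self.hpc_jobs.values())
        if not wanted <= self.nodes or wanted & busy:
            raise ValueError("nodes unknown or already held by an HPC job")
        preempted = wanted & self.bigdata_pool
        self.bigdata_pool -= wanted
        self.hpc_jobs[job_id] = wanted
        return preempted

    def epilog(self, job_id):
        """Run after an HPC job ends: hand its nodes back to the pool."""
        released = self.hpc_jobs.pop(job_id)
        self.bigdata_pool |= released
        return released

    def hpc_share(self):
        """Fraction of the cluster currently held by HPC jobs."""
        held = set().union(*self.hpc_jobs.values())
        return len(held) / len(self.nodes)

    def bigdata_share(self):
        """Fraction of the cluster currently lent to Big Data."""
        return len(self.bigdata_pool) / len(self.nodes)


# Illustrative values only: 10 nodes, one HPC job asking for 7 of them.
cluster = SharedCluster(nodes={f"node{i}" for i in range(10)})
preempted = cluster.prolog("job1", {f"node{i}" for i in range(7)})
share_during_job = (cluster.hpc_share(), cluster.bigdata_share())
cluster.epilog("job1")
share_after_job = (cluster.hpc_share(), cluster.bigdata_share())
```

In this model the HPC job always gets its nodes immediately, and the Big Data side absorbs the loss. That is the design choice the abstract describes: disturbance is pushed onto the framework with built-in resilience. The sketch ignores the time needed to stop Big Data work, which in a real deployment is one plausible source of the waiting-time penalty the abstract reports.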
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01633507
Contributor : Michael Mercier
Submitted on : Monday, November 13, 2017 - 10:09:14 AM
Last modification on : Monday, July 8, 2019 - 3:10:46 PM
Long-term archiving on : Wednesday, February 14, 2018 - 1:20:03 PM

File

bigdata_hpc_colocation.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01633507, version 1

Citation

Michael Mercier, David Glesser, Yiannis Georgiou, Olivier Richard. Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics. IEEE BigData 2017, Dec 2017, Boston, United States. ⟨hal-01633507⟩
