
Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics

Michael Mercier 1, 2, 3, David Glesser 2, Yiannis Georgiou 2, Olivier Richard 1, 3
3 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : Executing Big Data workloads on High Performance Computing (HPC) infrastructures has become an attractive way to improve their performance. However, collocating HPC and Big Data workloads is not an easy task, mainly because of the differences in their core concepts. This paper focuses on the challenges of scheduling both Big Data and HPC workloads on the same computing platform. In classic HPC workloads, the rigidity of jobs tends to create holes in the schedule: we can use those idle resources as a dynamic pool for Big Data workloads. We propose a new approach based on the configuration of the Resource and Job Management System (RJMS), which lets the HPC and Big Data systems communicate through a simple prolog/epilog mechanism. It leverages the built-in resilience of Big Data frameworks while minimizing the disturbance to HPC workloads. We present the first study of this approach, using the production RJMS middleware OAR and Hadoop YARN from the HPC and Big Data ecosystems, respectively. Our technique is evaluated with real experiments on the Grid5000 platform. The experiments validate our assumptions and show promising results. The system is capable of running an HPC workload with 70% cluster utilization, with a Big Data workload that fills the schedule holes to reach full 100% utilization. We observe a penalty on the mean waiting time of HPC jobs of less than 17% and a Big Data effectiveness of more than 68% on average.
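The idea of lending idle HPC nodes to a Big Data pool through prolog/epilog hooks can be illustrated with a toy model. The sketch below is not the authors' implementation and calls neither OAR nor YARN: the class `SharedCluster`, its methods, and the node names are all hypothetical, and it assumes that a prolog withdraws a job's nodes from the Big Data pool while an epilog hands them back.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Toy model: every node not held by an HPC job is lent to Big Data."""
    nodes: frozenset
    hpc_jobs: dict = field(default_factory=dict)  # job id -> nodes it holds

    def hpc_nodes(self):
        # Union of the nodes held by all running HPC jobs.
        return set().union(*self.hpc_jobs.values())

    def big_data_pool(self):
        # Idle nodes (the "holes" in the HPC schedule) form the dynamic pool.
        return set(self.nodes) - self.hpc_nodes()

    def prolog(self, job_id, requested):
        # Assumed prolog behaviour: take the requested nodes out of the
        # Big Data pool just before the HPC job starts on them.
        requested = set(requested)
        if not requested <= self.nodes:
            raise ValueError("unknown nodes requested")
        if requested & self.hpc_nodes():
            raise ValueError("nodes already held by another HPC job")
        self.hpc_jobs[job_id] = requested
        return requested

    def epilog(self, job_id):
        # Assumed epilog behaviour: the job has ended, so its nodes
        # become available to Big Data again.
        return self.hpc_jobs.pop(job_id)

    def hpc_utilization(self):
        return len(self.hpc_nodes()) / len(self.nodes)


cluster = SharedCluster(frozenset(f"node{i}" for i in range(10)))
cluster.prolog("hpc-1", [f"node{i}" for i in range(7)])
print(cluster.hpc_utilization())        # 0.7
print(sorted(cluster.big_data_pool()))  # ['node7', 'node8', 'node9']
cluster.epilog("hpc-1")
print(len(cluster.big_data_pool()))     # 10
```

In this model the HPC job always gets its nodes, which mirrors the stated goal of minimizing disturbance to HPC workloads; the Big Data side is assumed to tolerate nodes disappearing, in line with the built-in resilience the abstract refers to.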
Document type :
Conference papers

Cited literature [18 references]
Contributor : Michael Mercier
Submitted on : Monday, November 13, 2017 - 10:09:14 AM
Last modification on : Wednesday, July 6, 2022 - 4:23:00 AM
Long-term archiving on : Wednesday, February 14, 2018 - 1:20:03 PM


Files produced by the author(s)


  • HAL Id : hal-01633507, version 1


Michael Mercier, David Glesser, Yiannis Georgiou, Olivier Richard. Big Data and HPC collocation: Using HPC idle resources for Big Data Analytics. IEEE BigData 2017, Dec 2017, Boston, United States. ⟨hal-01633507⟩


