Conference papers

DEISA: dask-enabled in situ analytics

Amal Gueroudji 1, 2, Julien Bigot 1, Bruno Raffin 2
2 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract : A widening gap separates CPU performance from IO bandwidth on large-scale systems. In some fields, such as weather forecasting and nuclear fusion, numerical models generate so much data that classical post hoc processing is no longer feasible, given the limits on both storage capacity and IO performance. In situ approaches are attractive in these cases because they bypass disk accesses and fully leverage the HPC platform. They are, however, often complex to set up and can require redeveloping parallel versions of the analysis from scratch. In this paper we propose a hybrid model that is well suited to in situ workflows combining regular simulations and irregular analytics. Our model couples the bulk synchronous parallel paradigm for simulation with a distributed task-based paradigm for analysis. This reduces complexity and leverages the best of each of these two powerful paradigms. We validate the model with a prototype, called DEISA, that supports coupling MPI parallel codes with analyses written using Dask. This implementation requires minimal modifications of both the simulation and analysis codes compared to their post hoc counterparts. It gives in situ access to an existing rich ecosystem, including the parallel versions of NumPy, Pandas and scikit-learn. Experiments in configurations of up to 1024 cores show that DEISA can improve the simulation wall-clock time (excluding analysis) by a factor of up to 3 and the core-hour cost of the total experiment (including analysis) by a factor of up to 5 compared to parallel post hoc processing with plain Dask, while requiring the modification of only two lines of Python code, three of YAML, and none at all in a C simulation code already instrumented with PDI Data Interface.

https://hal.archives-ouvertes.fr/hal-03509198
Contributor : Amal Gueroudji
Submitted on : Tuesday, January 4, 2022 - 9:38:17 AM
Last modification on : Tuesday, October 25, 2022 - 4:22:13 PM

File

HiPC__DEISA__Dask_Enabled_In_S...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03509198, version 1

Citation

Amal Gueroudji, Julien Bigot, Bruno Raffin. DEISA: dask-enabled in situ analytics. HiPC 2021 - 28th International Conference on High Performance Computing, Data, and Analytics, Dec 2021, virtual, India. pp.1-10. ⟨hal-03509198⟩
