Conference papers

A control-theory approach for cluster autonomic management: maximizing usage while avoiding overload

Agustín Gabriel Yabo 1, Bogdan Robu 2, Olivier Richard 3, Bruno Bzeznik 4, Eric Rutten 1
1 CTRL-A - Control for Autonomic computing systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
3 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Cloud and HPC (High-Performance Computing) systems are becoming increasingly variable in their behavior, particularly in performance and power consumption, and this growing unpredictability demands more runtime management. In this work, we describe results addressing autonomic administration of HPC systems for scientific workflow management through a control-theoretic approach. We propose a model described by parameters related to the key aspects of the infrastructure, thus achieving a deterministic dynamical representation that covers the diverse and time-varying behaviors of the real computing system. We then propose a model-predictive control loop to achieve two objectives: maximizing cluster utilization by best-effort jobs and controlling the file server's load in the presence of external disturbances. The accuracy of the prediction relies on a parameter estimation scheme based on the EKF (Extended Kalman Filter), which adjusts the predictive model to the real system and makes the approach adaptive to parametric variations in the infrastructure. The closed-loop strategy shows a performance improvement and, consequently, a reduction in the total computation time. The problem is addressed in a general way, to allow implementation on similar HPC platforms as well as scalability to different infrastructures.

Cited literature: 19 references

https://hal.archives-ouvertes.fr/hal-02294272
Contributor: Éric Rutten
Submitted on: Monday, September 23, 2019 - 12:26:50 PM
Last modification on: Wednesday, July 6, 2022 - 4:14:22 AM

File

CCTA19_0092_FI.pdf
Files produced by the author(s)

Citation

Agustín Gabriel Yabo, Bogdan Robu, Olivier Richard, Bruno Bzeznik, Eric Rutten. A control-theory approach for cluster autonomic management: maximizing usage while avoiding overload. CCTA 2019 - 3rd IEEE Conference on Control Technology and Applications, Aug 2019, Hong Kong, China. pp.189-195, ⟨10.1109/CCTA.2019.8920473⟩. ⟨hal-02294272⟩
