Online Tuning of EASY-Backfilling using Queue Reordering Policies

Éric Gaussier 1 Jerome Lelong 2 Valentin Reis 1 Denis Trystram 3, 4
2 DAO - Données, Apprentissage et Optimisation
LJK - Laboratoire Jean Kuntzmann
4 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: The EASY-FCFS heuristic is the basic building block of job scheduling policies in most parallel High Performance Computing platforms. Despite its simplicity and its guarantee of no job starvation, it could still be improved on a per-system basis. Such tuning is difficult because of non-linearities in the scheduling process. The study conducted in this paper considers an online approach to the automatic tuning of the EASY heuristic for HPC platforms. More precisely, we consider the problem of selecting a reordering policy for the job queue under several feedback modes. We show via a comprehensive experimental validation on actual logs that periodic simulation of historical data can be used to recover existing in-hindsight results that allow the average waiting time to be divided by almost 2. This result holds even when the simulator results are noisy. Moreover, we show that good performance can still be obtained without a simulator, under what is called bandit feedback, when we can only observe the performance of the algorithm that was picked on the live system. Indeed, a simple multi-armed bandit algorithm can reduce the average waiting time by 40 percent.
Document type:
Journal article
IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers, 2018, 29 (10), pp.2304-2316. 〈10.1109/TPDS.2018.2820699〉

https://hal.archives-ouvertes.fr/hal-01963216
Contributor: Denis Trystram
Submitted on: Friday, December 21, 2018 - 11:17:47
Last modified on: Friday, December 28, 2018 - 18:49:34

Citation

Éric Gaussier, Jerome Lelong, Valentin Reis, Denis Trystram. Online Tuning of EASY-Backfilling using Queue Reordering Policies. IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers, 2018, 29 (10), pp.2304-2316. 〈10.1109/TPDS.2018.2820699〉. 〈hal-01963216〉
