Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning?

Arnaud Legrand 1 Denis Trystram 2 Salah Zrigui 2
1 POLARIS - Performance analysis and optimization of LARge Infrastructures and Systems
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
2 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Despite the impressive growth and size of supercomputers, the computational power they provide still cannot match the demand. Efficient and fair resource allocation is a critical task. Supercomputers use Resource and Job Management Systems to schedule applications, which is generally done by relying on generic index policies such as First Come First Served and Shortest Processing Time First in combination with Backfilling strategies. Unfortunately, such generic policies often fail to exploit specific characteristics of real workloads. In this work, we focus on improving the performance of online schedulers. We study mixed policies, which are created by combining multiple job characteristics in a weighted linear expression, as opposed to classical pure policies, which use only a single characteristic. This larger class of scheduling policies aims at providing more flexibility and adaptability. We use space coverage and black-box optimization techniques to explore this new space of mixed policies, and we study how they can adapt to changes in the workload. We perform an extensive experimental campaign through which we show that (1) even the best pure policy is far from optimal and (2) a carefully tuned mixed policy would significantly improve the performance of the system. (3) We also provide empirical evidence that there is no one-size-fits-all policy, by showing that the rapid evolution of the workload seems to prevent classical online learning algorithms from being effective.
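The abstract contrasts pure policies, which rank waiting jobs by a single characteristic, with mixed policies, which rank them by a weighted linear expression of several characteristics. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes three illustrative characteristics (submission time, requested walltime, requested processors) and the convention that a lower score is scheduled first; the names `Job`, `mixed_priority`, and `order_queue` are hypothetical. It covers only queue ordering, not the backfilling step or the search over weights.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Job:
    """A waiting job. The three characteristics are illustrative assumptions."""
    name: str
    submit_time: float     # time the job entered the queue (seconds)
    requested_time: float  # user-requested walltime (seconds)
    requested_procs: int   # number of processors requested


def features(job: Job) -> tuple[float, float, float]:
    """Hypothetical feature vector fed to the mixed policy."""
    return (job.submit_time, job.requested_time, float(job.requested_procs))


def mixed_priority(job: Job, weights: Sequence[float]) -> float:
    """Weighted linear combination of job characteristics (lower runs first)."""
    return sum(w * f for w, f in zip(weights, features(job)))


def order_queue(queue: Sequence[Job], weights: Sequence[float]) -> list[Job]:
    """Sort the queue by mixed priority; Python's stable sort keeps ties in queue order."""
    return sorted(queue, key=lambda j: mixed_priority(j, weights))


# Pure policies are the special case where exactly one weight is non-zero.
FCFS = (1.0, 0.0, 0.0)  # First Come First Served: order by submit_time
SPF = (0.0, 1.0, 0.0)   # Shortest Processing Time First: order by requested_time

if __name__ == "__main__":
    queue = [
        Job("a", 0.0, 3600.0, 64),
        Job("b", 10.0, 60.0, 4),
        Job("c", 20.0, 600.0, 16),
    ]
    print([j.name for j in order_queue(queue, FCFS)])              # ['a', 'b', 'c']
    print([j.name for j in order_queue(queue, SPF)])               # ['b', 'c', 'a']
    print([j.name for j in order_queue(queue, (1.0, 0.005, 0.0))]) # ['b', 'a', 'c']
```

The mixed weights `(1.0, 0.005, 0.0)` produce an order that neither pure policy yields, which illustrates why the mixed space is strictly larger. Because the characteristics have different units, a real tuning campaign would likely normalize them before searching over the weights; that step is omitted here for brevity.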
Document type:
Conference paper
IPDPS 2019 - 33rd IEEE International Parallel & Distributed Processing Symposium, May 2019, Rio de Janeiro, Brazil. IEEE, pp.1-10

https://hal.archives-ouvertes.fr/hal-02044903
Contributor: Salah Zrigui
Submitted on: Thursday, February 21, 2019, 16:50:14
Last modified on: Wednesday, March 13, 2019, 15:02:04

File

final_version.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02044903, version 1

Citation

Arnaud Legrand, Denis Trystram, Salah Zrigui. Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning? IPDPS 2019 - 33rd IEEE International Parallel & Distributed Processing Symposium, May 2019, Rio de Janeiro, Brazil. IEEE, pp.1-10. 〈hal-02044903〉
