Conference papers

Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning?

Arnaud Legrand 1 Denis Trystram 2 Salah Zrigui 2
1 POLARIS - Performance analysis and optimization of LARge Infrastructures and Systems
LIG - Laboratoire d'Informatique de Grenoble, Inria Grenoble - Rhône-Alpes
2 DATAMOVE - Data Aware Large Scale Computing
Inria Grenoble - Rhône-Alpes, LIG - Laboratoire d'Informatique de Grenoble
Abstract: Despite the impressive growth and size of super-computers, the computational power they provide still cannot match the demand. Efficient and fair resource allocation is a critical task. Super-computers use Resource and Job Management Systems to schedule applications, generally by relying on generic index policies such as First Come First Served and Shortest Processing Time First, combined with backfilling strategies. Unfortunately, such generic policies often fail to exploit specific characteristics of real workloads. In this work, we focus on improving the performance of online schedulers. We study mixed policies, which combine multiple job characteristics in a weighted linear expression, as opposed to classical pure policies, which use only a single characteristic. This larger class of scheduling policies aims to provide more flexibility and adaptability. We use space coverage and black-box optimization techniques to explore this new space of mixed policies, and we study how they can adapt to changes in the workload. Through an extensive experimental campaign, we show that (1) even the best pure policy is far from optimal, and (2) a carefully tuned mixed policy would significantly improve the performance of the system. (3) We also provide empirical evidence that there is no one-size-fits-all policy, by showing that the rapid evolution of the workload seems to prevent classical online learning algorithms from being effective.
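The difference between pure and mixed policies can be illustrated with a minimal Python sketch. It assumes a queue ordering where a lower score means higher priority; the job fields (`submit_time`, `runtime_estimate`, `processors`), the weight values, and the helpers `mixed_score` and `order_queue` are hypothetical illustrations, not the paper's actual characteristic set or implementation. A pure policy is simply a mixed policy with a single non-zero weight.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    # Hypothetical job characteristics; the paper's exact feature set may differ.
    job_id: str
    submit_time: float       # seconds since the start of the trace
    runtime_estimate: float  # user-requested walltime, in seconds
    processors: int          # number of requested cores


def mixed_score(job: Job, weights: Dict[str, float]) -> float:
    """Weighted linear combination of job characteristics (lower = served first)."""
    return sum(w * getattr(job, name) for name, w in weights.items())


def order_queue(queue: List[Job], weights: Dict[str, float]) -> List[Job]:
    """Sort waiting jobs by mixed score, breaking ties by submission time."""
    return sorted(queue, key=lambda j: (mixed_score(j, weights), j.submit_time))


# Pure policies: a single non-zero weight.
FCFS = {"submit_time": 1.0}          # First Come First Served
SPF = {"runtime_estimate": 1.0}      # Shortest Processing Time First
# A mixed policy (weights chosen arbitrarily for illustration only).
MIXED = {"submit_time": 1.0, "runtime_estimate": 0.5, "processors": 20.0}


if __name__ == "__main__":
    queue = [
        Job("a", submit_time=0.0, runtime_estimate=3600.0, processors=1),
        Job("b", submit_time=30.0, runtime_estimate=60.0, processors=128),
        Job("c", submit_time=60.0, runtime_estimate=600.0, processors=2),
    ]
    for name, weights in [("FCFS", FCFS), ("SPF", SPF), ("MIXED", MIXED)]:
        print(name, [j.job_id for j in order_queue(queue, weights)])
```

On this toy queue, FCFS yields `a, b, c`, SPF yields `b, c, a`, and the mixed policy yields `c, a, b`, an ordering neither pure policy can produce. In a real scheduler, this ordering would feed a backfilling step (not shown), the raw features would likely need normalization because they have different units, and the weights would be tuned, for instance with the black-box optimization the abstract mentions, rather than hand-picked.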

https://hal.archives-ouvertes.fr/hal-02044903
Contributor: Salah Zrigui
Submitted on: Thursday, February 21, 2019 - 4:50:14 PM
Last modification on: Friday, May 15, 2020 - 11:24:27 AM
Document(s) archived on: Wednesday, May 22, 2019 - 8:10:32 PM

File

final_version.pdf
Files produced by the author(s)


Citation

Arnaud Legrand, Denis Trystram, Salah Zrigui. Adapting Batch Scheduling to Workload Characteristics: What can we expect From Online Learning? IPDPS 2019 - 33rd IEEE International Parallel & Distributed Processing Symposium, May 2019, Rio de Janeiro, Brazil. pp. 686-695, ⟨10.1109/IPDPS.2019.00077⟩. ⟨hal-02044903⟩


Metrics

Record views: 298
File downloads: 287