Conference papers

Online Scheduling with Redirection for Parallel Jobs

Abstract : The job scheduling algorithm is an important component of High Performance Computing (HPC) clusters: it decides how jobs are allocated to resources and when they run. Such algorithms must scale to cope with the growth of modern clusters in both size and complexity. In this paper, we propose a new algorithm for scheduling parallel jobs with redirection. Specifically, our algorithm redirects jobs whose execution significantly affects a large number of other jobs. A redirected job is stopped and restarted from the beginning in a dedicated part of the cluster. We show the effectiveness of our method through an extensive campaign of simulations based on log traces from production clusters.
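
The abstract states the redirection idea only at a high level, so the following Python sketch is a toy illustration and not the paper's algorithm. The blocking criterion (a waiting job counts as affected if it cannot start now but could if the running job released its nodes), the `max_blocked` threshold, the partition names, and every identifier are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    nodes: int               # number of nodes requested
    runtime: float           # total execution time required
    progress: float = 0.0    # execution time already completed
    partition: str = "main"  # hypothetical partition label


def count_blocked(running: Job, waiting: List[Job], free_nodes: int) -> int:
    """Count waiting jobs that cannot start now but would fit
    if `running` released its nodes (assumed notion of "affected")."""
    return sum(
        1
        for w in waiting
        if free_nodes < w.nodes <= free_nodes + running.nodes
    )


def maybe_redirect(running: Job, waiting: List[Job],
                   free_nodes: int, max_blocked: int) -> bool:
    """Redirect `running` when it blocks more than `max_blocked` jobs.

    A redirected job is stopped, loses its progress, and is restarted
    from the beginning in the dedicated partition.
    """
    if running.partition != "main":
        return False  # already redirected once; leave it alone
    if count_blocked(running, waiting, free_nodes) <= max_blocked:
        return False
    running.progress = 0.0            # restart from the beginning
    running.partition = "dedicated"   # move to the dedicated part
    return True
```

For instance, with 2 free nodes, an 8-node running job, and waiting jobs requesting 4, 6, 2 and 16 nodes, only the 4-node and 6-node jobs count as blocked under this criterion, so the job is redirected when `max_blocked` is 1 and left in place when it is 2.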

Cited literature : 7 references
Contributor : Adrien Faure
Submitted on : Monday, September 21, 2020 - 10:27:48 AM
Last modification on : Wednesday, July 6, 2022 - 4:24:34 AM
Long-term archiving on : Thursday, December 3, 2020 - 2:37:52 PM


Files produced by the author(s)



Adrien Faure, Giorgio Lucarelli, Olivier Richard, Denis Trystram. Online Scheduling with Redirection for Parallel Jobs. IPDPSW 2020 - IEEE International Parallel and Distributed Processing Symposium Workshops, May 2020, New Orleans, United States. pp.1-4, ⟨10.1109/IPDPSW50202.2020.00066⟩. ⟨hal-02944032⟩


