Improving Backfilling by using Machine Learning to predict Running Times

Abstract: The job management system is the HPC middleware responsible for distributing computing power to applications. While such systems generate an ever-increasing amount of data, they are characterized by uncertainty in some parameters, such as job running times. The question raised in this work is: to what extent is it possible, and useful, to take predictions of job running times into account to improve global scheduling? We present a comprehensive study to answer this question, assuming the popular EASY backfilling policy. More precisely, we rely on classical machine learning methods and propose new cost functions well adapted to the problem. We then assess the proposed solutions through intensive simulations on several production logs. Finally, we propose a new scheduling algorithm that outperforms the popular EASY backfilling algorithm by 28% on the average bounded slowdown objective.
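The 28% figure above is measured on average bounded slowdown. As a minimal sketch, assuming the common definition from the job scheduling literature, bsld_j = max((w_j + p_j) / max(p_j, tau), 1), where w_j is the waiting time, p_j the running time, and tau a short-job threshold (tau = 10 seconds is an assumption here, not a value taken from this record), the metric can be computed as follows. The function names are hypothetical.

```python
def bounded_slowdown(wait: float, run: float, tau: float = 10.0) -> float:
    """Bounded slowdown of one job (assumed common definition).

    max((wait + run) / max(run, tau), 1): the threshold tau keeps very
    short jobs from producing huge slowdown values.
    """
    return max((wait + run) / max(run, tau), 1.0)


def average_bounded_slowdown(jobs, tau: float = 10.0) -> float:
    """Mean bounded slowdown over (wait, run) pairs, in seconds."""
    if not jobs:
        raise ValueError("need at least one job")
    return sum(bounded_slowdown(w, r, tau) for w, r in jobs) / len(jobs)


# Usage: three hypothetical jobs given as (wait, run) in seconds.
jobs = [(0, 100), (50, 5), (300, 600)]
print(average_bounded_slowdown(jobs))  # (1.0 + 5.5 + 1.5) / 3
```

Bounding the denominator by tau is the design choice that distinguishes this metric from plain slowdown: the 5-second job above is treated as if it ran for 10 seconds, so its slowdown is 5.5 rather than 11.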
Document type: Conference paper
SuperComputing 2015, Nov 2015, Austin, TX, United States. SuperComputing 2015, <http://sc15.supercomputing.org/>. DOI: 10.1145/2807591.2807646

https://hal.archives-ouvertes.fr/hal-01221186
Contributor: Valentin Reis
Submitted on: Tuesday, 27 October 2015, 15:33:50
Last modified on: Wednesday, 2 December 2015, 15:06:11

File: paper.pdf (files produced by the author(s))

License: Copyright (all rights reserved)


Citation

Eric Gaussier, David Glesser, Valentin Reis, Denis Trystram. Improving Backfilling by using Machine Learning to predict Running Times. SuperComputing 2015, Nov 2015, Austin, TX, United States. SuperComputing 2015, <http://sc15.supercomputing.org/>. DOI: 10.1145/2807591.2807646. HAL: hal-01221186
