Optimizing jobs timeouts on clusters and production grids

Tristan Glatard (1), Johan Montagnat (1), Xavier Pennec (2)
(1) Laboratoire d'Informatique, Signaux, et Systèmes de Sophia-Antipolis (I3S) / Equipe MODALIS
Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
(2) ASCLEPIOS - Analysis and Simulation of Biomedical Images
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract: This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that considers the job management system latency through a random variable. It also takes into account a proportion of outliers to model either reliable clusters or production grids characterized by faults causing jobs loss. Job management systems are first studied considering classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the amount of outliers. Experimental results are then shown based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure. Those results show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up even for reliable systems without outliers.
Document type:
Conference paper
Cluster Computing and the Grid, May 2007, Rio de Janeiro, Brazil. pp.100-107, 2007, 〈10.1109/CCGRID.2007.78〉

https://hal.archives-ouvertes.fr/hal-00683172
Contributor: Johan Montagnat
Submitted on: Wednesday, 28 March 2012 - 10:37:54
Last modified on: Thursday, 7 February 2019 - 15:46:35
Document(s) archived on: Friday, 29 June 2012 - 02:21:16

File

CCGRID07.pdf
Files produced by the author(s)

Citation

Tristan Glatard, Johan Montagnat, Xavier Pennec. Optimizing jobs timeouts on clusters and production grids. Cluster Computing and the Grid, May 2007, Rio de Janeiro, Brazil. pp.100-107, 2007, 〈10.1109/CCGRID.2007.78〉. 〈hal-00683172〉
