Checkpointing algorithms and fault prediction - HAL open archive
Journal article. Journal of Parallel and Distributed Computing. Year: 2013

Checkpointing algorithms and fault prediction

Abstract

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide optimal algorithms to decide whether and when to take predictions into account, and we derive the optimal value of the checkpointing period. These results allow us to analytically assess the key parameters that impact the performance of fault predictors at very large scale.
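
As background for the abstract, the following is a minimal sketch of the classical first-order quantities it refers to: Young's checkpointing period, the first-order waste it minimizes, and the usual definitions of a predictor's precision and recall. It is not taken from the paper. The function and parameter names are hypothetical, and the waste model assumed here ignores downtime and recovery time.

```python
import math


def young_period(mtbf: float, checkpoint_cost: float) -> float:
    """Young's first-order checkpointing period: sqrt(2 * MTBF * C)."""
    if mtbf <= 0 or checkpoint_cost <= 0:
        raise ValueError("mtbf and checkpoint_cost must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_cost)


def first_order_waste(period: float, mtbf: float, checkpoint_cost: float) -> float:
    """Fraction of time lost under the simplified model assumed here:
    C / T for checkpoint overhead plus T / (2 * MTBF) for re-executed work.
    Downtime and recovery time are deliberately left out."""
    return checkpoint_cost / period + period / (2.0 * mtbf)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of predictions that correspond to actual faults."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual faults that were predicted."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    # Illustrative values only (same time unit for both arguments).
    period = young_period(mtbf=50.0, checkpoint_cost=1.0)
    print(period, first_order_waste(period, 50.0, 1.0))
```

Under this simplified model, the period returned by `young_period` is the minimizer of `first_order_waste`; how the paper adjusts the period when a predictor is available is described in the full text, not in this sketch.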
Main file
main.pdf (464.66 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00908446 , version 1 (23-11-2013)

Cite

Guillaume Aupy, Yves Robert, Frédéric Vivien, Dounia Zaidouni. Checkpointing algorithms and fault prediction. Journal of Parallel and Distributed Computing, 2013, 74 (2), pp.2048-2064. ⟨10.1016/j.jpdc.2013.10.010⟩. ⟨hal-00908446⟩
159 views
280 downloads
