Conference paper. Year: 2009

Area Failures and Reliable Distributed Applications

Abstract

Because failures sometimes affect whole areas rather than only individual computers, we propose a new, efficient scheduling algorithm for problems in which tasks with precedence constraints and communication delays must be scheduled on a virtual heterogeneous distributed multi-area system subject to the possible failure of one complete area. Based on an extension of the Critical Path Method (CPM/PERT), our algorithm combines a schedule that is optimal when there are no failures with some task duplication to provide fault tolerance if one area fails. Backup copies are not created for tasks that already have more than one original copy in different areas. The result is a schedule, computed in polynomial time, that is optimal when there is no area failure and remains a good, reliable schedule in the case of any single area failure. Finally, we run numerical experiments in which we apply our algorithm to several semi-random DAGs and compare the optimal solutions with the reliable solutions it finds.
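
To make the CPM/PERT starting point concrete, here is a minimal sketch of a plain forward pass that computes earliest start times on a task DAG with communication delays. It is not the authors' algorithm: it assumes every task runs on its own processor, so every communication delay is paid, and it models neither task duplication nor area failures. The function names and the four-task DAG are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def earliest_start_times(durations, delays):
    """Forward CPM pass: earliest start time of each task in a DAG.

    durations: {task: processing time}
    delays: {(pred, succ): communication delay on that precedence edge}
    Assumption: each task runs on its own processor, so every delay is paid.
    """
    # Map each task to the set of its predecessors, as graphlib expects.
    preds = {task: set() for task in durations}
    for pred, succ in delays:
        preds[succ].add(pred)

    start = {}
    for task in TopologicalSorter(preds).static_order():
        # A task may start once every predecessor has finished and its
        # result has arrived (finish time plus communication delay).
        start[task] = max(
            (start[p] + durations[p] + delays[(p, task)] for p in preds[task]),
            default=0,
        )
    return start


def makespan(durations, start):
    """Completion time of the last task to finish."""
    return max(start[t] + durations[t] for t in durations)


# Hypothetical DAG: A feeds B and C, which both feed D.
durations = {"A": 2, "B": 3, "C": 1, "D": 2}
delays = {("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 2, ("C", "D"): 1}

start = earliest_start_times(durations, delays)
print(start, makespan(durations, start))
```

In this example both paths into D arrive at time 8, so D starts at 8 and the makespan is 10. A duplication-based scheduler such as the one described in the abstract could place copies of a predecessor next to its successor to avoid some delays, which this sketch does not attempt.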
Main file
nakech_icces09_36.pdf (322.9 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00443653, version 1 (09-02-2010)

Identifiers

  • HAL Id: hal-00443653, version 1

Cite

Moustafa Nakechbandi, Jean-Yves Colin. Area Failures and Reliable Distributed Applications. ICCES 09, Dec 2009, Cairo, Egypt. pp.CD. ⟨hal-00443653⟩