Conference paper. Year: 1997

Performance Evaluation of Fault Tolerance for Parallel Applications in Networked Environments

Pierre Sens
Bertil Folliot
  • Role: Author
  • PersonId: 829560

Abstract

This paper presents a performance evaluation of STAR, a software fault manager for distributed applications. STAR uses the natural redundancy of networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. STAR is application-independent, highly configurable, and easily portable to UNIX-like operating systems. The current implementation is based on independent checkpointing and message logging. Measurements show the efficiency and the limits of this implementation. The challenge is to show that a software approach to fault tolerance can be implemented efficiently in a standard networked environment.
No file deposited

Dates and versions

hal-01629551 , version 1 (06-11-2017)

Cite

Pierre Sens, Bertil Folliot. Performance Evaluation of Fault Tolerance for Parallel Applications in Networked Environments. 26th International Conference on Parallel Processing, Aug 1997, Bloomington, IL, United States. pp.334-341, ⟨10.1109/ICPP.1997.622663⟩. ⟨hal-01629551⟩