| HAL : inria-00435092, version 1 |
| IEEE Transactions on Computers (2000) |
|
|
|
|
| An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures |
|
|
| Christine Morin 1, Anne-Marie Kermarrec 2 |
|
|
| (05/2000) |
|
|
| Distributed Shared Memory (DSM) architectures are attractive for executing high-performance parallel applications. Because they are made up of a large number of components, however, these architectures have a high probability of failure. We propose a protocol to tolerate node failures in cache-based DSM architectures. The proposed solution is based on backward error recovery and extends the existing coherence protocol to manage both the data used by processors for computation and the recovery data used for fault tolerance. This approach can be applied to both Cache Only Memory Architectures (COMA) and Shared Virtual Memory (SVM) systems. The implementation of the protocol in a COMA architecture has been evaluated by simulation. The protocol has also been implemented in an SVM system on a network of workstations. Both simulation results and measurements show that our solution is efficient and scalable. |
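To make the idea of backward error recovery concrete, the following is a minimal toy sketch and not the protocol from the paper. It assumes that each node's memory holds both current data and recovery data, that a recovery point saves every item on two distinct nodes, and that a single node failure rolls all surviving nodes back to the last recovery point. The names `Node`, `ToyDSM`, `checkpoint`, and `fail_and_recover` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node whose memory holds current data and recovery data."""
    name: str
    current: dict = field(default_factory=dict)   # item -> value used by the computation
    recovery: dict = field(default_factory=dict)  # item -> value saved at the last recovery point
    alive: bool = True


class ToyDSM:
    """Toy model only: one current copy per item, two recovery copies (assumption)."""

    def __init__(self, node_names):
        self.nodes = {name: Node(name) for name in node_names}
        self.owner = {}  # item -> name of the node holding the current copy

    def write(self, node_name, item, value):
        # Stand-in for coherence: the writer becomes the sole holder of the item.
        old = self.owner.get(item)
        if old is not None and old != node_name:
            self.nodes[old].current.pop(item, None)
        self.nodes[node_name].current[item] = value
        self.owner[item] = node_name

    def read(self, item):
        return self.nodes[self.owner[item]].current[item]

    def checkpoint(self):
        # Establish a recovery point: save each item on its owner and on one
        # other live node, so that a single node failure cannot lose it.
        live = sorted(n for n, node in self.nodes.items() if node.alive)
        for node in self.nodes.values():
            node.recovery.clear()
        for item, owner in self.owner.items():
            value = self.nodes[owner].current[item]
            backup = live[(live.index(owner) + 1) % len(live)]
            self.nodes[owner].recovery[item] = value
            self.nodes[backup].recovery[item] = value

    def fail_and_recover(self, failed):
        # The failed node loses its memory contents entirely.
        node = self.nodes[failed]
        node.alive = False
        node.current.clear()
        node.recovery.clear()
        survivors = [n for n in self.nodes.values() if n.alive]
        # Backward error recovery: discard all current data and restart
        # from the surviving recovery copies.
        for n in survivors:
            n.current.clear()
        self.owner.clear()
        for n in survivors:
            for item, value in n.recovery.items():
                if item not in self.owner:
                    n.current[item] = value
                    self.owner[item] = n.name
        # Re-replicate so every item again has two recovery copies.
        self.checkpoint()


dsm = ToyDSM(["a", "b", "c"])
dsm.write("a", "x", 1)
dsm.write("b", "y", 2)
dsm.checkpoint()            # recovery point: x = 1, y = 2
dsm.write("a", "x", 10)     # modifications after the recovery point
dsm.write("c", "y", 20)
dsm.fail_and_recover("a")   # node "a" fails; the system rolls back
print(dsm.read("x"), dsm.read("y"))  # 1 2
```

In this sketch the writes made after the recovery point are discarded on failure, which is the defining property of backward error recovery; how the actual protocol integrates recovery data into the coherence states is not shown here.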
|
| 1 : | PARIS (INRIA - IRISA) |
| CNRS : UMR6074 – INRIA – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1 | |
| 2 : | Microsoft Research [Cambridge] (Microsoft) |
| Microsoft Research | |
| 3 : | SOLIDOR (INRIA - IRISA) |
| CNRS : UMR6074 – INRIA – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1 | |
| 4 : | IBM Watson Research Center |
| IBM | |
|
| Domain | : | Computer Science/Operating Systems |
|
|
| http://hal.inria.fr/inria-00435092 | |
| oai:hal.inria.fr:inria-00435092 | |
| Contributor: Christine Morin | |
| Submitted on: Monday 23 November 2009, 15:32:24 | |
| Last modified on: Tuesday 24 November 2009, 09:39:18 | |