A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525
Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75.
DOI : 10.1145/214451.214456
Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI, ACM/IEEE SC 2006 Conference (SC'06), p.127, 2006.
DOI : 10.1109/SC.2006.15
URL : https://hal.archives-ouvertes.fr/hal-00684891
Hierarchical Coordinated Checkpointing Protocol, International Conference on Parallel and Distributed Computing Systems, pp.240-245, 2002.
Scalable Causal Message Logging for Wide-Area Environments, Concurrency and Computation: Practice and Experience, pp.873-889, 2003.
Hybrid checkpointing for parallel applications in cluster federations, IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), pp.773-782, 2004.
DOI : 10.1109/CCGrid.2004.1336712
URL : https://hal.archives-ouvertes.fr/inria-00000991