E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525

K. M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, vol.3, issue.1, pp.63-75, 1985.
DOI : 10.1145/214451.214456

C. Coti, T. Herault, P. Lemarinier, L. Pilard, A. Rezmerita et al., Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI, ACM/IEEE SC 2006 Conference (SC'06), p.127, 2006.
DOI : 10.1109/SC.2006.15
URL : https://hal.archives-ouvertes.fr/hal-00684891

H. S. Paul, A. Gupta, and R. Badrinath, Hierarchical Coordinated Checkpointing Protocol, International Conference on Parallel and Distributed Computing Systems, pp.240-245, 2002.

K. Bhatia, K. Marzullo, and L. Alvisi, Scalable Causal Message Logging for Wide-Area Environments, Concurrency and Computation: Practice and Experience, pp.873-889, 2003.

S. Monnet, C. Morin, and R. Badrinath, Hybrid checkpointing for parallel applications in cluster federations, IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), pp.773-782, 2004.
DOI : 10.1109/CCGrid.2004.1336712
URL : https://hal.archives-ouvertes.fr/inria-00000991