Adaptive incremental checkpointing for massively parallel systems, Proceedings of the 18th annual international conference on Supercomputing, ICS '04, pp.277-286, 2004.
DOI : 10.1145/1006209.1006248
FTI, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.32:1-32:32, 2011.
DOI : 10.1145/2063384.2063427
URL : https://hal.archives-ouvertes.fr/hal-00721216
Transparent redundant computing with MPI, EuroMPI'10: Proceedings of the 17th European MPI users' group meeting conference on recent advances in the message passing interface, pp.208-218, 2010.
The Maximum Intensity of Tropical Cyclones in Axisymmetric Numerical Model Simulations, Monthly Weather Review, vol.137, issue.6, pp.1770-1789, 2009.
DOI : 10.1175/2008MWR2709.1
PVFS: A parallel file system for Linux clusters, Proceedings of the 4th Annual Linux Showcase and Conference, pp.317-327, 2000.
Live migration of virtual machines, NSDI'05: Proceedings of the 2nd Symposium on Networked Systems Design & Implementation, pp.273-286, 2005.
Working Sets Past and Present, IEEE Transactions on Software Engineering, vol.6, issue.1, pp.64-84, 1980.
DOI : 10.1109/TSE.1980.230464
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems, ACM Transactions on Architecture and Code Optimization, vol.8, issue.2, pp.6:1-6:29, 2011.
DOI : 10.1145/1970386.1970387
Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O, 2012 IEEE International Conference on Cluster Computing, 2012.
DOI : 10.1109/CLUSTER.2012.26
URL : https://hal.archives-ouvertes.fr/hal-00715252
A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525
A scalable concurrent malloc(3) implementation for FreeBSD, Proceedings of BSDCan 2006, 2006.
libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface, pp.272-281, 2011.
DOI : 10.1007/978-3-642-24449-0_31
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers, ACM/IEEE SC 2005 Conference (SC'05), pp.1-9, 2005.
DOI : 10.1109/SC.2005.76
Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds, Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, pp.313-324, 2012.
DOI : 10.1007/978-3-642-32820-6_32
URL : https://hal.archives-ouvertes.fr/hal-00703119
Characterizing the Influence of System Noise on Large-Scale Applications by Simulation, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
DOI : 10.1109/SC.2010.12
Optimized pre-copy live migration for memory intensive applications, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-40, 2011.
DOI : 10.1145/2063384.2063437
Application monitoring and checkpointing in HPC, Proceedings of the 50th Annual Southeast Regional Conference, ACM-SE '12, pp.262-267, 2012.
DOI : 10.1145/2184512.2184574
A quasi-synchronous checkpointing algorithm that prevents contention for stable storage, Information Sciences, vol.178, issue.15, pp.3109-3116, 2008.
DOI : 10.1016/j.ins.2008.04.001
Scrabble - a distributed application with an emphasis on continuity, Software Engineering Journal, vol.5, issue.3, pp.160-164, 1990.
DOI : 10.1049/sej.1990.0018
Design, modeling, and evaluation of a scalable multi-level checkpointing system, SC '10: Proceedings of the 23rd International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-11, 2010.
On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage, Transactions on Large-Scale Data- and Knowledge-Centered Systems, pp.167-184, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00613583
Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pp.1-10, 2013.
DOI : 10.1109/IPDPS.2013.14
URL : https://hal.archives-ouvertes.fr/hal-00781532
BlobCR, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.34:1-34:12, 2011.
DOI : 10.1145/2063384.2063429
URL : https://hal.archives-ouvertes.fr/inria-00601865
A hybrid local storage transfer scheme for live migration of I/O intensive workloads, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, HPDC '12, pp.85-96, 2012.
DOI : 10.1145/2287076.2287088
URL : https://hal.archives-ouvertes.fr/hal-00686654
SecondSite: Disaster Tolerance as a Service, VEE '12: Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments, pp.97-108, 2012.
Comparing different approaches for incremental checkpointing: The showdown, Linux'11: The 13th Annual Linux Symposium, pp.69-79, 2011.
Hybrid Checkpointing for MPI Jobs in HPC Environments, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.524-533, 2010.
DOI : 10.1109/ICPADS.2010.48