M. A. Heroux, D. W. Doerfler, P. S. Crozier, J. M. Willenbring, H. C. Edwards et al., Improving Performance via Mini-applications, Research Report, vol.5574, 2009.

M. Shantharam, Y. Youn, and P. Raghavan, Speedup-aware co-schedules for efficient workload management, Parallel Processing Letters, vol.25, issue.02

G. Aupy, M. Shantharam, A. Benoit, Y. Robert, and P. Raghavan, Co-scheduling algorithms for high-throughput workload execution
DOI : 10.1007/s10951-015-0445-x
URL : https://hal.archives-ouvertes.fr/hal-00819036

E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson, A survey of rollback-recovery protocols in message-passing systems, ACM Computing Surveys, vol.34, issue.3, pp.375-408, 2002.
DOI : 10.1145/568522.568525

J. W. Young, A first order approximation to the optimum checkpoint interval, Communications of the ACM, vol.17, issue.9, pp.530-531, 1974.
DOI : 10.1145/361147.361115

J. T. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.
DOI : 10.1016/j.future.2004.11.016

J. Blazewicz, M. Drabowski, and J. Weglarz, Scheduling multiprocessor tasks to minimize schedule length, IEEE Transactions on Computers, vol.C-35, issue.5, pp.389-393, 1986.
DOI : 10.1109/tc.1986.1676781

J. Du and J. Y. Leung, Complexity of Scheduling Parallel Task Systems, SIAM Journal on Discrete Mathematics, vol.2, issue.4, pp.473-487, 1989.
DOI : 10.1137/0402042

J. Blazewicz, M. Machowiak, G. Mounié, and D. Trystram, Approximation Algorithms for Scheduling Independent Malleable Tasks, Euro-Par, 2001.
DOI : 10.1007/3-540-44681-8_29

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI '98, pp.212-223, 1998.

G. Martín, D. E. Singh, M. Marinescu, and J. Carretero, Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration, Parallel Computing, vol.46, pp.60-77, 2015.
DOI : 10.1016/j.parco.2015.04.003

D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira et al., Detection and correction of silent data corruption for large-scale high-performance computing, Proceedings of SC'12, pp.78:1-78:12, 2012.

X. Ni, E. Meneses, and L. Kale, Hiding Checkpoint Overhead in HPC Applications with a Semi-Blocking Algorithm, 2012 IEEE International Conference on Cluster Computing, pp.364-372, 2012.
DOI : 10.1109/CLUSTER.2012.82

J. Dongarra, T. Hérault, and Y. Robert, Performance and reliability trade-offs for the double checkpointing algorithm, International Journal of Networking and Computing, vol.4, issue.1, pp.23-41, 2014.
DOI : 10.15803/ijnc.4.1_23
URL : https://hal.archives-ouvertes.fr/hal-01091928

N. Muthuvelu, I. Chai, E. Chikkannan, and R. Buyya, Batch Resizing Policies and Techniques for Fine-Grain Grid Tasks: The Nuts and Bolts, Journal of Information Processing Systems, vol.7, issue.2
DOI : 10.3745/JIPS.2011.7.2.299

T. Herault and Y. Robert, Fault-Tolerance Techniques for High-Performance Computing, 2015.
DOI : 10.1007/978-3-319-20943-2
URL : https://hal.archives-ouvertes.fr/hal-01200479

P. B. Bhat, C. S. Raghavendra, and V. K. Prasanna, Efficient collective communication in distributed heterogeneous systems, JPDC, vol.63, issue.3, pp.251-263, 2003.
DOI : 10.1109/icdcs.1999.776502

A. Benoit, L. Pottier, and Y. Robert, Resilient application co-scheduling with processor redistribution, Research report RR-8795, INRIA, available at graal, 2015.
DOI : 10.1109/icpp.2016.21

M. Bougeret, H. Casanova, M. Rabie, Y. Robert, and F. Vivien, Checkpointing strategies for parallel jobs, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.1-11, 2011.
DOI : 10.1145/2063384.2063428
URL : https://hal.archives-ouvertes.fr/inria-00560582

G. Bosilca, A. Bouteiller, E. Brunet, F. Cappello, J. Dongarra et al., Unified model for assessing checkpointing protocols at extreme-scale, Concurrency and Computation: Practice and Experience, vol.9, issue.16, pp.2772-2791, 2014.
DOI : 10.1109/SNAPI.2010.10
URL : https://hal.archives-ouvertes.fr/hal-00696154