M. Albrecht, P. Donnelly, P. Bui, and D. Thain, Makeflow, Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, SWEET '12, 2012.
DOI : 10.1145/2443416.2443417

I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludäscher et al., Kepler: an extensible system for design and execution of scientific workflows, Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp.423-424, 2004.
DOI : 10.1109/SSDM.2004.1311241

URL : http://www.sdsc.edu/~ludaesch/Paper/ssdbm04-kepler.pdf

I. Assayad, A. Girault, and H. Kalla, A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints, International Conference on Dependable Systems and Networks, 2004, 2004.
DOI : 10.1109/DSN.2004.1311904

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
DOI : 10.1007/978-3-642-03869-3_80

URL : https://hal.archives-ouvertes.fr/inria-00384363

G. Aupy, A. Benoit, H. Casanova, and Y. Robert, Scheduling Computational Workflows on Failure-Prone Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp.2-26, 2016.
DOI : 10.1109/IPDPSW.2015.33

URL : https://hal.archives-ouvertes.fr/hal-01075100

L. Bautista-Gomez and F. Cappello, Detecting silent data corruption through data dynamic monitoring for scientific applications, ACM SIGPLAN Notices, vol.49, issue.8, pp.381-382, 2014.
DOI : 10.1109/MM.2005.110

L. Bautista-Gomez and F. Cappello, Detecting and correcting data corruption in stencil applications through multivariate interpolation, FTS. IEEE, 2015.

A. Benoit, A. Cavelan, Y. Robert, and H. Sun, Assessing general-purpose algorithms to cope with fail-stop and silent errors, ACM Trans. Parallel Computing, vol.3, issue.2, 2016.
DOI : 10.1145/2897189

URL : https://hal.archives-ouvertes.fr/hal-01066664

E. Berrocal, L. Bautista-Gomez, S. Di, Z. Lan, and F. Cappello, Lightweight silent data corruption detection based on runtime data analysis for HPC applications, HPDC. ACM, 2015.

S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. Su et al., Characterization of scientific workflows, 2008 Third Workshop on Workflows in Support of Large-Scale Science, pp.1-10, 2008.
DOI : 10.1109/WORKS.2008.4723958

G. Bosilca, R. Delmas, J. Dongarra, and J. Langou, Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009.
DOI : 10.1016/j.jpdc.2008.12.002

T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni et al., A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing, vol.61, issue.6, pp.810-837, 2001.

C. Cao, T. Herault, G. Bosilca, and J. Dongarra, Design for a Soft Error Resilient Dynamic Task-Based Runtime, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.765-774, 2015.
DOI : 10.1109/IPDPS.2015.81

F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer et al., Toward Exascale Resilience, The International Journal of High Performance Computing Applications, vol.23, issue.4, 2014.
DOI : 10.1515/9781400882618-003

URL : http://institute.lanl.gov/resilience/docs/Toward%20Exascale%20Resilience.pdf

J. Choi, J. J. Dongarra, L. S. Ostrouchov, A. P. Petitet et al., Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming, pp.173-184, 1996.
DOI : 10.1155/1996/483083

URL : https://doi.org/10.1155/1996/483083

R. Ferreira da Silva, W. Chen, G. Juve, K. Vahi, and E. Deelman, Community resources for enabling research in distributed scientific workflows, 2014 IEEE 10th International Conference on e-Science, pp.177-184, 2014.

E. Deelman, G. Singh, M. Su, J. Blythe, Y. Gil et al., Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, vol.13, issue.3, pp.219-237, 2005.
DOI : 10.1155/2005/128026

URL : http://downloads.hindawi.com/journals/sp/2005/128026.pdf

E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan et al., Pegasus, a workflow management system for science automation, Future Generation Computer Systems, vol.46, pp.17-35, 2015.
DOI : 10.1016/j.future.2014.10.008

URL : https://manuscript.elsevier.com/S0167739X14002015/pdf/S0167739X14002015.pdf

A. B. Downey, The structural cause of file size distributions, Proceedings of the Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp.361-370, 2001.

M. Drozdowski, Scheduling for Parallel Processing, Computer Communications and Networks, 2009.

T. Fahringer, R. Prodan, R. Duan, J. Hofer, F. Nadeem et al., ASKALON: A Development and Grid Computing Environment for Scientific Workflows, Workflows for e-Science, pp.450-471, 2007.
DOI : 10.1007/978-1-84628-757-2_27

L. Han, L. Canon, H. Casanova, Y. Robert, and F. Vivien, Checkpointing workflows for fail-stop errors, IEEE Transactions on Computers, 2018.
DOI : 10.1109/tc.2018.2801300

URL : https://hal.archives-ouvertes.fr/hal-01559967

K. Huang and J. A. Abraham, Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., vol.33, issue.6, pp.518-528, 1984.

H. Jin, X. Sun, Z. Zheng, Z. Lan, and B. Xie, Performance under Failures of DAG-based Parallel Computing, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009.
DOI : 10.1109/CCGRID.2009.55

URL : http://www.cs.iit.edu/~scs/psfiles/3622a236.pdf

G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013.
DOI : 10.1016/j.future.2012.08.015

E. Kail, P. Kacsuk, and M. Kozlovszky, A novel adaptive checkpointing method based on information obtained from workflow structure, Computer Science, vol.17, issue.3, 2016.
DOI : 10.7494/csci.2016.17.3.387

URL : https://journals.agh.edu.pl/csci/article/download/1797/1554

A. Pothen and C. Sun, A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993.
DOI : 10.1137/0914074

M. Shantharam, S. Srinivasmurthy, and P. Raghavan, Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012.
DOI : 10.1145/2304576.2304588

T. Tobita and H. Kasahara, A standard task graph set for fair evaluation of multiprocessor scheduling algorithms, Journal of Scheduling, vol.70, issue.5, pp.379-394, 2002.
DOI : 10.1109/TC.1973.5009153

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, pp.260-274, 2002.
DOI : 10.1109/71.993206

URL : http://meseec.ce.rit.edu/eecc722-fall2002/papers/hc/5/l0260.pdf

S. Toueg and Ö. Babaoğlu, On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, 1984.
DOI : 10.1137/0213039

URL : http://ecommons.cornell.edu/bitstream/1813/6386/1/83-546.pdf

J. Valdes, R. E. Tarjan, and E. L. Lawler, The recognition of series parallel digraphs, Proc. of STOC'79, pp.1-12, 1979.
DOI : 10.1145/800135.804393

P. Wang, K. Zhang, R. Chen, H. Chen, and H. Guan, Replication-Based Fault-Tolerance for Large-Scale Graph Processing, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.562-573, 2014.
DOI : 10.1109/DSN.2014.58

M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, S. Daniel et al., Swift: A language for distributed parallel scripting, Parallel Computing, vol.37, issue.9, pp.633-652, 2011.
DOI : 10.1016/j.parco.2011.05.005

URL : http://www.mcs.anl.gov/uploads/cels/papers/P1818.pdf

K. Wolstencroft, R. Haines, D. Fellows, A. Williams, D. Withers et al., The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol.2011, issue.W1, p.328, 2013.
DOI : 10.1186/1752-0509-6-25

M. Y. Wu and D. D. Gajski, Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.330-343, 1990.
DOI : 10.1109/71.80160

URL : http://www.eece.unm.edu/~shu/lab/paper/htooltrans.pdf

F. Zhang, C. Docan, M. Parashar, S. Klasky, N. Podhorszki et al., Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1352-1363, 2012.
DOI : 10.1109/IPDPS.2012.122