Makeflow, Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, SWEET '12, 2012. ,
DOI : 10.1145/2443416.2443417
Kepler: an extensible system for design and execution of scientific workflows, Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp.423-424, 2004. ,
DOI : 10.1109/SSDM.2004.1311241
URL : http://www.sdsc.edu/~ludaesch/Paper/ssdbm04-kepler.pdf
A bi-criteria scheduling heuristic for distributed embedded systems under reliability and real-time constraints, International Conference on Dependable Systems and Networks, 2004. ,
DOI : 10.1109/DSN.2004.1311904
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363
Scheduling Computational Workflows on Failure-Prone Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp.2-26, 2016. ,
DOI : 10.1109/IPDPSW.2015.33
URL : https://hal.archives-ouvertes.fr/hal-01075100
Detecting silent data corruption through data dynamic monitoring for scientific applications, ACM SIGPLAN Notices, vol.49, issue.8, pp.381-382, 2014. ,
DOI : 10.1109/MM.2005.110
Detecting and correcting data corruption in stencil applications through multivariate interpolation, FTS. IEEE, 2015. ,
Assessing general-purpose algorithms to cope with fail-stop and silent errors, ACM Trans. Parallel Computing, vol.3, issue.2, 2016. ,
DOI : 10.1145/2897189
URL : https://hal.archives-ouvertes.fr/hal-01066664
Lightweight silent data corruption detection based on runtime data analysis for HPC applications, HPDC. ACM, 2015. ,
Characterization of scientific workflows, 2008 Third Workshop on Workflows in Support of Large-Scale Science, pp.1-10, 2008. ,
DOI : 10.1109/WORKS.2008.4723958
Algorithm-based fault tolerance applied to high performance computing, Journal of Parallel and Distributed Computing, vol.69, issue.4, pp.410-416, 2009. ,
DOI : 10.1016/j.jpdc.2008.12.002
A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing, vol.61, issue.6, pp.810-837, 2001. ,
Design for a Soft Error Resilient Dynamic Task-Based Runtime, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.765-774, 2015. ,
DOI : 10.1109/IPDPS.2015.81
Toward Exascale Resilience, The International Journal of High Performance Computing Applications, vol.23, issue.4, 2014. ,
DOI : 10.1515/9781400882618-003
URL : http://institute.lanl.gov/resilience/docs/Toward%20Exascale%20Resilience.pdf
Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, Scientific Programming, pp.173-184, 1996. ,
DOI : 10.1155/1996/483083
URL : https://doi.org/10.1155/1996/483083
Community resources for enabling research in distributed scientific workflows, e-Science (e-Science), 2014 IEEE 10th International Conference on, pp.177-184, 2014. ,
Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems, Scientific Programming, vol.13, issue.3, pp.219-237, 2005. ,
DOI : 10.1155/2005/128026
URL : http://downloads.hindawi.com/journals/sp/2005/128026.pdf
Pegasus, a workflow management system for science automation, Future Generation Computer Systems, vol.46, pp.17-35, 2015. ,
DOI : 10.1016/j.future.2014.10.008
URL : https://manuscript.elsevier.com/S0167739X14002015/pdf/S0167739X14002015.pdf
The structural cause of file size distributions, Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Proceedings. Ninth International Symposium on, pp.361-370, 2001. ,
Scheduling for Parallel Processing, Computer Communications and Networks, 2009. ,
ASKALON: A Development and Grid Computing Environment for Scientific Workflows, Workflows for e-Science, pp.450-471, 2007. ,
DOI : 10.1007/978-1-84628-757-2_27
Checkpointing workflows for fail-stop errors, IEEE Transactions on Computers, 2018. ,
DOI : 10.1109/tc.2018.2801300
URL : https://hal.archives-ouvertes.fr/hal-01559967
Algorithm-based fault tolerance for matrix operations, IEEE Trans. Comput., vol.33, issue.6, pp.518-528, 1984. ,
Performance under Failures of DAG-based Parallel Computing, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009. ,
DOI : 10.1109/CCGRID.2009.55
URL : http://www.cs.iit.edu/~scs/psfiles/3622a236.pdf
Characterizing and profiling scientific workflows, Future Generation Computer Systems, vol.29, issue.3, pp.682-692, 2013. ,
DOI : 10.1016/j.future.2012.08.015
A novel adaptive checkpointing method based on information obtained from workflow structure, Computer Science, vol.17, issue.3, 2016. ,
DOI : 10.7494/csci.2016.17.3.387
URL : https://journals.agh.edu.pl/csci/article/download/1797/1554
A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1257, 1993. ,
DOI : 10.1137/0914074
Fault tolerant preconditioned conjugate gradient for sparse linear system solution, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, 2012. ,
DOI : 10.1145/2304576.2304588
A standard task graph set for fair evaluation of multiprocessor scheduling algorithms, Journal of Scheduling, vol.5, issue.5, pp.379-394, 2002. ,
DOI : 10.1109/TC.1973.5009153
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
URL : http://meseec.ce.rit.edu/eecc722-fall2002/papers/hc/5/l0260.pdf
On the Optimum Checkpoint Selection Problem, SIAM Journal on Computing, vol.13, issue.3, 1984. ,
DOI : 10.1137/0213039
URL : http://ecommons.cornell.edu/bitstream/1813/6386/1/83-546.pdf
The recognition of series parallel digraphs, Proc. of STOC'79, pp.1-12, 1979. ,
DOI : 10.1145/800135.804393
Replication-Based Fault-Tolerance for Large-Scale Graph Processing, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.562-573, 2014. ,
DOI : 10.1109/DSN.2014.58
Swift: A language for distributed parallel scripting, Parallel Computing, vol.37, issue.9, pp.633-652, 2011. ,
DOI : 10.1016/j.parco.2011.05.005
URL : http://www.mcs.anl.gov/uploads/cels/papers/P1818.pdf
The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud, Nucleic Acids Research, vol.2011, issue.W1, p.328, 2013. ,
DOI : 10.1186/1752-0509-6-25
Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3 ,
DOI : 10.1109/71.80160
URL : http://www.eece.unm.edu/~shu/lab/paper/htooltrans.pdf
Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.1352-1363, 2012. ,
DOI : 10.1109/IPDPS.2012.122