W. Tang, B. Wang, S. Ethier, and Z. Lin, Performance portability of HPC discovery science software: Fusion energy turbulence simulations at extreme scale, Supercomputing Frontiers and Innovations, vol.4, issue.1, 2017.

A. S. Kozelkov, V. V. Kurulin, S. V. Lashkin, R. M. Shagaliev, and A. V. Yalozo, Investigation of supercomputer capabilities for the scalable numerical simulation of computational fluid dynamics problems in industrial applications, Computational Mathematics and Mathematical Physics, vol.56, issue.8, pp.1506-1516, 2016.

P. Vranas, G. Bhanot, M. Blumrich, D. Chen, A. Gara et al., The BlueGene/L supercomputer and quantum chromodynamics, ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp.50-57, 2006.

S. R. Ellingson, J. C. Smith, and J. Baudry, Polypharmacology and supercomputer-based docking: opportunities and challenges, Molecular Simulation, vol.40, issue.10, pp.848-854, 2014.

S. Doyle-Lindrud, Watson will see you now: a supercomputer to help clinicians make informed treatment decisions, Clinical Journal of Oncology Nursing, vol.19, issue.1, p.31, 2015.

A. B. Yoo, M. A. Jette, and M. Grondona, SLURM: Simple Linux utility for resource management, Job Scheduling Strategies for Parallel Processing, pp.44-60, 2003.

N. Desai, Cobalt: an open source platform for HPC system software research, Edinburgh BG/L System Software Workshop, pp.803-820, 2005.

G. Staples, TORQUE resource manager, ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), p.8, 2006.

J. Ansel, K. Arya, and G. Cooperman, DMTCP: Transparent checkpointing for cluster computations and the desktop, IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2009.

B. Nicolae, A. Moody, E. Gonsiorowski, K. Mohror, and F. Cappello, VeloC: Towards high performance adaptive asynchronous checkpointing at large scale, IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.911-920, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02184203

J. Daly, A higher order estimate of the optimum checkpoint interval for restart dumps, Future Generation Computer Systems, vol.22, issue.3, pp.303-312, 2006.

R. Tyagi and S. K. Gupta, A survey on scheduling algorithms for parallel and distributed systems, Silicon Photonics & High Performance Computing, pp.51-64, 2018.

V. J. Leung, G. Sabin, and P. Sadayappan, Parallel job scheduling policies to improve fairness: A case study, International Conference on Parallel Processing Workshops (ICPP), pp.346-353, 2010.

A. A. Chandio, K. Bilal, N. Tziritas, Z. Yu, Q. Jiang et al., A comparative study on resource allocation and energy efficient job scheduling strategies in large-scale parallel computing systems, Cluster Computing, vol.17, issue.4, pp.1349-1367, 2014.

A. W. Mu'alem and D. G. Feitelson, Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol.12, issue.6, pp.529-543, 2001.

C. Gómez-Martín, M. A. Vega-Rodríguez, and J. González-Sánchez, Fattened backfilling: An improved strategy for job scheduling in parallel systems, Journal of Parallel and Distributed Computing (JPDC), vol.97, pp.69-77, 2016.

B. Lawson and E. Smirni, Multiple-queue backfilling scheduling with priorities and reservations for parallel systems, ACM SIGMETRICS Performance Evaluation Review, vol.29, pp.72-87, 2002.

A. Tousimojarad and W. Vanderbauwhede, An efficient thread mapping strategy for multiprogramming on manycore processors, Advances in Parallel Computing, vol.25, pp.63-71, 2014.

S. G. Ahmad, C. S. Liew, M. M. Rafique, E. U. Munir, and S. U. Khan, Data-intensive workflow optimization based on application task graph partitioning in heterogeneous computing systems, IEEE International Conference on Big Data and Cloud Computing, pp.129-136, 2014.

D. Wang, E. Jung, R. Kettimuthu, I. Foster, D. J. Foran et al., Supporting Real-Time Jobs on the IBM Blue Gene/Q: Simulation-Based Study, Job Scheduling Strategies for Parallel Processing, pp.83-102, 2018.

N. Trebon, Enabling urgent computing within the existing distributed computing infrastructure, 2011.

Q. Snell, M. Clement, and D. Jackson, Preemption based backfill, Job Scheduling Strategies for Parallel Processing, pp.24-37, 2002.

J. Meza, T. Xu, K. Veeraraghavan, and O. Mutlu, A large scale study of data center network reliability, ACM Internet Measurement Conference (IMC), pp.393-407, 2018.

S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski et al., B4: Experience with a Globally Deployed Software Defined WAN, ACM SIGCOMM, 2013.

J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost et al., Spanner: Google's globally distributed database, ACM Transactions on Computer Systems (TOCS), vol.31, issue.3, 2013.

A. Vulimiri, C. Curino, P. B. Godfrey, T. Jungblut, J. Padhye et al., Global analytics in the face of bandwidth and regulatory constraints, USENIX Networked Systems Design and Implementation (NSDI), pp.323-336, 2015.

S. Muralidhar, W. Lloyd, S. Roy, C. Hill, E. Lin et al., f4: Facebook's Warm BLOB Storage System, USENIX Operating Systems Design and Implementation (OSDI), pp.383-398, 2014.