A. , B. And, W. , and N. J. , Measurement and interpretation of micro-benchmark and application energy use on the cray xc30, Energy Efficient Supercomputing Workshop, pp.51-59, 2014.

B. , L. Zyulkyarov, F. Unsal, O. And, M. et al., Unprotected computing: a large-scale study of dram raw error rate on a supercomputer, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.645-655, 2016.

C. , F. Geist, A. Gropp, W. Kale, S. Kramer et al., Update. Supercomputing Frontiers and Innovations, vol.1, p.24, 2014.

E. , N. And, S. , and B. , Reading between the lines of failure logs: Understanding how hpc systems fail, 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp.1-12, 2013.

G. , A. Cappello, F. Snir, M. And, K. et al., Fault prediction under the microscope: A closer look into HPC systems, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p.77, 2012.

G. E. , R. Vogt, R. Majumder, J. Alam, A. Burtscher et al., Effects of dynamic voltage and frequency scaling on a k20 gpu, 42nd International Conference on Parallel Processing (ICPP), pp.826-833, 2013.

G. , S. Patel, T. Engelmann, C. And, T. et al., Failures in large scale systems: long-term measurement, analysis, and implications, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p.44, 2017.

K. and N. , Investigating power efficiency and co-location effects on heterogeneous hpc architectures, 2013.

M. , A. Bailey, P. E. Lowenthal, D. K. Rountree, B. Schulz et al., A run-time system for powerconstrained hpc applications, International conference on high performance computing, pp.394-408, 2015.

N. , B. Xue, J. Gupta, S. Engelmann, C. Smirni et al., Characterizing temperature, power, and soft-error behaviors in data center systems: Insights, challenges, and opportunities, IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS, pp.22-31, 2017.

P. , E. Weber, W. And, B. , and L. A. , Failure trends in a large disk drive population, FAST, vol.7, pp.17-23, 2007.

R. , B. Ahn, D. H. De-supinski, B. R. Lowenthal, D. K. And et al., Beyond dvfs: A first look at performance under a hardware-enforced power bound, Parallel and Distributed Processing Symposium Workshops & PhD Forum, pp.947-953, 2012.

R. , B. Lownenthal, D. K. De-supinski, B. R. Schulz, M. Freeh et al., Adagio: making dvs practical for complex hpc applications, Proceedings of the 23rd international conference on Supercomputing, pp.460-469, 2009.

S. , B. And, G. , and G. A. , Disk failures in the real world: What does an mttf of 1, 000, 000 hours mean to you? In FAST, vol.7, pp.1-16, 2007.

S. , B. And, G. , and G. A. , Understanding failures in petascale computers, Journal of Physics: Conference Series, vol.78, p.12022, 2007.

A. D. Simpson, M. Bull, H. And, and J. , Identification and categorisation of applications and initial benchmarks suite, 2008.

S. , M. Wisniewski, R. W. , A. , J. A. Adve et al., Addressing failures in exascale computing, The International Journal of High Performance Computing Applications, vol.28, pp.129-173, 2014.

S. , V. Debardeleben, N. Blanchard, S. Ferreira, K. B. Stearley et al., Memory errors in modern systems: The good, the bad, and the ugly, In ACM SIGPLAN Notices, vol.50, pp.297-310, 2015.

W. , G. Zhang, L. And, X. U. , and W. , What can we learn from four years of data center hardware failures, 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN, pp.25-36, 2017.

Y. , K. Uno, A. Murai, H. Tsukamoto, T. Shoji et al., The k computer operations: experiences and statistics, Procedia Computer Science, vol.29, pp.576-585, 2014.