An evaluation of user-level failure mitigation support in MPI, EuroMPI'12, pp.193-203, 2012. ,
Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999. ,
Resilient X10: Efficient failure-aware programming, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp.67-80, 2014. ,
Termination detection for diffusing computations, Information Processing Letters, vol.11, issue.1, pp.1-4, 1980. ,
Localized fault recovery for nested fork-join programs, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp.397-408, 2017. ,
An (n-1)-resilient algorithm for distributed termination detection, IEEE Transactions on Parallel and Distributed Systems, vol.6, issue.1, pp.63-78, 1995. ,
Adoption protocols for fanout-optimal fault-tolerant termination detection, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2013. ,
Exploring the APGAS programming model using the LULESH proxy application, IBM Research, 2015. ,
Transparent fault tolerance for scalable functional computation, Journal of Functional Programming, vol.26, 2016. ,
TLA+ specification of the optimistic finish protocol and the replication protocol ,