C. Mei, Y. Sun, G. Zheng, E. J. Bohm, L. V. Kalé et al., Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicoreoptimized message-driven runtime, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), vol.61, p.11, 2011.

C. R. Noble, A. T. Anderson, N. R. Barton, J. A. Bramwell, A. Capps et al., Ale3d: An arbitrary lagrangian-eulerian multiphysics code, vol.5, p.2017

J. Paudel, O. Tardieu, and J. N. Amaral, On the merits of distributed work-stealing on selective locality-aware tasks, Proceedings of International Conference on Parallel Processing (ICPP)

F. Lyon, , pp.100-109, 2013.

R. Al-omairy, G. Miranda, H. Ltaief, R. Badia, X. Martorell et al., Dense matrix computations on numa architectures with distance-aware work stealing, J. Supercomputing Frontiers and Innovations (JSFI), vol.2, issue.1, 2015.

J. Mair, Z. Huang, D. Eyers, and Y. Chen, Quantifying the energy efficiency challenges of achieving exascale computing, Proceedings of International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2015.

H. Menon, B. Acun, S. G. De-gonzalo, O. Sarood, and L. Kalé, Thermal aware automated load balancing for hpc applications, 2013 IEEE International Conference on Cluster Computing (CLUS-TER), pp.1-8, 2013.

M. R. Garey and D. S. Johnson, strong " np-completeness results: Motivation, examples, and implications, J. ACM, vol.25, issue.3, pp.499-508, 1978.

J. Lenstra, A. R. Kan, and P. Brucker, Complexity of machine scheduling problems, Studies in Integer Programming, ser. Annals of Discrete Mathematics, vol.1, pp.343-362, 1977.

H. Menon, N. Jain, G. Zheng, and L. Kalé, Automated load balancing invocation based on application characteristics, International Conference on Cluster Computing (CLUSTER), pp.373-381, 2012.

O. Pearce, T. Gamblin, B. R. De-supinski, M. Schulz, and N. M. Amato, Quantifying the effectiveness of load balance algorithms, International Conference on Supercomputing (ICS), pp.185-194, 2012.

R. Graham, E. Lawler, J. Lenstra, and A. Kan, Optimization and approximation in deterministic sequencing and scheduling: a survey, Discrete Optimization II, ser. Annals of Discrete Mathematics, vol.5, pp.287-326, 1979.

U. V. Catalyurek, E. G. Boman, K. D. Devine, D. Bozdag, R. T. Heaphy et al., Hypergraph-based dynamic load balancing for adaptive scientific computations, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), 2007.

A. Bhatele, S. Fourestier, H. Menon, L. V. Kalé, and F. Pellegrini, Applying graph partitioning methods in measurement-based dynamic load balancing, 2012.

L. L. Pilla, C. P. Ribeiro, D. Cordeiro, C. Mei, A. Bhatele et al., A hierarchical approach for load balancing on parallel multi-core systems, Proceedings of International Conference on Parallel Processing (ICPP), pp.118-127, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00788012

M. Deveci, K. Kaya, B. Uçar, and U. V. Catalyurek, Fast and high quality topology-aware task mapping, Proceedings of International Parallel and Distributed Processing Symposium (IPDPS), 2015.
URL : https://hal.archives-ouvertes.fr/hal-01159677

E. Jeannot, E. Meneses, G. Mercier, F. Tessier, and G. Zheng, Communication and topology-aware load balancing in charm++ with treematch, International Conference on Cluster Computing (CLUSTER), pp.1-8, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00851148

M. H. Willebeek-lemair and A. P. Reeves, Strategies for dynamic load balancing on highly parallel computers, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol.4, issue.9, 1993.

H. Menon and L. Kalé, A distributed dynamic load balancer for iterative applications, Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

U. Denver, , vol.15, p.11, 2013.

V. Freitas, A. Santana, M. Castro, and L. L. Pilla, A batch task migration approach for decentralized global rescheduling, Proceedings of International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp.49-56, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01860626

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999.

J. Yang and Q. He, Scheduling parallel computations by work stealing: A survey, International Journal of Parallel Programming (IJPP), vol.46, issue.2, pp.173-197, 2018.

P. Berenbrink, T. Friedetzky, L. A. Goldberg, P. W. Goldberg, Z. Hu et al., Distributed selfish load balancing, SIAM Journal on Computing, vol.37, issue.4, pp.1163-1181, 2007.

P. Berenbrink, T. Friedetzky, D. Kaaser, and P. Kling, Tight & simple load balancing, Proceedings of International Conference on Parallel and Distributed Computing (IPDPS), pp.718-726, 2019.

M. Lieber, K. Gössner, and W. E. Nagel, The potential of diffusive load balancing at large scale, Proceedings of European MPI Users' Group Meeting (EuroMPI), pp.154-157, 2016.

V. Janjic and K. Hammond, How to be a successful thief, Proceedings of European Conference on Parallel Processing

Q. Chen and M. Guo, Contention and locality-aware workstealing for iterative applications in multi-socket computers, IEEE Transactions on Computers, vol.67, issue.6, pp.784-798, 2018.

R. Da-rosa-righi, R. De-quadros, V. F. Gomes, C. A. Rodrigues, A. M. Da-costa et al., Migpf: Towards on self-organizing process rescheduling of bulksynchronous parallel applications, Future Generation Computer Systems, vol.78, pp.272-286, 2018.

J. Lifflander, S. Krishnamoorthy, and L. V. Kalé, Work Stealing and Persistence-based Load Balancers for Iterative Overdecomposed Applications, Proceedings of International Symposium on High-Performance Parallel and Distributed Computing (HPDC), pp.137-148, 2012.

W. Lee, E. Slaughter, M. Bauer, S. Treichler, T. Warszawski et al., Dynamic tracing: Memoization of task graphs for dynamic task-based runtimes, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, ser. SC, vol.34, pp.1-34, 2018.

C. F. Joerg and B. C. Kuszmaul, Massively parallel chess, Proceedings of the DIMACS Parallel Implementation Challenge, 1994.

N. Gast and G. Bruno, A mean field model of work stealing in large-scale systems, ACM SIGMETRICS Performance Evaluation Review (PER), vol.38, issue.1, pp.13-24, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00788862

G. Zheng, A. Bhatelé, E. Meneses, and L. V. Kalé, Periodic hierarchical load balancing for large supercomputers, International Journal of High Performance Computing Applications (IJHPCA), vol.25, pp.371-385, 2011.

C. Chevalier and F. Pellegrini, Pt-scotch: A tool for efficient parallel graph ordering, Parallel computing, vol.34, issue.6-8, pp.318-331, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00402893

D. Lasalle and G. Karypis, Multi-threaded graph partitioning, Proceedings of International Symposium on Parallel and Distributed Processing (IPDPS), pp.225-236, 2013.

M. Diener, S. White, L. V. Kalé, M. Campbell, D. J. Bodony et al., Improving the memory access locality of hybrid MPI applications, Proceedings of European MPI Users' Group Meeting (EuroMPI), vol.11, pp.1-11, 2017.

D. Unat, A. Dubey, T. Hoefler, J. Shalf, M. Abraham et al., Trends in data locality abstractions for HPC systems, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol.28, issue.10, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01621371

L. L. Pilla, C. P. Ribeiro, P. Coucheney, F. Broquedis, B. Gaujal et al., A topology-aware load balancing algorithm for clustered hierarchical multi-core machines, Future Generation Computer Systems (FGCS), vol.30, pp.191-201, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00953132

P. H. Penna, A. T. Gomes, M. Castro, P. D. Plentz, H. C. Freitas et al., A comprehensive performance evaluation of the BinLPT workload-aware loop scheduler, Concurrency and Computation: Practice and Experience, p.5170, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01986361

K. Li, M. Malawski, and J. Nabrzyski, Reducing fragmentation on 3d torus-based hpc systems using packing-based job scheduling and job placement reconfiguration, Proceedings of International Symposium on Parallel and Distributed Computing, pp.34-43, 2017.

J. Lifflander, S. Krishnamoorthy, and L. V. Kale, Optimizing data locality for fork/join programs using constrained work stealing, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp.857-868, 2014.

M. Tchiboukdjian, V. Danjean, T. Gautier, F. L. Mentec, and B. Raffin, A work stealing scheduler for parallel loops on shared cache multicores, Proceedings of European Conference on Parallel Processing Workshops (EuroParW), pp.99-107, 2010.

S. Shiina and K. Taura, Almost deterministic work stealing, Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2019.

A. D. Kshemkalyani and M. Singhal, Distributed computing: principles, algorithms, and systems, 2011.

P. Brucker, Scheduling Algorithms, 2001.

A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson et al., Epidemic algorithms for replicated database maintenance, Proceedings of Symposium on Principles of Distributed Computing (PODC), 1987.

M. P. Wellman and W. E. Walsh, Distributed quiescence detection in multiagent negotiation, Proceedings International Conference on MultiAgent Systems, pp.317-324, 2000.

B. Acun, A. Gupta, N. Jain, A. Langer, H. Menon et al., Parallel Programming with Migratable Objects: Charm++ in Practice, Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2014.

P. Thoman, K. Dichev, T. Heller, R. Iakymchuk, X. Aguilar et al., A taxonomy of task-based parallel programming technologies for high-performance computing, Springer Journal of Supercomputing, vol.74, issue.4, pp.1422-1434, 2018.

J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kale, Namd: Biomolecular simulation on thousands of processors, SC '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pp.36-36, 2002.

P. Berenbrink, R. Klasing, A. Kosowski, F. Mallmann-trenn, and P. Uzna?ski, Improved analysis of deterministic load-balancing schemes, ACM Trans. Algorithms (TALG), vol.15, issue.1, pp.1-10, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01251847