A. , C. Netzer, and N. , The logit-response dynamics, Games and Economic Behavior, vol.68, issue.2, pp.413-427, 2010.

B. , A. Kale, L. V. Kumar, and S. , Dynamic topology aware load balancing algorithms for molecular dynamics applications, In: CONFERENCE ON SUPERCOMPUT- ING, vol.23, pp.110-116, 2009.

. Blackjack, Compiler Metrics and Evaluation

B. , R. D. Leiserson, and C. , Scheduling multithreaded computations by work stealing, J. ACM, issue.465, pp.720-748, 1999.

C. , T. L. Kuhl, and J. G. , A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Trans. Softw. Eng, issue.14, pp.141-154, 1988.

C. , E. H. Diener, and M. Navaux, Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory, IPDPS), 2012 IEEE 26TH INTERNA- TIONAL. Proceedings. . . [S.l.: s.n.], pp.2012-532

D. , L. Menon, and R. , OpenMP: an industry standard api for shared-memory programming, Computational Science & Engineering IEEE, issue.51, pp.46-55, 2002.

D. , M. Cruz, and E. H. Navaux, Communication-Based Mapping Using Shared Pages, IEEE INTERNATIONAL PARALLEL DISTRIBUTED PROCESS- ING SYMPOSIUM (IPDPS). Proceedings. . . [S.l.: s.n.], 2013.

F. , E. Goldman, A. Mehaut, and J. A. Numa, Aware Runtime Environment for the Actor Model, PROCEEDINGS OF THE 42ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2013 Proceedings . . . [S.l.: s.n.], 2013, pp.250-259
URL : https://hal.archives-ouvertes.fr/hal-00953120

F. , M. Madduri, K. Raghavan, P. Networking, and S. And-analy-sis, NUMA-aware graph mining techniques for performance and energy efficiency, In: INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING Proceedings, p.2012

F. , M. Leiserson, C. E. Randall, and K. H. , The implementation of the Cilk-5 multithreaded language, SIGPLAN Not, issue.335, pp.212-223, 1998.

G. , T. Besseron, X. Pigeon, and L. , Kaapi: a thread scheduling runtime system for data flow computations on cluster of multi-processors, Proceedings. . . ACM, p.23, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00684843

G. , T. Ibarra, O. H. Sahni, and S. , Bounds for LPT schedules on uniform processors, SIAM Journal on Computing, issue.61, pp.155-166, 1977.

G. , W. Lusk, E. Skjellum, and A. , Using MPI: portable parallel programming with the message-passing interface, p.371, 1999.

H. , T. Schneider, and T. , Runtime detection and optimization of collective communication patterns, Proceedings. . . ACM, pp.2012-263

H. , T. Snir, and M. , Generic topology mapping strategies for large-scale parallel architectures, Proceedings. . . ACM, 2011. (ICS '11)

H. , C. Lawlor, O. Kale, and L. , Adaptive mpi. Lecture notes in computer science, 2004.

J. , E. Mercier, G. D. , P. Guarracino, M. Talia et al., Near-Optimal Placement of MPI Processes on Hierarchical NUMA Architectures Euro- Par 2010 -Parallel Processing, Lecture Notes in Computer Science, pp.199-210, 2010.

K. , L. V. Krishnan, S. Object-oriented, . Pro-gramming, . Systems et al., Charm++: a portable concurrent object oriented system based on c++, In: EIGHTH ANNUAL CONFERENCE ON, 1993.

. Proceedings, [S.l.: s.n, pp.91-108, 1993.

K. , L. V. Sinha, A. Systems, I. Fair, . Parallel et al., Projections: a preliminary performance tool for charm, Proceedings. . . [S.l.: s.n.], pp.108-114, 1993.

K. , G. Kumar, and V. , METIS: unstructured graph partitioning and sparse matrix ordering system. The University of Minnesota, 1995.

K. , C. Burger, D. Keckler, and S. W. , An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, SIGOPS Oper. Syst. Rev, issue.365, pp.211-222, 2002.

K. , D. Martin, and R. , An unsplit convolutional perfectly matched layer improved at grazing incidence for the seismic wave equation, GEOPHYSICS, issue.725, pp.155-167, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00528418

K. , V. Grama, A. Y. Vempaty, and N. , Scalable load balancing techniques for parallel computers, J. Parallel Distrib. Comput, issue.221, pp.60-79, 1994.

L. , J. Krishnamoorthy, S. Kale, and L. , Work stealing and persistence-based load balancers for iterative overdecomposed applications, HIGH- PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING Proceedings. . . ACM, 2012. (HPDC '12)

L. , H. Holmgren, and S. , affinity-on-next-touch: increasing the performance of an industrial pde solver on a cc-numa system, Proceedings. . . ACMICS '05), pp.387-392, 2005.

M. , G. Clet-ortega, J. Ropo, M. Westerholm, J. Don-garra et al., Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, pp.104-115, 2009.

P. , F. Roman, and J. , Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs, Proceedings. . . [S.l.: s.n.], pp.493-498, 1996.

Q. , J. Wagner, and F. , Hierarchical work-stealing. In: EURO-PAR CONFER- ENCE ON PARALLEL PROCESSING: PART I, 16, Proceedings
URL : https://hal.archives-ouvertes.fr/inria-00429624

. Servet, The Servet Benchmark Suite Homepage

T. , S. Namyst, R. Wacrenier, and P. , Building Portable Thread Schedulers for Hierarchical Multiprocessors: the bubblesched framework
URL : https://hal.archives-ouvertes.fr/inria-00154506

M. Bougé, L. Priol, and T. , Euro-Par 2007 Parallel Processing, Lecture Notes in Computer Science, pp.42-51, 2007.

T. , J. Hager, G. Wellein, and G. , LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments, INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, pp.207-216, 2010.

W. , W. Mckee, and S. A. , Hitting the memory wall: implications of the obvious

X. , M. Droegemeier, K. Weber, and D. , Numerical prediction of high-impact local weather: a driver for petascale computing. Petascale Computing: Algorithms and Applications, pp.103-124, 2007.