K. E. Batcher, Sorting networks and their applications, Proceedings of the April 30--May 2, 1968, Spring Joint Computer Conference, AFIPS '68 (Spring), pp.307-314, 1968.
DOI : 10.1145/1468075.1468121

J. Bircsak, P. Craig, R. Crowell, Z. Cvetanovic, J. Harris et al., Extending OpenMP for NUMA machines

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An efficient multithreaded runtime system, Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '95, pp.207-216, 1995.

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999.
DOI : 10.1145/324133.324234

F. Broquedis, O. Aumage, B. Goglin, S. Thibault, P. Wacrenier et al., Structuring the execution of OpenMP applications for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470442
URL : https://hal.archives-ouvertes.fr/inria-00441472

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, vol.38, issue.5, pp.418-439, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00496295

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World, IWOMP, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253

V. Cavé, J. Zhao, J. Shirako, and V. Sarkar, Habanero-Java, Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, pp.51-61, 2011.
DOI : 10.1145/2093157.2093165

P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra et al., X10: An object-oriented approach to non-uniform cluster computing, Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pp.519-538, 2005.

Q. Chen, M. Guo, and H. Guan, LAWS, Proceedings of the 28th ACM international conference on Supercomputing, ICS '14, pp.3-12, 2014.
DOI : 10.1145/2597652.2597665
URL : https://hal.archives-ouvertes.fr/hal-01312334

J. Corbet, NUMA in a hurry, 2012.

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp.381-394, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758

A. Drebes, A. Pop, K. Heydemann, A. Cohen, and N. Drach, Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages, ACM Transactions on Architecture and Code Optimization, vol.11, issue.3, pp.30:1-30:25, 2014.
DOI : 10.1145/2641764
URL : https://hal.archives-ouvertes.fr/hal-01136491

P. J. Drongowski, Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors, Advanced Micro Devices, 2007.

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474

B. Goglin and N. Furmento, Enabling high-performance memory migration for multithreaded applications on Linux, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-9, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172

S. Kaestle, R. Achermann, T. Roscoe, and T. Harris, Shoal: Smart allocation and replication of memory for parallel programs, Proceedings of the 2015 Usenix Annual Technical Conference, USENIX ATC '15, pp.263-276, 2015.

M. Kong, A. Pop, L. Pouchet, R. Govindarajan, A. Cohen et al., Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs, ACM Transactions on Architecture and Code Optimization, vol.11, issue.4, pp.61:1-61:30, 2015.
DOI : 10.1145/2687652

N. M. Lê, A. Pop, A. Cohen, and F. Z. Nardelli, Correct and efficient work-stealing for weak memory models, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, 2013.

B. Lepers, V. Quéma, and A. Fedorova, Thread and memory placement on NUMA systems: Asymmetry matters, 2015 USENIX Annual Technical Conference, USENIX ATC '15, pp.277-289, 2015.

H. Löf and S. Holmgren, Affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.387-392, 2005.

Z. Majo and T. R. Gross, A library for portable and composable data locality optimizations for NUMA systems, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.227-238, 2015.

D. S. Nikolopoulos, E. Artiaga, E. Ayguadé, and J. Labarta, Exploiting memory affinity in OpenMP through schedule reuse, ACM SIGARCH Computer Architecture News, vol.29, issue.5, pp.49-55, 2001.
DOI : 10.1145/563647.563657

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

A. Pop and A. Cohen, OpenStream, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.53:1-53:25, 2013.
DOI : 10.1145/2400682.2400712
URL : https://hal.archives-ouvertes.fr/hal-00786675

C. P. Ribeiro and J. Méhaut, Minas: Memory Affinity Management Framework, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00421546

P. Pratikakis, H. Vandierendonck, S. Lyberis, and D. S. Nikolopoulos, A programming model for deterministic task parallelism, Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC '11, pp.7-12, 2011.
DOI : 10.1145/1988915.1988918

C. P. Ribeiro, J. Méhaut, A. Carissimi, M. Castro, and L. G. Fernandes, Memory Affinity for Hierarchical Shared Memory Multiprocessors, 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp.59-66, 2009.
DOI : 10.1109/SBAC-PAD.2009.16
URL : https://hal.archives-ouvertes.fr/hal-00788914

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.