Sorting networks and their applications, Proceedings of the April 30--May 2, 1968, spring joint computer conference, AFIPS '68 (Spring), pp.307-314, 1968.
DOI : 10.1145/1468075.1468121
Extending OpenMP for NUMA machines.
Cilk: An efficient multithreaded runtime system, Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '95, pp.207-216, 1995.
Scheduling multithreaded computations by work stealing, Journal of the ACM, vol.46, issue.5, pp.720-748, 1999.
DOI : 10.1145/324133.324234
Structuring the execution of OpenMP applications for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470442
URL : https://hal.archives-ouvertes.fr/inria-00441472
ForestGOMP: An efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, vol.38, issue.5, pp.418-439, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00496295
libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World, IWOMP, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253
Habanero-Java, Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, pp.51-61, 2011.
DOI : 10.1145/2093157.2093165
X10: An object-oriented approach to non-uniform cluster computing, Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '05, pp.519-538, 2005.
LAWS, Proceedings of the 28th ACM international conference on Supercomputing, ICS '14, pp.3-12, 2014.
DOI : 10.1145/2597652.2597665
URL : https://hal.archives-ouvertes.fr/hal-01312334
NUMA in a hurry, 2012.
Traffic management: A holistic approach to memory placement on NUMA systems, Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, pp.381-394, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758
Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages, ACM Transactions on Architecture and Code Optimization, vol.11, issue.3, pp.30:1-30:25, 2014.
DOI : 10.1145/2641764
URL : https://hal.archives-ouvertes.fr/hal-01136491
Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors, Advanced Micro Devices, 2007.
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, pp.1-9, 2009.
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172
Shoal: Smart allocation and replication of memory for parallel programs, Proceedings of the 2015 Usenix Annual Technical Conference, USENIX ATC '15, pp.263-276, 2015.
Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs, ACM Transactions on Architecture and Code Optimization, vol.11, issue.4, pp.61:1-61:30, 2015.
DOI : 10.1145/2687652
Correct and efficient work-stealing for weak memory models, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, 2013.
Thread and memory placement on NUMA systems: Asymmetry matters, 2015 USENIX Annual Technical Conference, USENIX ATC '15, pp.277-289, 2015.
Affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System, Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pp.387-392, 2005.
A library for portable and composable data locality optimizations for NUMA systems, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.227-238, 2015.
Exploiting memory affinity in OpenMP through schedule reuse, ACM SIGARCH Computer Architecture News, vol.29, issue.5, pp.49-55, 2001.
DOI : 10.1145/563647.563657
Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195
OpenStream, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.53:1-53:25, 2013.
DOI : 10.1145/2400682.2400712
URL : https://hal.archives-ouvertes.fr/hal-00786675
Minas: Memory Affinity Management Framework, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00421546
A programming model for deterministic task parallelism, Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC '11, pp.7-12, 2011.
DOI : 10.1145/1988915.1988918
Memory Affinity for Hierarchical Shared Memory Multiprocessors, 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp.59-66, 2009.
DOI : 10.1109/SBAC-PAD.2009.16
URL : https://hal.archives-ouvertes.fr/hal-00788914
QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.