Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, Proceedings of the International Conference on High Performance C omputing (HiPC), 2006. ,
DOI : 10.1007/11945918_35
Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications, 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004. ,
Efficient parallel programming on scalable shared memory systems with High Performance Fortran, Concurrency: Practice and Experience, pp.789-803, 2002. ,
DOI : 10.1002/cpe.649
On the Importance of Parallel Application Placement in NUMA Multiprocessors, Proceedings of the Fourth Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS IV), 1993. ,
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Introduction to UPC and Language Specification, 1999. ,
Achieving performance under OpenMP on cc- NUMA and software distributed shared memory systems, Concurrency: Practice and Experience, pp.713-739, 2002. ,
Extending openmp worksharing directives for multithreading, EuroPar'06 Parallel Processing, 2006. ,
HMPP: A hybrid multi-core parallel programming environment, 2007. ,
Extending the openmp tasking model to allow dependant tasks, IWOMP Proceedings, 2008. ,
The Implementation of the Cilk-5 Multithreaded Language, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1998. ,
Enabling high-performance memory migration for multithreaded applications on LINUX, 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009. ,
DOI : 10.1109/IPDPS.2009.5161101
URL : https://hal.archives-ouvertes.fr/inria-00358172
affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system, 19th ACM International Conference on Supercomputing, pp.387-392 ,
Memory bandwidth and machine balance in current high performance computers, IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter pp, pp.19-25, 1995. ,
User-level dynamic page migration for multiprogrammed shared-memory multiprocessors, Proceedings 2000 International Conference on Parallel Processing, pp.95-103, 2000. ,
DOI : 10.1109/ICPP.2000.876083
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors, Journal of Parallel and Distributed Computing, vol.62, issue.6, pp.1069-1103, 2002. ,
DOI : 10.1006/jpdc.2001.1817
Geographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers, Second International Workshop on OpenMP, 2006. ,
DOI : 10.1007/978-3-540-68555-5_31
Feedback-directed thread scheduling with memory considerations, Proceedings of the 16th international symposium on High performance distributed computing , HPDC '07, 2007. ,
DOI : 10.1145/1272366.1272380
Using Locality Information in Userlevel Scheduling, 1995. ,
Data and thread affinity in openmp programs, Proceedings of the 2008 workshop on Memory access on future processors a solved problem?, MAW '08, pp.377-384, 2008. ,
DOI : 10.1145/1366219.1366222
Building Portable Thread Schedulers for Hierarchical Multiprocessors: The BubbleSched Framework, Euro-Par, 2007. ,
DOI : 10.1007/978-3-540-74466-5_6
URL : https://hal.archives-ouvertes.fr/inria-00154506
Memory and Thread Placement Effects as a Function of Cache Usage: A Study of the Gaussian Chemistry Code on the SunFire X4600 M2, 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (i-span 2008), pp.31-36, 2008. ,
DOI : 10.1109/I-SPAN.2008.13