Cray X-MP : The Birth of a Supercomputer, Computer, vol.22, issue.1, pp.45-52, 1989. ,
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pp.851-862, 2009. ,
Cache index-aware memory allocation, ACM SIGPLAN Notices, vol.46, issue.11, pp.55-64, 2011. ,
DOI : 10.1145/2076022.1993486
Increasing memory density by using KSM, OLS '09 : Proceedings of the Linux Symposium, pp.19-28, 2009. ,
An analytical cache model, ACM Transactions on Computer Systems, vol.7, issue.2, pp.184-215, 1989. ,
DOI : 10.1145/63404.63407
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
A Unified Runtime System for Heterogeneous Multi-core Architectures, Euro-Par 2008 Workshops - Parallel Processing, pp.174-183 ,
Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches, HPCA, 2009. ,
Magazines and Vmem : Extending the Slab Allocator to Many CPUs and Arbitrary Resources, Proceedings of the General Track : 2002 USENIX Annual Technical Conference, pp.15-33, 2001. ,
Compiler-directed page coloring for multiprocessors, ACM SIGOPS Operating Systems Review, vol.30, issue.5, pp.244-255, 1996. ,
DOI : 10.1145/248208.237195
Cluster computing: the commodity supercomputer, Software: Practice and Experience, vol.29, issue.6, pp.551-576, 1999. ,
DOI : 10.1002/(SICI)1097-024X(199905)29:6<551::AID-SPE248>3.0.CO;2-C
The NAS Parallel Benchmarks, 1991. ,
The multics virtual memory, Proceedings of the second symposium on Operating systems principles , SOSP '69, pp.30-42, 1969. ,
DOI : 10.1145/961053.961069
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Performance Tuning of x86 OpenMP Codes with MAQAO, Tools for High Performance Computing, pp.95-113, 2009. ,
Entering the petaflop era: The architecture and performance of Roadrunner, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-1, 2008. ,
DOI : 10.1109/SC.2008.5217926
Test Driven Development : By Example, 2003. ,
Memory management for high-performance applications, 2002. ,
The MOSIX Distributed Operating System : Load Balancing for UNIX, 1993. ,
DOI : 10.1007/3-540-56663-5
mm : Memory Power Management, 2013. ,
An empirical study of the effects of careful page placement in Linux, Proceedings of the 36th annual Southeast regional conference on , ACM-SE 36, 1998. ,
DOI : 10.1145/275295.275365
Avoiding conflict misses dynamically in large direct-mapped caches, pp.158-170, 1994. ,
Hoard : a scalable memory allocator for multithreaded applications, p.117 ,
Design and Evaluation of Nemesis : a Scalable, Low-Latency, Message-Passing Communication Subsystem, Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid ,
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, vol.33, issue.9, pp.634-644, 2007. ,
DOI : 10.1016/j.parco.2007.06.003
URL : https://hal.archives-ouvertes.fr/hal-00344327
Mostly concurrent garbage collection revisited, ACM SIGPLAN Notices, vol.38, issue.11, pp.255-268, 2003. ,
DOI : 10.1145/949343.949328
Scratchpad memory : design alternative for cache on-chip memory in embedded systems, Proceedings of the tenth international symposium on Hardware/software codesign, CODES '02, pp.73-78 ,
Using lifetime predictors to improve memory allocation performance, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, PLDI '93, 1993. ,
Grid'5000 : A Large Scale and Highly Reconfigurable Grid Experimental Testbed, Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing, GRID '05 ,
[Buc07] Ian Buck. GPU Computing : Programming a Massively Parallel Processor, Proceedings of the International Symposium on Code Generation and Optimization, CGO '07 ,
[CD97] Michel Cekleov and Michel Dubois. Virtual-Address Caches Part 1 : Problems and Solutions in Uniprocessors ,
[CHL99] Trishul M. Chilimbi, Mark D. Hill, and James R. Larus. Cache-conscious structure layout, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, PLDI '99 ,
Scalable address spaces using RCU balanced trees, Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pp.199-210, 2012. ,
Improving Load/Store Queues Usage in Scientific Computing, Proceedings ICPP 2004 ,
[CPJ10] Patrick Carribault, Marc Pérache, and Hervé Jourdren. Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC, Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP : accelerators, Tasking and more, 2010. ,
Gernot Heiser, and New South Wales. Itanium Page Tables and TLB, 2003. ,
A perspective on today's scaling challenges and possible future directions, Solid-State Electronics, vol.51, issue.4, pp.518-525, 2007. ,
DOI : 10.1016/j.sse.2007.02.004
Virtual memory, processes, and sharing in MULTICS, Commun. ACM, vol.11, pp.306-312, 1968. ,
High-order dimensionally split Lagrange-remap schemes for compressible hydrodynamics, Comptes Rendus Mathematique, vol.348, p.105 ,
Virtual Memory, ACM Comput. Surv, vol.2, issue.3, pp.153-189, 1970. ,
Mostly lock-free malloc, Proceedings of the 3rd international symposium on Memory management, ISMM '02, pp.163-174, 2002. ,
Design of ion-implanted MOSFETs with very small physical dimensions, IEEE Journal of Solid-State Circuits, p.98, 1974. ,
Introduction to Real-Time Imaging, 1995. ,
The LINPACK Benchmark: An explanation, Proceedings of the 1st International Conference on Supercomputing, pp.456-474, 1988. ,
DOI : 10.1007/3-540-18991-2_27
A Generational Mostly-concurrent Garbage Collector, 2000. ,
What Every Programmer Should Know About Memory, 2007. ,
An Analysis of SMP Memory Allocators: MapReduce on Large Shared-Memory Systems, 2012 41st International Conference on Parallel Processing Workshops, pp.48-54, 2012. ,
DOI : 10.1109/ICPPW.2012.10
Power Limitations and Dark Silicon Challenge the Future of Multicore, ACM Transactions on Computer Systems, vol.30, issue.3, pp.1-1127, 2012. ,
DOI : 10.1145/2324876.2324879
Intel Paragon XP/S - Architecture and Software Environment, Anwendungen, Architekturen, Trends, Seminar, Supercomputer '93, pp.121-141, 1993. ,
Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications, Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.493-498, 1999. ,
DOI : 10.1007/3-540-48158-3_61
A Scalable Concurrent malloc(3) Implementation for FreeBSD, 2006. ,
Very high-speed computing systems, Proceedings of the IEEE, vol.54, issue.12, pp.1901-1909, 1966. ,
A parallel virtual machine for efficient scheme compilation, Proceedings of the 1990 ACM conference on LISP and functional programming , LFP '90, pp.119-130, 1990. ,
DOI : 10.1145/91556.91606
Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System, Proceedings of the 11th International Symposium on Parallel Processing, IPPS '97, pp.233-241 ,
System design of a computer for time sharing applications, Proceedings of the November 30 - December 1, 1965, fall joint computer conference, part I, AFIPS '65 (Fall, part I), 1965. ,
[GFLMR13] Thierry Gautier et al. XKaapi : A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS) ,
Performance Characteristics of Explicit Superpage Support, Proceedings of the 2010 international conference on Computer Architecture, ISCA'10, pp.293-310, 2012. ,
DOI : 10.1007/978-3-642-24322-6_24
URL : https://hal.archives-ouvertes.fr/inria-00493770
Understanding the Linux Virtual Memory Manager, 2004. ,
Virtualization with KVM, Linux J, issue.166 ,
CAMA : A Predictable Cache-Aware Memory Allocator, Proceedings of the 2011 23rd Euromicro Conference on Real-Time Systems, pp.23-32, 2011. ,
SPEC CPU2006 benchmark descriptions, Proceedings of WOSP/SIPEW 2010, pp.1-17, 2006. ,
DOI : 10.1145/1186736.1186737
Scalable support for multithreaded applications on dynamic binary instrumentation systems, Proceedings of the 2009 international symposium on Memory management, ISMM '09, pp.20-29, 2009. ,
DOI : 10.1145/1542431.1542435
Computer Architecture, Fourth Edition : A Quantitative Approach ,
[Int10a] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A : System Programming Guide ,
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance, 1999. ,
[Int10b] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual ,
HERA: A Hydrodynamic AMR Platform for Multi-Physics Simulations, Adaptive Mesh Refinement -Theory and Applications, pp.283-294 ,
DOI : 10.1007/3-540-27039-6_19
The memory fragmentation problem : solved ?, Proceedings of the 1st international symposium on Memory management, ISMM '98, pp.26-36, 1998. ,
NUMA aware heap memory manager (AMD) ,
Page placement algorithms for large real-indexed caches, ACM Transactions on Computer Systems, vol.10, issue.4, pp.338-359, 1992. ,
DOI : 10.1145/138873.138876
Partial Array Self-refresh in Linux, 2010. ,
"MAMA!", Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming , PPoPP '06, pp.178-186, 2006. ,
DOI : 10.1145/1122971.1122999
VAXcluster : a closely-coupled distributed system, ACM Trans. Comput. Syst, vol.4, issue.2, pp.130-146, 1986. ,
A fast storage allocator, Commun. ACM, vol.8, issue.10, pp.623-624, 1965. ,
Virtual Memory Support for Multiple Pages, 1993. ,
SysBench : a system performance benchmark ,
The Atlas supervisor, Proceedings of the eastern joint computer conference : computers - key to total systems control, pp.61-279, 1961. ,
Multilayer cache partitioning for multiprogram workloads, Euro-Par '11, 2011. ,
The effect of page allocation on caches, Proceedings of the 25th annual international symposium on Microarchitecture, pp.222-225, 1992. ,
Flikker, ACM SIGARCH Computer Architecture News, vol.39, issue.1, pp.213-224, 2011. ,
DOI : 10.1145/1961295.1950391
The cache performance and optimizations of blocked algorithms, ACM SIGPLAN Notices, vol.26, issue.4, pp.63-74, 1991. ,
DOI : 10.1145/106973.106981
Recursive functions symbolic expressions and their computation by machine, Part I, Communications of the ACM, vol.3, issue.4, pp.184-195, 1960. ,
DOI : 10.1145/367177.367199
On dynamic program relocation ,
Solaris Internals, 2006. ,
Cramming More Components onto Integrated Circuits, Electronics, vol.38, issue.8, pp.114-117, 1965. ,
Progress in digital integrated electronics, Electron Devices Meeting, pp.11-13, 1975. ,
MPI : A Message-Passing Interface, 1994. ,
Practical, transparent operating system support for superpages, Proceedings of the 5th symposium on Operating systems design and implementation , OSDI '02, pp.89-104, 2002. ,
The ghost in the machine, Proceedings of the 2007 ACM/IEEE conference on Supercomputing , SC '07, pp.1-29, 2007. ,
DOI : 10.1145/1362622.1362662
Valgrind, ACM SIGPLAN Notices, vol.42, issue.6, pp.89-100, 2007. ,
DOI : 10.1145/1273442.1250746
MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.94-103, 2009. ,
DOI : 10.1007/978-3-642-03770-2_16
MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Proceedings of the 14th international Euro-Par conference on Parallel Processing , Euro-Par '08, pp.78-88, 2008. ,
DOI : 10.1007/978-3-540-85451-7_9
Buddy systems, Commun. ACM, vol.20, issue.6, pp.421-431, 1977. ,
Controlling cache utilization of HPC applications, Proceedings of the international conference on Supercomputing, ICS '11, pp.295-304, 2011. ,
DOI : 10.1145/1995896.1995942
Linux scalability for large NUMA systems, 2003. ,
Instruction-Level Parallel Processing : History, Overview and Perspective, 1992. ,
A very fast algorithm for RAM compression, ACM SIGOPS Operating Systems Review, vol.31, issue.2, pp.36-45, 1997. ,
DOI : 10.1145/250007.250012
Dynamic Page Mapping Policies for Cache Conflict Resolution on Standard Hardware, 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp.255-266, 1994. ,
Optimizing matrix transpose in CUDA, 2009. ,
Computing Services for LHC : From Clusters to Grids, From the Web to the Grid and Beyond, The Frontiers Collection, pp.69-89, 2012. ,
Windows Internals : Including Windows Server 2008 and Windows Vista, Fifth Edition, 2009. ,
The Cray-1 Computer System, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978. ,
Reducing cache misses using hardware and software page placement, Proceedings of the 13th international conference on Supercomputing , ICS '99, pp.155-164, 1999. ,
DOI : 10.1145/305138.305189
TCMalloc : Thread-Caching Malloc ,
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, pp.66-73, 2010. ,
DOI : 10.1109/MCSE.2010.69
Making a case for a Green500 list, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006. ,
DOI : 10.1109/IPDPS.2006.1639600
TLB Update-Hint : A Scalable TLB Consistency Algorithm for Cache-Coherent Non-uniform Memory Access Multiprocessors, IEICE Transactions, issue.7, pp.87-1682, 2004. ,
Optimisation de l'utilisation des caches L2 ,
Implementing distributed shared memory on top of MPI: the DSMPI library, Proceedings of 4th Euromicro Workshop on Parallel and Distributed Processing, p.50, 1996. ,
DOI : 10.1109/EMPDP.1996.500568
PVM : a framework for parallel distributed computing, Concurrency : Pract. Exper, vol.2, issue.4, pp.315-339, 1990. ,
Segregating heap objects by reference behavior and lifetime, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, pp.12-23, 1998. ,
Structured Computer Organization, 2005. ,
The Optimist, the Pessimist, and the Global Race to Exascale in 20 Megawatts, Computer, vol.45, issue.1, pp.95-97, 2012. ,
DOI : 10.1109/MC.2012.34
Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.366-377, 2012. ,
DOI : 10.1109/IPDPS.2012.42
Surpassing the TLB performance of superpages with less operating system support, ACM SIGOPS Operating Systems Review, vol.28, issue.5, pp.171-182, 1994. ,
DOI : 10.1145/381792.195531
The CDC 6600 Project, IEEE Annals of the History of Computing, vol.2, issue.4, pp.338-348, 1980. ,
DOI : 10.1109/MAHC.1980.10044
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010. ,
DOI : 10.1109/ICPPW.2010.38
Operating system support for improving data locality on CC-NUMA compute servers, pp.279-289, 1996. ,
Top500. Top 500 Supercomputer Sites ,
DOI : 10.1145/248209.237205
Introducing Kernel-Level Page Reuse for High Performance Computing, MSPC '13, 2013. ,
Memory resource management in VMware ESX server, Proceedings of the 5th symposium on Operating systems design and implementation, OSDI '02, pp.181-194, 2002. ,
Transparent Large-Page Support for Itanium Linux, 2008. ,
Dynamic Storage Allocation : A Survey and Critical Review, Proceedings of the International Workshop on Memory Management , IWMM '95, pp.1-116, 1995. ,
Hitting the memory wall : implications of the obvious, SIGARCH Comput. Archit. News, vol.23, issue.1, pp.20-24, 1995. ,
Analyse mathématique et numérique du système de la magnétohydrodynamique résistive avec termes de champ magnétique auto-généré ,
Why nothing matters : the impact of zeroing, Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, pp.307-324, 2011. ,
Haris Lekatsas, and Srimat Chakradhar. High-performance operating system controlled online memory compression ,
Performance and Scalability Evaluation of "Big Memory" on Blue Gene Linux, The International Journal of High Performance Computing Applications, vol.52, issue.1, pp.148-160, 2011. ,
DOI : 10.1145/1693453.1693477
Evaluating the Effect of Huge Page on Large Scale Applications, 2009 IEEE International Conference on Networking, Architecture, and Storage, pp.74-81, 2009. ,
DOI : 10.1109/NAS.2009.18