M. C. August, G. M. Brost, C. C. Hsiung, and A. J. Schiffleger, Cray X-MP : The Birth of a Supercomputer, Computer, vol.22, issue.1, pp.45-52, 1989.

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo, and E. S. Quintana-Ortí, An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pp.851-862, 2009.

Y. Afek, D. Dice, and A. Morrison, Cache index-aware memory allocation, ACM SIGPLAN Notices, vol.46, issue.11, pp.55-64, 2011.
DOI : 10.1145/2076022.1993486

A. Arcangeli, I. Eidus, and C. Wright, Increasing memory density by using KSM, OLS '09 : Proceedings of the Linux Symposium, pp.19-28, 2009.

A. Agarwal, J. Hennessy, and M. Horowitz, An analytical cache model, ACM Transactions on Computer Systems, vol.7, issue.2, pp.184-215, 1989.
DOI : 10.1145/63404.63407

G. M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967.

C. Augonnet and R. Namyst, A Unified Runtime System for Heterogeneous Multi-core Architectures, Euro-Par 2008 Workshops - Parallel Processing, pp.174-183

M. Awasthi, K. Sudan, R. Balasubramonian, and J. Carter, Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches, HPCA, 2009.

J. Bonwick and J. Adams, Magazines and Vmem : Extending the Slab Allocator to Many CPUs and Arbitrary Resources, Proceedings of the General Track : 2001 USENIX Annual Technical Conference, pp.15-33, 2001.

E. Bugnion, J. M. Anderson, T. C. Mowry, M. Rosenblum, and M. S. Lam, Compiler-directed page coloring for multiprocessors, ACM SIGOPS Operating Systems Review, vol.30, issue.5, pp.244-255, 1996.
DOI : 10.1145/248208.237195

M. Baker and R. Buyya, Cluster computing: the commodity supercomputer, Software: Practice and Experience, vol.29, issue.6, pp.551-576, 1999.
DOI : 10.1002/(SICI)1097-024X(199905)29:6<551::AID-SPE248>3.0.CO;2-C

D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter et al., The NAS Parallel Benchmarks, 1991.

A. Bensoussan, C. T. Clingen, and R. C. Daley, The Multics virtual memory, Proceedings of the second symposium on Operating systems principles, SOSP '69, pp.30-42, 1969.
DOI : 10.1145/961053.961069

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

D. Barthou, A. C. Rubial, W. Jalby, S. Koliai, and C. Valensi, Performance Tuning of x86 OpenMP Codes with MAQAO, Tools for High Performance Computing, pp.95-113, 2009.

K. J. Barker, K. Davis, A. Hoisie, D. J. Kerbyson, M. Lang et al., Entering the petaflop era: The architecture and performance of Roadrunner, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.1-1, 2008.
DOI : 10.1109/SC.2008.5217926

K. Beck, Test Driven Development : By Example, 2003.

E. D. Berger, Memory management for high-performance applications, 2002.

A. Barak, S. Guday, and R. G. Wheeler, The MOSIX Distributed Operating System : Load Balancing for UNIX, 1993.
DOI : 10.1007/3-540-56663-5

S. S. Bhat, mm : Memory Power Management, 2013.

S. Bahadur, V. Kalyanakrishnan, and J. Westall, An empirical study of the effects of careful page placement in Linux, Proceedings of the 36th annual Southeast regional conference, ACM-SE 36, 1998.
DOI : 10.1145/275295.275365

B. N. Bershad, D. Lee, T. H. Romer, and J. B. Chen, Avoiding conflict misses dynamically in large direct-mapped caches, pp.158-170, 1994.

E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson, Hoard : a scalable memory allocator for multithreaded applications, p.117, 2000.

D. Buntinas, G. Mercier, and W. Gropp, Design and Evaluation of Nemesis : a Scalable, Low-Latency, Message-Passing Communication Subsystem, Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, 2006.

D. Buntinas, G. Mercier, and W. Gropp, Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem, Parallel Computing, vol.33, issue.9, pp.634-644, 2007.
DOI : 10.1016/j.parco.2007.06.003

URL : https://hal.archives-ouvertes.fr/hal-00344327

K. Barabash, Y. Ossia, and E. Petrank, Mostly concurrent garbage collection revisited, ACM SIGPLAN Notices, vol.38, issue.11, pp.255-268, 2003.
DOI : 10.1145/949343.949328

D. P. Bovet, M. [...]

R. Banakar, S. Steinke, B. Lee, M. Balakrishnan et al., Scratchpad memory : design alternative for cache on-chip memory in embedded systems, Proceedings of the tenth international symposium on Hardware/software codesign, CODES '02, ACM, pp.73-78, 2002.

D. A. Barrett and B. G. Zorn, Using lifetime predictors to improve memory allocation performance, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, PLDI '93, 1993.

I. Buck, GPU Computing : Programming a Massively Parallel Processor, Proceedings of the International Symposium on Code Generation and Optimization, CGO '07, 2007.

F. Cappello, E. Caron, M. Daydé, F. Desprez, Y. Jégou et al., Grid'5000 : A Large Scale and Highly Reconfigurable Grid Experimental Testbed, Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing, GRID '05, 2005.

M. Cekleov and M. Dubois, Virtual-Address Caches Part 1 : Problems and Solutions in Uniprocessors, 1997.

T. M. Chilimbi, M. D. Hill, and J. R. Larus, Cache-conscious structure layout, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, PLDI '99, 1999.

A. T. Clements, M. F. Kaashoek, and N. Zeldovich, Scalable address spaces using RCU balanced trees, Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pp.199-210, 2012.

C. Lemuet, W. Jalby, and S. Touati, Improving Load/Store Queues Usage in Scientific Computing, Proceedings ICPP 2004, 2004.

P. Carribault, M. Pérache, and H. Jourdren, Enabling low-overhead hybrid MPI/OpenMP parallelism with MPC, Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP : accelerators, Tasking and more, 2010.

M. Chapman, I. Wienand, and G. Heiser, Itanium Page Tables and TLB, University of New South Wales, 2003.

R. H. Dennard, J. Cai, and A. Kumar, A perspective on today's scaling challenges and possible future directions, Solid-State Electronics, vol.51, issue.4, pp.518-525, 2007.
DOI : 10.1016/j.sse.2007.02.004

R. C. Daley and J. B. Dennis, Virtual memory, processes, and sharing in MULTICS, Commun. ACM, vol.11, pp.306-312, 1968.

High-order dimensionally split Lagrange-remap schemes for compressible hydrodynamics, Comptes Rendus Mathematique, vol.348, p.105

P. J. Denning, Virtual Memory, ACM Comput. Surv., vol.2, issue.3, pp.153-189, 1970.

D. Dice and A. Garthwaite, Mostly lock-free malloc, Proceedings of the 3rd international symposium on Memory management, ISMM '02, pp.163-174, 2002.

R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous et al., Design of ion-implanted MOSFETs with very small physical dimensions, IEEE Journal of Solid-State Circuits, p.98, 1974.

E. R. Dougherty and P. A. Laplante, Introduction to Real-Time Imaging, 1995.

J. J. Dongarra, The LINPACK Benchmark: An explanation, Proceedings of the 1st International Conference on Supercomputing, pp.456-474, 1988.
DOI : 10.1007/3-540-18991-2_27

D. Detlefs and T. Printezis, A Generational Mostly-concurrent Garbage Collector, 2000.

U. Drepper, What Every Programmer Should Know About Memory, 2007.

R. Döbbelin, T. Schütt, and A. Reinefeld, An Analysis of SMP Memory Allocators: MapReduce on Large Shared-Memory Systems, 2012 41st International Conference on Parallel Processing Workshops, pp.48-54, 2012.
DOI : 10.1109/ICPPW.2012.10

H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, Power Limitations and Dark Silicon Challenge the Future of Multicore, ACM Transactions on Computer Systems, vol.30, issue.3, pp.11:1-11:27, 2012.
DOI : 10.1145/2324876.2324879

R. Esser and R. Knecht, Intel Paragon XP/S - Architecture and Software Environment, Anwendungen, Architekturen, Trends, Seminar, Supercomputer '93, pp.121-141, 1993.

M. Eberl, W. Karl, C. Trinitis, and A. Blaszczyk, Parallel Computing on PC Clusters - An Alternative to Supercomputers for Industrial Applications, Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.493-498, 1999.
DOI : 10.1007/3-540-48158-3_61

J. Evans, A Scalable Concurrent malloc(3) Implementation for FreeBSD, 2006.

M. J. Flynn, Very high-speed computing systems, Proceedings of the IEEE, vol.54, issue.12, pp.1901-1909, 1966.

M. Feeley and J. S. Miller, A parallel virtual machine for efficient scheme compilation, Proceedings of the 1990 ACM conference on LISP and functional programming, LFP '90, pp.119-130, 1990.
DOI : 10.1145/91556.91606

H. Fujii, Y. Yasuda, H. Akashi, Y. Inagami, M. Koga et al., Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System, Proceedings of the 11th International Symposium on Parallel Processing, IPPS '97, pp.233-241, 1997.

T. Gautier, J. V. F. Lima, N. Maillard, and B. Raffin, XKaapi : A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2013.

E. L. Glaser, J. F. Couleur, and G. A. Oliver, System design of a computer for time sharing applications, Proceedings of the November 30 - December 1, 1965, fall joint computer conference, part I, AFIPS '65 (Fall, part I), ACM, 1965.

M. Gorman and P. Healy, Performance Characteristics of Explicit Superpage Support, Proceedings of the 2010 international conference on Computer Architecture, ISCA'10, pp.293-310, 2012.
DOI : 10.1007/978-3-642-24322-6_24

URL : https://hal.archives-ouvertes.fr/inria-00493770

M. Gorman, Understanding the Linux Virtual Memory Manager, 2004.

I. Habib, Virtualization with KVM, Linux J., issue.166, 2008.

[...], Prentice-Hall, Inc., 1973.

J. Herter, P. Backes, F. Haupenthal, and J. Reineke, CAMA : A Predictable Cache-Aware Memory Allocator, Proceedings of the 2011 23rd Euromicro Conference on Real-Time Systems, pp.23-32, 2011.

J. L. Henning, SPEC CPU2006 benchmark descriptions, Proceedings of WOSP/SIPEW 2010, pp.1-17, 2006.
DOI : 10.1145/1186736.1186737

K. Hazelwood, G. Lueck, and R. Cohn, Scalable support for multithreaded applications on dynamic binary instrumentation systems, Proceedings of the 2009 international symposium on Memory management, ISMM '09, pp.20-29, 2009.
DOI : 10.1145/1542431.1542435

J. L. Hennessy and D. A. Patterson, Computer Architecture, Fourth Edition : A Quantitative Approach

Intel Corporation, Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A : System Programming Guide, 2010.

Intel Corporation, Intel® 64 and IA-32 Architectures Software Developer's Manual, 2010.

H. Jin, M. Frumkin, and J. Yan, The OpenMP Implementation of NAS Parallel Benchmarks and its Performance, 1999.

H. Jourdren, HERA: A Hydrodynamic AMR Platform for Multi-Physics Simulations, Adaptive Mesh Refinement - Theory and Applications, pp.283-294
DOI : 10.1007/3-540-27039-6_19

M. S. Johnstone and P. R. Wilson, The memory fragmentation problem : solved ?, Proceedings of the 1st international symposium on Memory management, ISMM '98, pp.26-36, 1998.

P. Kaminski, NUMA aware heap memory manager (AMD)

R. E. Kessler and M. D. Hill, Page placement algorithms for large real-indexed caches, ACM Transactions on Computer Systems, vol.10, issue.4, pp.338-359, 1992.
DOI : 10.1145/138873.138876

H. Kjellberg, Partial Array Self-refresh in Linux, 2010.

S. Kahan and P. Konecny, "MAMA!", Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '06, pp.178-186, 2006.
DOI : 10.1145/1122971.1122999

N. P. Kronenberg, H. M. Levy, and W. D. Strecker, VAXcluster : a closely-coupled distributed system, ACM Trans. Comput. Syst., vol.4, issue.2, pp.130-146, 1986.

K. C. Knowlton, A fast storage allocator, Commun. ACM, vol.8, issue.10, pp.623-624, 1965.

Y. A. Khalidi, M. N. Nelson, M. Talluri, and D. Williams, Virtual Memory Support for Multiple Page Sizes, 1993.

A. Kopytov, SysBench : a system performance benchmark

T. Kilburn, R. B. Payne, and D. J. Howarth, The Atlas supervisor, Proceedings of the eastern joint computer conference : computers - key to total systems control, p.279, 1961.

M. Kandemir, R. Prabhakar, M. Karakoy, and Y. Zhang, Multilayer cache partitioning for multiprogram workloads, Euro-Par '11, 2011.

W. L. Lynch, B. K. Bray, and M. J. Flynn, The effect of page allocation on caches, Proceedings of the 25th annual international symposium on Microarchitecture, pp.222-225, 1992.

S. Liu, K. Pattabiraman, T. Moscibroda, and B. G. Zorn, Flikker, ACM SIGARCH Computer Architecture News, vol.39, issue.1, pp.213-224, 2011.
DOI : 10.1145/1961295.1950391

M. D. Lam, E. E. Rothberg, and M. E. Wolf, The cache performance and optimizations of blocked algorithms, ACM SIGPLAN Notices, vol.26, issue.4, pp.63-74, 1991.
DOI : 10.1145/106973.106981

J. McCarthy, Recursive functions of symbolic expressions and their computation by machine, Part I, Communications of the ACM, vol.3, issue.4, pp.184-195, 1960.
DOI : 10.1145/367177.367199

W. C. McGee, On dynamic program relocation, 1965.

J. Mauro and R. McDougall, Solaris Internals, 2006.

G. E. Moore, Cramming More Components onto Integrated Circuits, Electronics, vol.38, issue.8, pp.114-117, 1965.

G. E. Moore, Progress in digital integrated electronics, Electron Devices Meeting, pp.11-13, 1975.

Message Passing Interface Forum, MPI : A Message-Passing Interface, 1994.

J. Navarro, S. Iyer, P. Druschel, and A. Cox, Practical, transparent operating system support for superpages, Proceedings of the 5th symposium on Operating systems design and implementation, OSDI '02, pp.89-104, 2002.

A. Nataraj, A. Morris, A. D. Malony, M. Sottile, and P. Beckman, The ghost in the machine, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, SC '07, pp.1-29, 2007.
DOI : 10.1145/1362622.1362662

N. Nethercote and J. Seward, Valgrind, ACM SIGPLAN Notices, vol.42, issue.6, pp.89-100, 2007.
DOI : 10.1145/1273442.1250746

M. Pérache, P. Carribault, and H. Jourdren, MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption, Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.94-103, 2009.
DOI : 10.1007/978-3-642-03770-2_16

M. Pérache, H. Jourdren, and R. Namyst, MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Proceedings of the 14th international Euro-Par conference on Parallel Processing, Euro-Par '08, pp.78-88, 2008.
DOI : 10.1007/978-3-540-85451-7_9

J. L. Peterson and T. A. Norman, Buddy systems, Commun. ACM, vol.20, issue.6, pp.421-431, 1977.

S. Perarnau, M. Tchiboukdjian, and G. Huard, Controlling cache utilization of HPC applications, Proceedings of the international conference on Supercomputing, ICS '11, pp.295-304, 2011.
DOI : 10.1145/1995896.1995942

R. Bryant and J. Hawkes, Linux scalability for large NUMA systems, 2003.

B. R. Rau and J. A. Fisher, Instruction-Level Parallel Processing : History, Overview and Perspective, 1992.

L. Rizzo, A very fast algorithm for RAM compression, ACM SIGOPS Operating Systems Review, vol.31, issue.2, pp.36-45, 1997.
DOI : 10.1145/250007.250012

T. Romer, D. Lee, B. N. Bershad, and J. Chen, Dynamic Page Mapping Policies for Cache Conflict Resolution on Standard Hardware, 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp.255-266, 1994.

G. Ruetsch and P. Micikevicius, Optimizing matrix transpose in CUDA, 2009.

L. Robertson, Computing Services for LHC : From Clusters to Grids, From the Web to the Grid and Beyond, The Frontiers Collection, pp.69-89, 2012.

M. Russinovich and D. A. Solomon, Windows Internals : Including Windows Server 2008 and Windows Vista, Fifth Edition, 2009.

R. M. Russell, The Cray-1 Computer System, Communications of the ACM, vol.21, issue.1, pp.63-72, 1978.

T. Sherwood, B. Calder, and J. Emer, Reducing cache misses using hardware and software page placement, Proceedings of the 13th international conference on Supercomputing, ICS '99, pp.155-164, 1999.
DOI : 10.1145/305138.305189

S. Ghemawat, TCMalloc : Thread-Caching Malloc

J. E. Stone, D. Gohara, and G. Shi, OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, vol.12, issue.3, pp.66-73, 2010.
DOI : 10.1109/MCSE.2010.69

S. Sharma, C. Hsu, and W. Feng, Making a case for a Green500 list, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006.
DOI : 10.1109/IPDPS.2006.1639600

B. Seong, D. Kim, Y. Roh, K. H. Park, and D. Park, TLB Update-Hint : A Scalable TLB Consistency Algorithm for Cache-Coherent Non-uniform Memory Access Multiprocessors, IEICE Transactions, vol.87, issue.7, p.1682, 2004.

S. Valat and M. Pérache, Optimisation de l'utilisation des caches L2

L. M. Silva, J. G. Silva, and S. Chapple, Implementing distributed shared memory on top of MPI: the DSMPI library, Proceedings of 4th Euromicro Workshop on Parallel and Distributed Processing, p.50, 1996.
DOI : 10.1109/EMPDP.1996.500568

V. S. Sunderam, PVM : a framework for parallel distributed computing, Concurrency : Pract. Exper., vol.2, issue.4, pp.315-339, 1990.

M. L. Seidl and B. G. Zorn, Segregating heap objects by reference behavior and lifetime, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, pp.12-23, 1998.

A. S. Tanenbaum, Structured Computer Organization, 2005.

M. Tolentino and K. W. Cameron, The Optimist, the Pessimist, and the Global Race to Exascale in 20 Megawatts, Computer, vol.45, issue.1, pp.95-97, 2012.
DOI : 10.1109/MC.2012.34

M. Tchiboukdjian, P. Carribault, and M. Pérache, Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks, 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp.366-377, 2012.
DOI : 10.1109/IPDPS.2012.42

M. Talluri and M. D. Hill, Surpassing the TLB performance of superpages with less operating system support, ACM SIGOPS Operating Systems Review, vol.28, issue.5, pp.171-182, 1994.
DOI : 10.1145/381792.195531

J. E. Thornton, The CDC 6600 Project, IEEE Annals of the History of Computing, vol.2, issue.4, pp.338-348, 1980.
DOI : 10.1109/MAHC.1980.10044

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

Top500, Top 500 Supercomputer Sites

B. Verghese, S. Devine, A. Gupta, and M. Rosenblum, Operating system support for improving data locality on CC-NUMA compute servers, pp.279-289, 1996.
DOI : 10.1145/248209.237205

S. Valat, M. Pérache, and W. Jalby, Introducing Kernel-Level Page Reuse for High Performance Computing, MSPC '13, 2013.

C. A. Waldspurger, Memory resource management in VMware ESX server, Proceedings of the 5th symposium on Operating systems design and implementation, OSDI '02, pp.181-194, 2002.

I. Wienand, Transparent Large-Page Support for Itanium Linux, 2008.

P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, Dynamic Storage Allocation : A Survey and Critical Review, Proceedings of the International Workshop on Memory Management, IWMM '95, pp.1-116, 1995.

W. A. Wulf and S. A. McKee, Hitting the memory wall : implications of the obvious, SIGARCH Comput. Archit. News, vol.23, issue.1, pp.20-24, 1995.

M. Wolff, Analyse mathématique et numérique du système de la magnétohydrodynamique résistive avec termes de champ magnétique auto-généré

X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S. McKinley, Why nothing matters : the impact of zeroing, Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, pp.307-324, 2011.

L. Yang, R. P. Dick, H. Lekatsas, and S. Chakradhar, High-performance operating system controlled online memory compression

K. Yoshii, K. Iskra, H. Naik, P. Beckman, and P. C. Broekema, Performance and Scalability Evaluation of "Big Memory" on Blue Gene Linux, The International Journal of High Performance Computing Applications, vol.52, issue.1, pp.148-160, 2011.
DOI : 10.1145/1693453.1693477

P. Zhang, B. Li, Z. Huo, and D. Meng, Evaluating the Effect of Huge Page on Large Scale Applications, 2009 IEEE International Conference on Networking, Architecture, and Storage, pp.74-81, 2009.
DOI : 10.1109/NAS.2009.18
