Adaptive load-balancing for divide-and-conquer grid applications, J. of Supercomputing, 2004. ,
A single-chip, 1.6-billion, 16-b MAC/s multiprocessor DSP, IEEE Journal of Solid-State Circuits, vol.35, issue.3, pp.412-424, 2000. ,
An evaluation of directory schemes for cache coherence, ACM SIGARCH Computer Architecture News, vol.16, issue.2, pp.280-298, 1988. ,
DOI : 10.1145/633625.52432
Task-based execution of nested OpenMP loops, OpenMP in a Heterogeneous World, pp.210-222, 2012. ,
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite, Proceedings of the 42nd annual Southeast regional conference, ACM-SE 42, pp.267-272, 2004. ,
DOI : 10.1145/986537.986601
The Fortress language specification, Sun Microsystems, vol.139, p.140, 2005. ,
URL : https://hal.archives-ouvertes.fr/jpa-00217210
PGAS (partitioned global address space) languages, Encyclopedia of Parallel Computing, pp.1539-1545, 2011. ,
The Tera computer system, ACM SIGARCH Computer Architecture News, vol.18, issue.3, pp.1-6, 1990. ,
DOI : 10.1145/255129.255132
Cache coherence protocols: evaluation using a multiprocessor simulation model, ACM Transactions on Computer Systems, vol.4, issue.4, pp.273-298, 1986. ,
DOI : 10.1145/6513.6514
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.8940
Thread scheduling for multiprogrammed multiprocessors, Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '98, pp.119-129, 1998. ,
On the inclusion properties for multi-level cache hierarchies, Proceedings of the 15th Annual International Symposium on Computer Architecture, ISCA '88, pp.73-80, 1988. ,
Finite element formulations for large deformation dynamic analysis, International Journal for Numerical Methods in Engineering, vol.9, issue.2, pp.353-386, 1975. ,
DOI : 10.1002/nme.1620090207
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.208.5272
A study of replacement algorithms for a virtual-storage computer, IBM Systems Journal, vol.5, issue.2, pp.78-101, 1966. ,
DOI : 10.1147/sj.52.0078
Online Scheduling of Parallel Programs on Heterogeneous Systems with Applications to Cilk, Theory of Computing Systems, vol.35, issue.3, pp.289-304, 2002. ,
DOI : 10.1007/s00224-002-1055-5
A case study comparing AoS (arrays of structures) and SoA (structures of arrays) data layouts for a compute-intensive loop run on Intel Xeon processors and Intel Xeon Phi product family coprocessors, 2013. ,
Cilk : An efficient multithreaded runtime system, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '95, pp.207-216, 1995. ,
DOI : 10.1006/jpdc.1996.0107
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175
Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999. ,
On the importance of parallel application placement in NUMA multiprocessors ,
Algorithm 781: generating Hilbert's space-filling curve by recursion, ACM Transactions on Mathematical Software, vol.24, issue.2, pp.184-189, 1998. ,
DOI : 10.1145/290200.290219
Structuring the execution of OpenMP applications for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010. ,
DOI : 10.1109/IPDPS.2010.5470442
URL : https://hal.archives-ouvertes.fr/inria-00441472
Executing functional programs on a virtual tree of processors, Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture, FPCA '81, pp.187-194, 1981. ,
Convergence with Hilbert's space filling curve, Journal of Computer and System Sciences, vol.3, issue.2, pp.128-146, 1969. ,
NUMA-ICTM: A parallel version of ICTM exploiting memory placement strategies for NUMA machines, IEEE International Symposium on Parallel & Distributed Processing, pp.1-8, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00788917
Méthode des éléments finis : Approche pratique en mécanique des structures. Dunod, 2010. ,
LimitLESS directories, ACM SIGPLAN Notices, vol.26, issue.4, pp.224-234, 1991. ,
DOI : 10.1145/106973.106995
Parallel Programmability and the Chapel Language, International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007. ,
DOI : 10.1177/1094342007078442
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.7600
Modélisation du contact : nouvelles approches numériques, Thèse de doctorat, 2002. ,
Parallel Programming in OpenMP, 2001. ,
X10, ACM SIGPLAN Notices, vol.40, issue.10, pp.519-538, 2005. ,
DOI : 10.1145/1103845.1094852
URL : https://hal.archives-ouvertes.fr/in2p3-00166974
Cache-conscious structure definition, Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation, pp.13-25, 1999. ,
Parametric analysis of polyhedral iteration spaces, Journal of VLSI signal processing systems for signal, image and video technology, pp.179-194, 1998. ,
UPC language specifications v1.2, 2005. ,
Characterizing and improving the performance of Intel Threading Building Blocks, 2008 IEEE International Symposium on Workload Characterization, pp.57-66, 2008. ,
DOI : 10.1109/IISWC.2008.4636091
Modélisation des éléments finis : Cours et exercices corrigés, 2008. ,
Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference, pp.157-172, 1969. ,
DOI : 10.1145/800195.805928
Several Strategies for Reducing the Bandwidth of Matrices, Sparse Matrices and their Applications, The IBM Research Symposia Series, pp.157-166, 1972. ,
DOI : 10.1007/978-1-4615-8675-3_14
OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998. ,
DOI : 10.1109/99.660313
Partitioned Global Address Space Languages, ACM Computing Surveys, vol.47, issue.4, p.29, 2016. ,
DOI : 10.1145/2716320
URL : https://hal.archives-ouvertes.fr/hal-01109405
A microbenchmark study of OpenMP overheads under nested parallelism, OpenMP in a New Era of Parallelism, pp.1-12, 2008. ,
Scalable work stealing, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.1-53, 2009. ,
DOI : 10.1145/1654059.1654113
A study on residual stresses in rolling, International Journal of Machine Tools and Manufacture, vol.37, issue.6, pp.837-853, 1997. ,
DOI : 10.1016/S0890-6955(96)00052-1
Caches sémantiques coopératifs pour la gestion de données sur grilles, 2007. ,
Effects of cache coherency in multiprocessors, IEEE Transactions on Computers, vol.31, issue.11, pp.1083-1099, 1982. ,
Evaluating the performance of four snooping cache coherency protocols, ACM SIGARCH Computer Architecture News, vol.17, issue.3, pp.2-15, 1989. ,
DOI : 10.1145/74926.74927
Exact and efficient verification of parameterized cache coherence protocols, Correct Hardware Design and Verification Methods, pp.247-262, 2003. ,
DOI : 10.1007/978-3-540-39724-3_22
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.4629
Rapid parameterized model checking of snoopy cache coherence protocols, Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, vol.2619, pp.144-159, 2003. ,
The X-Kaapi's application programming interface. Part I: Data flow programming, 2011. ,
Reduction methods for fast transient structural dynamics applicated to the analysis of complex structures under impact ,
URL : https://hal.archives-ouvertes.fr/tel-01018792
Advanced parallel strategy for strongly coupled fast transient fluid-structure dynamics with dual management of kinematic constraints, Advances in Engineering Software, pp.70-89, 2014. ,
DOI : 10.1016/j.advengsoft.2013.08.002
Numerical methods and parallel algorithms for fast transient strongly coupled fluid-structure dynamics. Habilitation à diriger des recherches, 2014. ,
URL : https://hal.archives-ouvertes.fr/tel-01011205
Structure-Based Drug Discovery Accelerated by Many-Core Devices, Current Drug Targets, vol.17, issue.14, 2016. ,
DOI : 10.2174/1389450117666160112112854
URL : http://doi.org/10.2174/1389450117666160112112854
Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972. ,
DOI : 10.1109/TC.1972.5009071
Cache-oblivious algorithms, 40th Annual Symposium on Foundations of Computer Science, pp.285-297, 1999. ,
The implementation of the Cilk-5 multithreaded language, ACM SIGPLAN Notices, vol.33, issue.5, pp.212-223, 1998. ,
DOI : 10.1145/277652.277725
Les éléments finis : de la théorie à la pratique, 2011. ,
X-kaapi: A Multi Paradigm Runtime for Multicore Architectures, 2013 42nd International Conference on Parallel Processing, pp.728-735, 2013. ,
DOI : 10.1109/ICPP.2013.86
URL : https://hal.archives-ouvertes.fr/hal-00727827
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007. ,
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
Mécanique, Tome I. École polytechnique, 1986. ,
Designing OP2 for GPU architectures, Journal of Parallel and Distributed Computing, vol.73, issue.11, pp.1451-1460, 2013. ,
DOI : 10.1016/j.jpdc.2012.07.008
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.5159
Development of a convex polyhedral discrete element simulation framework for NVIDIA Kepler based GPUs, Fourth International Conference on Finite Element Methods in Engineering and Sciences, pp.386-400, 2013. ,
DOI : 10.1016/j.cam.2013.12.032
Bounds on Multiprocessing Timing Anomalies, SIAM Journal on Applied Mathematics, vol.17, issue.2, pp.416-429, 1969. ,
DOI : 10.1137/0117039
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.8131
SLAW: A scalable locality-aware adaptive work-stealing scheduler for multi-core systems, Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '10, pp.341-342, 2010. ,
Non-blocking steal-half work queues, Proceedings of the twenty-first annual symposium on Principles of distributed computing, PODC '02, pp.280-289, 2002. ,
DOI : 10.1145/571825.571876
The art of multiprocessor programming, Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, PODC '06, 2008. ,
DOI : 10.1145/1146381.1146382
A finite element formulation for problems of large strain and large displacement, International Journal of Solids and Structures, vol.6, issue.8, pp.1069-1086, 1970. ,
DOI : 10.1016/0020-7683(70)90048-X
Titanium language reference manual, version 2.19, 2005. ,
Evaluating associativity in CPU caches, IEEE Transactions on Computers, vol.38, issue.12, pp.1612-1630, 1989. ,
DOI : 10.1109/12.40842
JIAJIA: A software DSM system based on a new cache coherence protocol, High-Performance Computing and Networking, pp.461-472, 1999. ,
DOI : 10.1007/BFb0100607
Mécanique des fluides Tome 1, 1998. ,
Tuning the victim selection policy of Intel TBB, Journal of Systems Architecture, 2015. ,
Hierarchical work stealing on manycore clusters, Fifth Conference on Partitioned Global Address Space Programming Models, 2011. ,
Optimizing spatial locality in loop nests using linear algebra, Proc. 7th Workshop Compilers for Parallel Computers, p.430, 1998. ,
DOI : 10.1145/277830.277849
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.1030
Auto-tuning support for manycore applications-perspectives for operating systems and compilers ,
DOI : 10.1145/1531793.1531808
The rise and fall of High Performance Fortran, Proceedings of the third ACM SIGPLAN conference on History of programming languages, HOPL III, pp.7-8, 2007. ,
DOI : 10.1145/1238844.1238851
A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation, 2002. ,
DOI : 10.1007/3-540-36383-1_4
Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems, 2010 39th International Conference on Parallel Processing Workshops, pp.413-420, 2010. ,
DOI : 10.1109/ICPPW.2010.62
Méthode Lagrangienne actualisée pour des problèmes hyperélastiques en très grandes déformations, 2014. ,
Algorithmique Parallèle : Cours Et Exercices Corrigés. Dunod, 2003. ,
The directory-based cache coherence protocol for the DASH multiprocessor, Proceedings of the 17th Annual International Symposium on Computer Architecture, ISCA '90, pp.148-159, 1990. ,
A singular loop transformation framework based on non-singular matrices, 1993. ,
ZPL: An array sublanguage, Languages and Compilers for Parallel Computing, pp.96-114, 1994. ,
DOI : 10.1007/3-540-57659-2_6
Comparative Analysis of the Cuthill-McKee and the Reverse Cuthill-McKee Ordering Algorithms for Sparse Matrices, SIAM Journal on Numerical Analysis, vol.13, issue.2, pp.198-213, 1976. ,
DOI : 10.1137/0713020
High Performance Fortran, IEEE Parallel & Distributed Technology: Systems & Applications, vol.1, issue.1, pp.25-42, 1993. ,
Méthode générale de couplage de schéma d'intégration multiéchelles en temps en dynamique des structures, 2010. ,
Structured Parallel Programming : Patterns for Efficient Computation, 2012. ,
Analysis of the clustering properties of the Hilbert space-filling curve, IEEE Transactions on Knowledge and Data Engineering, vol.13, issue.1, pp.124-141, 2001. ,
On certain crinkly curves, Transactions of the American Mathematical Society, vol.1, issue.1, pp.72-90, 1900. ,
Cramming More Components Onto Integrated Circuits, Proceedings of the IEEE, pp.82-85, 1998. ,
DOI : 10.1109/JPROC.1998.658762
Impact of NUMA Effects on High-Speed Networking with Multi-Opteron Machines, PDCS, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00175747
Introduction à la mécanique des milieux continus, 1980. ,
Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture, Computer Physics Communications, vol.201, 2016. ,
DOI : 10.1016/j.cpc.2015.12.025
A method of computation for structural dynamics, Journal of the Engineering Mechanics Division, vol.85, issue.3, pp.67-94, 1959. ,
Space-filling curves and their application in image processing. Theses, 2013. ,
OpenMP and automatic parallelization in GCC, Proceedings of the GCC Developers' Summit, 2006. ,
Co-array Fortran for parallel programming, ACM SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998. ,
A Model-Based Approach for the Development of High-Performance Scientific Computing Software. Theses, 2012. ,
URL : https://hal.archives-ouvertes.fr/tel-00865535
Computer Architecture : A Quantitative Approach, 1990. ,
Development of an Arbitrary Lagrangian Eulerian (ALE) formulation for the 3D simulation of flat rolling, 2009. ,
URL : https://hal.archives-ouvertes.fr/tel-00431051
Improving parallel system performance with a NUMA-aware load balancer, 2011. ,
Dynamic Load-Balancing on Hierarchical Platforms. Theses, 2011. ,
URL : https://hal.archives-ouvertes.fr/tel-00661447
History based work-stealing for dynamic numerical simulations, 2011. ,
Cilk : Efficient Multithreaded Computing, 1998. ,
Intel Threading Building Blocks, 2007. ,
Memory Affinity for Hierarchical Shared Memory Multiprocessors, 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp.59-66, 2009. ,
DOI : 10.1109/SBAC-PAD.2009.16
URL : https://hal.archives-ouvertes.fr/hal-00788914
Optimization via Reflection on Work ,
DOI : 10.1109/ipdps.2008.4536188
Space-filling curves, 2012. ,
DOI : 10.1007/978-1-4612-0871-6
On the Peano curve of Lebesgue, Bulletin of the American Mathematical Society, vol.44, issue.8, p.519, 1938. ,
DOI : 10.1090/S0002-9904-1938-06792-4
A study of instruction cache organizations and replacement policies, ACM SIGARCH Computer Architecture News, vol.11, issue.3, pp.132-137, 1983. ,
Organisation et architecture de l'ordinateur. Imp. la source d'or, 2003. ,
An Introduction to theoretical fluid mechanics, 2000. ,
An Analysis of the Finite-Element Method, Journal of Applied Mechanics, vol.41, issue.1, 1973. ,
DOI : 10.1115/1.3423272
Supporting cache coherence in heterogeneous multiprocessor systems, Proceedings Design, Automation and Test in Europe Conference and Exhibition, 2003. ,
DOI : 10.1109/DATE.2004.1269047
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.1306
PVM: A framework for parallel distributed computing, Concurrency: Practice and Experience, vol.2, issue.4, pp.315-339, 1990. ,
DOI : 10.1002/cpe.4330020404
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2880
Performance Evaluation of OpenMP Applications with Nested Parallelism, Languages, Compilers, and Run-Time Systems for Scalable Computers, Lecture Notes in Computer Science, vol.1915, pp.100-112, 2000. ,
DOI : 10.1007/3-540-40889-4_8
Algorithmes parallèles efficaces en cache : Application à la visualisation scientifique, 2010. ,
X-kaapi : a multi paradigm runtime for multicore architectures ,
A compiler for exploiting nested parallelism in OpenMP programs, Parallel Computing, vol.31, issue.10-12, p.960, 2005. ,
DOI : 10.1016/j.parco.2005.03.007
Steal Locally, Share Globally, International Journal of Parallel Programming, vol.18, issue.4, pp.894-917, 2015. ,
DOI : 10.1007/s10766-015-0350-0
Self-adaptive parallel algorithms and applications. Theses, Institut National Polytechnique de Grenoble - INPG, 2008. ,
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010. ,
DOI : 10.1109/ICPPW.2010.38
URL : http://arxiv.org/abs/1004.4431
Introduction to the first draft report on the EDVAC, 1945. ,
Fast construction of SAH BVHs on the Intel Many Integrated Core (MIC) architecture, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.1, pp.47-57, 2012. ,
MPI: A standard message passing interface, pp.56-68, 1996. ,
Slave memories and dynamic storage allocation, IEEE Transactions on Electronic Computers, vol.14, issue.2, pp.270-271, 1965. ,
DOI : 10.1109/pgec.1965.264263
A data locality optimizing algorithm, pp.30-44, 1991. ,
High Performance Compilers for Parallel Computing, 1995. ,
Communication complexity for parallel divide-and-conquer, Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pp.151-162, 1991. ,
DOI : 10.1109/sfcs.1991.185364
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.6209