Simple, portable, scalable SMP programming, 2006.
Adaptive approaches for efficient parallel algorithms on cluster-based systems, International Journal of Grid and Utility Computing, vol.1, issue.2, pp.99-108, 2009.
DOI : 10.1504/IJGUC.2009.022026
URL : https://hal.archives-ouvertes.fr/hal-00953258
Invited Paper: A Compile-time Cost Model for OpenMP, 2007 IEEE International Parallel and Distributed Processing Symposium, p.208, 2007.
DOI : 10.1109/IPDPS.2007.370398
The communication challenge for MPP: Intel Paragon and Meiko CS-2, Parallel Computing, vol.20, issue.3, pp.389-398, 1994.
DOI : 10.1016/S0167-8191(06)80021-9
Network performance-aware collective communication for clustered wide-area systems, Parallel Computing, vol.27, issue.11, pp.1431-1456, 2001.
DOI : 10.1016/S0167-8191(01)00098-9
Combining loop transformations considering caches and scheduling, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29, pp.274-286, 1996.
DOI : 10.1109/MICRO.1996.566468
A comparison of empirical and model-driven optimization, PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, pp.63-76, 2003.
Using OpenMP: Portable Shared Memory Parallel Programming, 2008.
Automated empirical optimization of software and the ATLAS project, pp.3-35. Also available as University of Tennessee LAPACK Working Note #147, UT-CS-00-448, 2000.
Optimizing matrix multiply using PHiPAC: a Portable, High-Performance, ANSI C coding methodology, Proceedings of International Conference on Supercomputing, 1997.
Performance Modeling for Dynamic Algorithm Selection, International Conference on Computational Science, pp.749-758, 2003.
DOI : 10.1007/3-540-44864-0_77
The Design and Implementation of FFTW3, Proceedings of the IEEE, vol.93, issue.2, pp.216-231, 2005.
DOI : 10.1109/JPROC.2004.840301
Adaptive computing on the grid using AppLeS, IEEE Transactions on Parallel and Distributed Systems, vol.14, issue.4, 2003.
DOI : 10.1109/TPDS.2003.1195409
Adaptive selection of communication methods to optimize collective MPI operations, Proceedings of the 12th Workshop on Compilers for Parallel Computers (CPC'06), 2006.
Adaptive communication algorithms for distributed heterogeneous systems, Proceedings of the IEEE International Symposium on High Performance Distributed Computing, 1998.
Adaptive matrix multiplication in heterogeneous environments, Ninth International Conference on Parallel and Distributed Systems, 2002. Proceedings., p.129, 2002.
DOI : 10.1109/ICPADS.2002.1183389
An adaptive algorithm selection framework for reduction parallelization, IEEE Transactions on Parallel and Distributed Systems, 2006.
A framework for adaptive algorithm selection in STAPL, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '05, pp.277-288, 2005.
DOI : 10.1145/1065944.1065981
Automatic optimisation of parallel linear algebra routines in systems with variable load, Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003. Proceedings., pp.409-416, 2003.
DOI : 10.1109/EMPDP.2003.1183618
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture, 11th International Symposium on High-Performance Computer Architecture, pp.340-351, 2005.
DOI : 10.1109/HPCA.2005.27
Analysis of benchmark characteristics and benchmark performance prediction, ACM Transactions on Computer Systems, vol.14, issue.4, pp.344-384, 1996.
DOI : 10.1145/235543.235545
Modeling application performance by convolving machine signatures with application profiles, Proceedings of the Fourth Annual IEEE International Workshop on Workload Characterization. WWC-4 (Cat. No.01EX538), pp.149-156, 2001.
DOI : 10.1109/WWC.2001.990754
Performance prediction for random write reductions: a case study in modeling shared memory programs, SIGMETRICS '02: Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pp.117-128, 2002.
Parallel program performance prediction using deterministic task graph analysis, ACM Transactions on Computer Systems, vol.22, issue.1, pp.94-136, 2004.
DOI : 10.1145/966785.966788