Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation, IPDPS, pp.835-846, 2015. ,
DOI : 10.1109/ipdps.2015.94
Cilk: An Efficient Multithreaded Runtime System, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.55-69, 1996. ,
DOI : 10.1006/jpdc.1996.0107
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175
Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999. ,
libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, pp.102-115, 2012. ,
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253
Parametric analysis of polyhedral iteration spaces Journal of VLSI signal processing systems for signal, image and video technology, pp.179-194, 1998. ,
Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference on -, pp.157-172, 1969. ,
DOI : 10.1145/800195.805928
Several Strategies for Reducing the Bandwidth of Matrices, Sparse Matrices and their Applications The IBM Research Symposia Series, pp.157-166, 1972. ,
DOI : 10.1007/978-1-4615-8675-3_14
KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007. ,
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474
Development of a convex polyhedral discrete element simulation framework for NVIDIA Kepler based GPUs, Fourth International Conference on Finite Element Methods in Engineering and Sciences, pp.386-400, 2013. ,
DOI : 10.1016/j.cam.2013.12.032
Optimizing spatial locality in loop nests using linear algebra, Proc. 7th Workshop Compilers for Parallel Computers, p.430, 1998. ,
Comparative Analysis of the Cuthill???McKee and the Reverse Cuthill???McKee Ordering Algorithms for Sparse Matrices, SIAM Journal on Numerical Analysis, vol.13, issue.2, pp.198-213, 1976. ,
DOI : 10.1137/0713020
Intel Threading Building Blocks, 2007. ,
Space-filling curves, 2012. ,
DOI : 10.1007/978-1-4612-0871-6
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010. ,
DOI : 10.1109/ICPPW.2010.38
Fast construction of sah bvhs on the intel many integrated core (mic) architecture. Visualization and Computer Graphics, IEEE Transactions on, vol.18, issue.1, pp.47-57, 2012. ,
High Performance Compilers for Parallel Computing, 1995. ,
An experimental comparison of cache-oblivious and cache-conscious programs, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures , SPAA '07, pp.93-104, 2007. ,
DOI : 10.1145/1248377.1248394