M. A. Bender, J. W. Berry, S. D. Hammond, K. S. Hemmert, S. McCauley et al., Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation, IPDPS, pp.835-846, 2015.
DOI : 10.1109/ipdps.2015.94

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An Efficient Multithreaded Runtime System, Journal of Parallel and Distributed Computing, vol.37, issue.1, pp.55-69, 1996.
DOI : 10.1006/jpdc.1996.0107
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175

R. D. Blumofe and C. E. Leiserson, Scheduling multithreaded computations by work stealing, J. ACM, vol.46, issue.5, pp.720-748, 1999.

F. Broquedis, T. Gautier, and V. Danjean, libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, pp.102-115, 2012.
DOI : 10.1007/978-3-642-30961-8_8
URL : https://hal.archives-ouvertes.fr/hal-00796253

P. Clauss and V. Loechner, Parametric analysis of polyhedral iteration spaces, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, pp.179-194, 1998.

E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, Proceedings of the 1969 24th national conference, pp.157-172, 1969.
DOI : 10.1145/800195.805928

E. Cuthill, Several Strategies for Reducing the Bandwidth of Matrices, Sparse Matrices and their Applications, The IBM Research Symposia Series, pp.157-166, 1972.
DOI : 10.1007/978-1-4615-8675-3_14

T. Gautier, X. Besseron, and L. Pigeon, KAAPI, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.15-23, 2007.
DOI : 10.1145/1278177.1278182
URL : https://hal.archives-ouvertes.fr/hal-00647474

N. Govender, D. N. Wilke, S. Kok, and R. Els, Development of a convex polyhedral discrete element simulation framework for NVIDIA Kepler based GPUs, Fourth International Conference on Finite Element Methods in Engineering and Sciences, pp.386-400, 2013.
DOI : 10.1016/j.cam.2013.12.032

M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, Optimizing spatial locality in loop nests using linear algebra, Proc. 7th Workshop Compilers for Parallel Computers, p.430, 1998.

W. Liu and A. H. Sherman, Comparative Analysis of the Cuthill-McKee and the Reverse Cuthill-McKee Ordering Algorithms for Sparse Matrices, SIAM Journal on Numerical Analysis, vol.13, issue.2, pp.198-213, 1976.
DOI : 10.1137/0713020

J. Reinders, Intel Threading Building Blocks, 2007.

H. Sagan, Space-filling curves, 2012.
DOI : 10.1007/978-1-4612-0871-6

J. Treibig, G. Hager, and G. Wellein, LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010 39th International Conference on Parallel Processing Workshops, pp.207-216, 2010.
DOI : 10.1109/ICPPW.2010.38

I. Wald, Fast Construction of SAH BVHs on the Intel Many Integrated Core (MIC) Architecture, IEEE Transactions on Visualization and Computer Graphics, vol.18, issue.1, pp.47-57, 2012.

M. Wolfe, High Performance Compilers for Parallel Computing, 1995.

K. Yotov, T. Roeder, K. Pingali, J. Gunnels, and F. Gustavson, An experimental comparison of cache-oblivious and cache-conscious programs, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.93-104, 2007.
DOI : 10.1145/1248377.1248394