E. Agullo, P. R. Amestoy, A. Buttari, A. Guermouche, J.-Y. L'Excellent et al., Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations, SIAM Journal on Scientific Computing, vol.38, pp.256-279, 2016.
DOI : 10.1137/130938505

URL : https://hal.archives-ouvertes.fr/hal-00726644

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Trans. Math. Softw, vol.43, 2016.
DOI : 10.1145/2898348

URL : https://hal.archives-ouvertes.fr/hal-01333645

P. R. Amestoy, R. Brossier, A. Buttari, J.-Y. L'Excellent, T. Mary et al., Fast 3D frequency-domain full waveform inversion with a parallel Block Low-Rank multifrontal direct solver: application to OBC data from the North Sea, Geophysics, vol.81, issue.6, pp.363-383, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01349119

P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L'Excellent et al., Improving Multifrontal Methods by Means of Block Low-Rank Representations, SIAM Journal on Scientific Computing, vol.37, 2015.
DOI : 10.1137/120903476

URL : https://hal.archives-ouvertes.fr/hal-00776859

P. Amestoy, A. Buttari, G. Joslin, J.-Y. L'Excellent, M. Sid-Lakhdar et al., Shared-Memory Parallelism and Low-Rank Approximation Techniques Applied to Direct Solvers in FEM Simulation, IEEE Transactions on Magnetics, vol.50, issue.2, pp.517-520, 2014.
DOI : 10.1109/tmag.2013.2284024

URL : https://hal.archives-ouvertes.fr/hal-01123557

P. Amestoy, A. Buttari, J.-Y. L'Excellent, and T. Mary, On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM Journal on Scientific Computing, vol.39, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01322230

M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou et al., Accelerating scientific computations with mixed precision algorithms, Computer Physics Communications, vol.180, pp.2526-2533, 2009.
DOI : 10.1016/j.cpc.2008.11.005

URL : http://arxiv.org/pdf/0808.2794

L. Bouchet, P. Amestoy, A. Buttari, F. Rouet, M. Chauvin et al., Simultaneous analysis of large INTEGRAL/SPI datasets: optimizing the computation of the solution and its variance using sparse matrix algorithms, Astronomy & Astrophysics A52, vol.1, pp.59-69, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01125193

A. Buttari, Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, pp.323-345, 2013.
DOI : 10.1137/110846427

URL : https://hal.archives-ouvertes.fr/hal-01122471

A. Buttari, P. D'Ambra, D. di Serafino, and S. Filippone, 2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications, Appl. Algebra Eng., Commun. Comput, vol.18, 2007.
DOI : 10.1007/s00200-007-0035-z

A. Buttari, J. Dongarra, J. Kurzak, P. Luszczek, and S. Tomov, Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy, ACM Trans. Math. Softw, vol.34, pp.1-22, 2008.
DOI : 10.1145/1377596.1377597

URL : http://www.netlib.org/netlib/utk/people/JackDongarra/PAPERS/iterative-refine-toms-2007.pdf

A. Buttari, J. Dongarra, J. Langou, J. Langou, P. Luszczek et al., Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems, Int. J. High Perform. Comput. Appl, vol.21, pp.457-466, 2007.

A. Buttari, V. Eijkhout, J. Langou, and S. Filippone, Performance Optimization and Modeling of Blocked Sparse Kernels, Int. J. High Perform. Comput. Appl, vol.21, pp.467-484, 2007.
DOI : 10.1177/1094342007083801

URL : http://hpc.sagepub.com/cgi/reprint/21/4/467.pdf

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput, vol.35, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurr. Comput.: Pract. Exper, vol.20, 2008.
DOI : 10.1007/978-3-540-68111-3_67

URL : http://www.netlib.org/lapack/lawnspdf/lawn190.pdf

S. Filippone and A. Buttari, Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003, ACM Transactions on Mathematical Software, vol.38, p.20, 2012.
DOI : 10.1145/2331130.2331131

J. Kurzak, A. Buttari, and J. Dongarra, Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, IEEE Trans. Parallel Distrib. Syst, vol.19, 2008.

J. Kurzak, A. Buttari, P. Luszczek, and J. Dongarra, The PlayStation 3 for High-Performance Scientific Computing, Computing in Science and Eng, vol.10, 2008.
DOI : 10.1109/mcse.2008.85

URL : http://www.cs.utk.edu/~library/TechReports/2008/ut-cs-08-608.pdf

D. Shantsev, P. Jaysaval, S. de la Kethulle de Ryhove, P. Amestoy, A. Buttari et al., Large-scale 3D EM modeling with a Block Low-Rank multifrontal direct solver, Geophysical Journal International, 2017.

E. Agullo, P. R. Amestoy, A. Buttari, A. Guermouche, G. Joslin et al., Recent advances in sparse direct solvers, Conference on Structural Mechanics in Reactor Technology, 2013.
URL : https://hal.archives-ouvertes.fr/hal-01060301

E. Agullo, G. Bosilca, A. Buttari, A. Guermouche, F. Lopez et al., Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver, Euro-Par 2016: Parallel Processing Workshops: Euro-Par 2016 International Workshops, pp.175-186, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01337748

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Parallel Processing, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53

URL : https://hal.archives-ouvertes.fr/hal-01220611

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures, HiPC, IEEE Computer Society, 2015.
DOI : 10.1109/hipc.2015.27

URL : https://hal.archives-ouvertes.fr/hal-01270145

P. R. Amestoy, R. Brossier, A. Buttari, J.-Y. L'Excellent, T. Mary et al., 3D frequency-domain seismic modeling with a Parallel BLR multifrontal direct solver, SEG Technical Program Expanded Abstracts 2015, chap. 692, pp.3606-3611, 2015.
DOI : 10.1190/segam2015-5811693.1

URL : https://hal.archives-ouvertes.fr/hal-01237869

P. R. Amestoy, Efficient 3D frequency-domain full-waveform inversion of ocean-bottom cable data with sparse block low-rank direct solver: a real data case study from the North Sea, SEG Technical Program Expanded Abstracts, vol.251, pp.1303-1308, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01239896

P. Amestoy, A. Buttari, G. Joslin, J.-Y. L'Excellent, M. Sid-Lakhdar et al., Shared memory parallelism and low-rank approximation techniques applied to direct solvers in FEM simulation, IEEE International Conference on the Computation of Electromagnetic Fields (COMPUMAG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-01123557

G. Antoniu, Towards exascale with the ANR-JST Japanese-French Project FP3C, Ninth International Conference on Computer Science and Information Technologies Revised Selected Papers, pp.1-10, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00922754

G. Bella, A. Buttari, A. De Maio, F. Del Citto, S. Filippone et al., FAST-EVP: an Engine Simulation Tool, High Performance Computing and Communications. First International Conference, HPCC 2005, Proceedings, vol.3726.

L. Bouchet, P. Amestoy, A. Buttari, F. Rouet, and M. Chauvin, INTEGRAL/SPI data segmentation to retrieve sources intensity variations (regular paper), An INTEGRAL view of the high-energy sky (the first 10 years), 2013.

A. Buttari, Fine granularity sparse QR factorization for multicore based systems, Proceedings of the 10th international conference on Applied Parallel and Scientific Computing, vol.2, pp.226-236, 2012.

A. Buttari, P. D'Ambra, D. di Serafino, and S. Filippone, Extending PSBLAS to Build Parallel Schwarz Preconditioners, Applied Parallel Computing. State of the Art in Scientific Computing: 7th International Conference, vol.3732, pp.593-602, 2004.

A. Buttari, J. Dongarra, P. Husbands, J. Kurzak, and K. Yelick, Multithreading for Synchronization Tolerance in Matrix Factorization, Proceedings of the SciDAC 2007 Conference, 2007.

A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek et al., The impact of multicore on math software, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing. PARA'06, pp.1-10, 2007.

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, PPAM'07: Proceedings of the 7th international conference on Parallel processing and applied mathematics, pp.639-648, 2008.

J. , Prospectus for the Next LAPACK and ScaLAPACK Libraries, PARA'06: State-of-the-Art in Scientific and Parallel Computing. High Performance Computing Center North (HPC2N) and the Department of Computing Science, 2006.

G. Hautreux, Pre-exascale Architectures: OpenPOWER Performance and Usability Assessment for French Scientific Community, High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P3MA, VHPC, Visualization at, ed. by J. M. Kunkel, R. Yokota, M. Taufer, J. Shalf et al., pp.309-324, 2017.
URL : https://hal.archives-ouvertes.fr/hal-02129651

J. Langou, J. Langou, P. Luszczek, J. Kurzak, A. Buttari et al., Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, p.113, 2006.

L. Stanisic, E. Agullo, A. Buttari, A. Guermouche, A. Legrand et al., Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers, Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on, pp.481-490, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01180272

C. Weisbecker, P. Amestoy, O. Boiteau, R. Brossier, A. Buttari et al., 3D frequency-domain seismic modeling with a block low-rank algebraic multifrontal direct solver, SEG Technical Program Expanded Abstracts, vol.662, pp.3411-3416, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00924638

P. Amestoy, A. Buttari, I. Duff, A. Guermouche, J.-Y. L'Excellent et al., MUMPS, Encyclopedia of Parallel Computing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00787042

P. Amestoy, A. Buttari, I. Duff, A. Guermouche, J.-Y. L'Excellent et al., The Multifrontal Method, Encyclopedia of Parallel Computing, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00787015

A. Buttari, J. Dongarra, J. Kurzak, and J. Langou, Parallel Dense Linear Algebra Software in the Multicore Era, Cyberinfrastructure Technologies and Applications, 2007.

A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou et al., Exploiting Mixed Precision Floating Point Hardware in Scientific Computations, High Performance Computing and Grids in Action, 2007.

J. , Prospectus for a Linear Algebra Software Library for Dense Matrix Problems, Handbook of Parallel Computing: Models, Algorithms and Applications, vol.17, ISBN 9781584886235, 2007.

P. R. Amestoy, A. Buttari, J.-Y. L'Excellent, and T. Mary, Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, Research Report, submitted to ACM TOMS, INPT-IRIT, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01505070

P. Amestoy, A. Buttari, J.-Y. L'Excellent, and T. Mary, Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format, Tech. rep. Submitted to the SIAM Journal on Scientific Computing. IRIT, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01774642

A. Dax and L. Eldén, Approximating minimum norm solutions of rank-deficient least squares problems, Numerical Linear Algebra with Applications, vol.5, pp.79-99, 1998.

E. Agullo, On the out-of-core factorization of large sparse matrices, 2008.
URL : https://hal.archives-ouvertes.fr/tel-00563463

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, p.27, 2016.
DOI : 10.1109/tpds.2017.2766064

URL : https://hal.archives-ouvertes.fr/hal-01618526

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, p.29, 2014.
DOI : 10.1002/cpe.3723

URL : https://hal.archives-ouvertes.fr/hal-00974674

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, p.12037, 2009.

E. Agullo, J. Dongarra, R. Nath, and S. Tomov, Fully Empirical Autotuned QR Factorization For Multicore Architectures, 2011.
DOI : 10.1007/978-3-642-23397-5_19

URL : https://hal.archives-ouvertes.fr/hal-00726654

E. Agullo, L. Giraud, S. Nakov, and J. Roman, Hierarchical hybrid sparse linear solver for multicore platforms, p.25, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01379227

E. Agullo, A. Guermouche, and J.-Y. L'Excellent, Reducing the I/O Volume in Sparse Out-of-core Multifrontal Methods, SIAM Journal on Scientific Computing, vol.31, pp.4774-4794, 2010.

K. Akbudak, H. Ltaief, A. Mikhalev, A. Charara, and D. E. Keyes, Exploiting Data Sparsity for Large-Scale Matrix Computations, 2018.
DOI : 10.1007/978-3-319-96983-1_51

URL : https://repository.kaust.edu.sa/bitstream/10754/627403/1/hicma_tech.pdf

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

P. R. Amestoy, T. A. Davis, and I. S. Duff, An Approximate Minimum Degree Ordering Algorithm, SIAM J. Matrix Anal. Appl, vol.17, pp.886-905, 1996.
DOI : 10.1137/s0895479894278952

P. R. Amestoy and I. S. Duff, Memory Management Issues in Sparse Multifrontal Methods On Multiprocessors, The International Journal of Supercomputing Applications, vol.7, issue.1, pp.64-82, 1993.

P. R. Amestoy, I. S. Duff, J. Koster, and J.-Y. L'Excellent, A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling, SIAM Journal on Matrix Analysis and Applications, vol.23, pp.15-41, 2001.
URL : https://hal.archives-ouvertes.fr/hal-00808293

P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent, Y. Robert, F. Rouet et al., On computing inverse entries of a sparse matrix in an out-of-core environment, SIAM Journal on Scientific Computing, vol.34, pp.1975-1999, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00763556

P. R. Amestoy, I. S. Duff, and C. Puglisi, Multifrontal QR factorization in a multiprocessor environment, Int. Journal of Num. Linear Alg. and Appl, vol.3, issue.4, pp.275-300, 1996.

P. R. Amestoy, A. Guermouche, J.-Y. L'Excellent, and S. Pralet, Hybrid scheduling for the parallel solution of linear systems, Parallel Computing, vol.32, issue.2, pp.136-156, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00070599

P. R. Amestoy and C. Puglisi, An unsymmetrized multifrontal LU factorization, SIAM Journal on Matrix Analysis and Applications, vol.24, pp.553-569, 2002.
DOI : 10.2172/776628

URL : https://digital.library.unt.edu/ark:/67531/metadc715385/m2/1/high_res_d/776628.pdf

P. Amestoy, J.-Y. L'Excellent, and G. Moreau, On Exploiting Sparsity of Multiple Right-Hand Sides in Sparse Direct Solvers, pp.1-28, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01955659

A. Aminfar, S. Ambikasaran, and E. Darve, A fast block low-rank dense solver with applications to finite-element matrices, Journal of Computational Physics, vol.304, pp.170-188, 2016.

E. Anderson et al., LAPACK Users' Guide, Third edition, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1999.

J. Anton, C. Ashcraft, and C. Weisbecker, A Block Low-Rank multithreaded factorization for dense BEM operators, SIAM Conference on Parallel Processing (SIAM PP16), 2016.

N. S. Arora, R. D. Blumofe, and C. G. Plaxton, Thread Scheduling for Multiprogrammed Multiprocessors, Theory Comput. Syst, vol.34, pp.115-144, 2001.

K. Asanovic, The Landscape of Parallel Computing Research: A View from Berkeley, Technical report, 2006.

C. Ashcraft, The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms, Graph Theory and Sparse Matrix Computation, 1993.

C. Ashcraft and R. G. Grimes, SPOOLES: An object oriented sparse matrix library, Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

H. Avron, E. Ng, and S. Toledo, Using Perturbed QR Factorizations to Solve Linear Least-Squares Problems, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.674-693, 2009.

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.21, pp.2438-2456, 2009.

G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight et al., Communication lower bounds and optimal algorithms for numerical linear algebra, Acta Numerica, vol.23, pp.1-155, 2014.

O. Beaumont and A. Guermouche, Task scheduling for parallel multifrontal methods, Euro-Par 2007 Parallel Processing, pp.758-766, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00358626

M. Bebendorf, Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems (Lecture Notes in Computational Science and Engineering), ISBN 3540771468, 2008.

M. Bebendorf, Approximation of boundary element matrices, Numerische Mathematik, vol.86, pp.565-589, 2000.

M. Bebendorf, Efficient inversion of Galerkin matrices of general second-order elliptic differential operators with nonsmooth coefficients, Mathematics of Computation, vol.74, pp.1179-1199, 2005.

M. Bebendorf, Why finite element discretizations can be factored by triangular hierarchical matrices, SIAM Journal on Numerical Analysis, vol.45, p.1472, 2007.

M. Bebendorf and W. Hackbusch, Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients, Numerische Mathematik, vol.95, pp.1-28, 2003.

Å. Björck, Numerical methods for Least Squares Problems. Philadelphia: SIAM, 1996.

S. Börm, L. Grasedyck, and W. Hackbusch, Introduction to hierarchical matrices with applications, Engineering Analysis with Boundary Elements, vol.27, pp.405-422, 2003.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science and Engineering, vol.15, pp.36-45, 2013.

H. Bouwmeester, M. Jacquelin, J. Langou, and Y. Robert, Tiled QR Factorization Algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. SC '11, vol.7, 2011.
DOI : 10.1145/2063384.2063393

URL : https://hal.archives-ouvertes.fr/hal-00945074

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference, pp.180-186, 2010.
DOI : 10.1109/pdp.2010.67

URL : https://hal.inria.fr/file/index/docid/429889/filename/main.pdf

W. J. Brouwer and P. Taunay, Efficient Batch LU and QR Decomposition on GPU, Numerical Computations with GPUs, 2014.
DOI : 10.1007/978-3-319-06548-9_4

R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory. Encyclopedia of Mathematics and its Applications, 1991.
DOI : 10.1201/b16113-44

R. A. Brualdi and B. L. Shader, Strong Hall Matrices, SIAM J. Matrix Anal. Appl, vol.15, issue.2, pp.359-365, 1994.
DOI : 10.1137/s0895479892225142

J. R. Bunch and L. Kaufman, Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems, Mathematics of Computation, vol.31, pp.162-179, 1977.
DOI : 10.2307/2005787

URL : https://www.ams.org/mcom/1977-31-137/S0025-5718-1977-0428694-0/S0025-5718-1977-0428694-0.pdf

P. Businger and G. H. Golub, Linear Least Squares Solutions by Householder Transformations, Numer. Math, vol.7, issue.3, pp.269-276, 1965.
DOI : 10.1007/bf01436084

S. Chandrasekaran, P. Dewilde, M. Gu, and N. Somasunderam, On the Numerical Rank of the Off-Diagonal Blocks of Schur Complements of Discretized Elliptic PDEs, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.2261-2290, 2010.

S. Chandrasekaran, M. Gu, and T. Pals, A fast ULV decomposition solver for hierarchically semiseparable representations, SIAM Journal on Matrix Analysis and Applications, vol.28, issue.3, pp.603-622, 2006.
DOI : 10.1137/s0895479803436652

Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam, Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate, ACM Trans. Math. Softw, vol.35, issue.3, 2008.

H. Cheng, Z. Gimbutas, P. G. Martinsson, and V. Rokhlin, On the Compression of Low Rank Matrices, SIAM Journal on Scientific Computing, vol.26, pp.1389-1404, 2005.

T. F. Coleman, A. Edenbrandt, and J. R. Gilbert, Predicting Fill for Sparse Orthogonal Factorization, J. ACM, vol.33, issue.3, pp.517-532, 1986.
DOI : 10.1145/5925.5932

URL : https://ecommons.cornell.edu/bitstream/1813/6418/1/83-578.pdf

S. Constable, Ten years of marine CSEM for hydrocarbon exploration, Geophysics, vol.75, pp.75-67, 2010.
DOI : 10.1190/1.3483451

P. Coulier, H. Pouransari, and E. Darve, The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems, 2015.

T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng, A column approximate minimum degree ordering algorithm, ACM Trans. Math. Softw, vol.30, issue.3, pp.353-376, 2004.
DOI : 10.1145/1024074.1024079

T. A. Davis, Algorithm 832: UMFPACK V4.3-an unsymmetric-pattern multifrontal method, ACM Transactions On Mathematical Software, vol.30, pp.196-199, 2004.
DOI : 10.1145/992200.992206

T. A. Davis, Algorithm 915: SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization, ACM Trans. Math. Softw, vol.38, issue.1, 2011.

T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw, vol.38, issue.1, 2011.

J. Demmel, Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics, 1997.
DOI : 10.1137/1.9781611971446

J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM J. Sci. Comput, vol.34, issue.1, 2012.
DOI : 10.1137/080731992

URL : https://hal.archives-ouvertes.fr/hal-00870930

E. W. Dijkstra, Een algorithme ter voorkoming van de dodelijke omarming [An algorithm to prevent the deadly embrace], circulated privately, 1965.

E. W. Dijkstra, The Mathematics Behind the Banker's Algorithm, Selected Writings on Computing: A Personal Perspective, Texts and Monographs in Computer Science, pp.308-312, 1982.

E. D. Dolan and J. J. Moré, Benchmarking Optimization Software with Performance Profiles, Mathematical Programming, vol.91, pp.201-213, 2002.

V. Dolean, P. Jolivet, and F. Nataf, An Introduction to Domain Decomposition Methods, Society for Industrial and Applied Mathematics, 2015.
URL : https://hal.archives-ouvertes.fr/cel-01100932

J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst, Numerical Linear Algebra for High-Performance Computers, 1998.

J. Dongarra, M. Faverge, T. Hérault, M. Jacquelin, J. Langou et al., Hierarchical QR factorization algorithms for multi-core clusters, Parallel Computing, vol.39, pp.212-232, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00809770

I. S. Duff, R. Guivarch, D. Ruiz, and M. Zenadi, The Augmented Block Cimmino Distributed Method, SIAM Journal on Scientific Computing, vol.37, 2015.

I. S. Duff and J. K. Reid, The multifrontal solution of indefinite sparse symmetric linear systems, ACM Transactions On Mathematical Software, vol.9, pp.302-325, 1983.

C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika, vol.1, issue.3, pp.211-218, 1936.

S. C. Eisenstat and J. W. Liu, A tree-based dataflow model for the unsymmetric multifrontal method, Electronic Transactions on Numerical Analysis, vol.21, pp.1-19, 2005.

S. C. Eisenstat and J. W. Liu, Algorithmic Aspects of Elimination Trees for Sparse Unsymmetric Matrices, SIAM Journal on Matrix Analysis and Applications, vol.29, pp.1363-1381, 2008.

S. Ellingsrud, T. Eidesmo, S. Johansen, M. C. Sinha, L. M. Macgregor et al., Remote sensing of hydrocarbon layers by seabed logging (SBL): Results from a cruise offshore Angola, The Leading Edge, vol.21, pp.972-982, 2002.

B. Engquist and L. Ying, Sweeping preconditioner for the Helmholtz equation: Hierarchical matrix representation, Communications on Pure and Applied Mathematics, vol.64, pp.697-735, 2011.

L. Eyraud-Dubois, L. Marchal, O. Sinnen, and F. Vivien, Parallel Scheduling of Task Trees with Limited Memory, ACM Trans. Parallel Comput, vol.2, issue.2, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160118

A. Geist and E. G. Ng, Task scheduling for parallel sparse Cholesky factorization, Int J. Parallel Programming, vol.18, pp.291-314, 1989.

A. J. George, Nested dissection of a regular finite-element mesh, SIAM J. Numer. Anal, vol.10, pp.345-363, 1973.

A. J. George and M. T. Heath, Solution of Sparse Linear Least Squares Problems Using Givens Rotations, Linear Algebra and its Applications, vol.34, pp.69-83, 1980.

A. George, J. Liu, and E. Ng, A Data Structure for Sparse QR and LU Factorizations, SIAM Journal on Scientific and Statistical Computing, vol.9, pp.100-121, 1988.

A. George and E. Ng, On the Complexity of Sparse QR and LU Factorization of Finite-Element Matrices, SIAM Journal on Scientific and Statistical Computing, vol.9, pp.849-861, 1988.

P. Ghysels, X. S. Li, F. Rouet, S. Williams, and A. Napov, An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling, SIAM Journal on Scientific Computing, vol.38, pp.358-384, 2016.

J. R. Gilbert and E. G. Ng, Predicting structure in nonsymmetric sparse matrix factorizations, Graph Theory and Sparse Matrix Computations, pp.107-140, 1993.

J. R. Gilbert, X. S. Li, E. G. Ng, and B. W. Peyton, Computing Row and Column Counts for Sparse QR and LU Factorization, BIT Numerical Mathematics, vol.41, pp.693-710, 2001.

J. R. Gilbert and J. W. Liu, Elimination Structures for Unsymmetric Sparse LU Factors, SIAM Journal on Matrix Analysis and Applications, vol.14, pp.334-352, 1993.

A. Gillman, P. Young, and P. Martinsson, A direct solver with O(N) complexity for integral equations on one-dimensional domains, Frontiers of Mathematics in China, vol.7, 2012.

L. Giraud and A. Haidar, Parallel algebraic hybrid solvers for large 3D convection-diffusion problems, Numerical Algorithms, vol.51, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00441717

W. Givens, Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form, Journal of the Society for Industrial and Applied Mathematics, vol.6, 1958.

G. Golub, Numerical methods for solving linear least squares problems, Numerische Mathematik, vol.7, pp.206-216, 1965.

G. H. Golub and C. F. Van Loan, Matrix Computations, 2012.

L. Grasedyck, R. Kriemann, and S. Le Borne, Parallel black box H-LU preconditioning for elliptic boundary value problems, Computing and Visualization in Science, vol.11, 2008.

A. Guermouche and J.-Y. L'Excellent, Constructing memory-minimizing schedules for multifrontal methods, ACM Transactions on Mathematical Software (TOMS), vol.32, issue.1, pp.17-32, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00358620

A. Guermouche, J.-Y. L'Excellent, and G. Utard, Impact of Reordering on the Memory of a Multifrontal Solver, Parallel Computing, vol.29, pp.1191-1218, 2003.
URL : https://hal.archives-ouvertes.fr/hal-00807378

A. Gupta, A Shared- and distributed-memory parallel general sparse direct solver, Appl. Algebra Eng. Commun. Comput, vol.18, pp.263-277, 2007.

M. H. Gutknecht, Variants of BICGSTAB for matrices with complex spectrum, SIAM Journal on Scientific Computing, vol.14, pp.1020-1033, 1993.

T. M. Habashy and A. Abubakar, A general framework for constraint minimization for the inversion of electromagnetic measurements, Progress in Electromagnetics Research, vol.46, pp.265-312, 2004.

W. Hackbusch, B. N. Khoromskij, and R. Kriemann, Hierarchical Matrices Based on a Weak Admissibility Criterion, Computing, vol.73, pp.207-243, 2004.

W. Hackbusch, A sparse matrix arithmetic based on H-matrices. Part I: introduction to H-matrices, Computing, vol.62, issue.2, pp.89-108, 1999.

W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer Series in Computational Mathematics, vol.49, p.511, 2015.

A. Haidar, A. Abdelfattah, M. Zounon, S. Tomov, and J. Dongarra, A Guide For Achieving High Performance With Very Small Matrices On GPU: A case Study of Batched LU and Cholesky Factorizations, IEEE Transactions on Parallel and Distributed Systems PP.99, pp.1-1, 2017.

N. Halko, P. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review, vol.53, pp.217-288, 2011.

M. T. Heath, Some Extensions of an Algorithm for Sparse Linear Least Squares Problems, SIAM Journal on Scientific and Statistical Computing, vol.3, pp.223-237, 1982.
DOI : 10.1137/0903014

P. Hénon, P. Ramet, and J. Roman, PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Definite Systems, Parallel Computing, vol.28, issue.2, pp.301-321, 2002.

H. P. Hofstee, Power Efficient Processor Architecture and The Cell Processor, Proceedings of the 11th International Symposium on High-Performance Computer Architecture. HPCA '05, pp.258-262, 2005.
DOI : 10.1109/hpca.2005.26

URL : http://www.hpcaconf.org/hpca11/papers/25_hofstee-cellprocessor_final.pdf

J. Hogg, J. K. Reid, and J. A. Scott, Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM J. Scientific Computing, vol.32, pp.3627-3649, 2010.
DOI : 10.1137/090757216

A. S. Householder, Unitary Triangularization of a Nonsymmetric Matrix, J. ACM, vol.5, pp.339-342, 1958.
DOI : 10.1145/320941.320947

URL : https://hal.archives-ouvertes.fr/hal-01316095

M. Jacquelin, L. Marchal, Y. Robert, and B. Uçar, On Optimal Tree Traversals for Sparse Matrix Factorization, Proceedings of 25th International Parallel and Distributed Processing Symposium (IPDPS'11), 2011.
DOI : 10.1109/ipdps.2011.60

URL : https://hal.archives-ouvertes.fr/hal-00945078

P. Jaysaval, D. Shantsev, and S. de la Kethulle de Ryhove, Fast multimodel finite-difference controlled-source electromagnetic simulations based on a Schur complement approach, Geophysics, vol.79, issue.6, pp.315-327, 2014.
DOI : 10.1190/geo2014-0043.1

G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM J. Sci. Comput, vol.20, 1998.
DOI : 10.1137/s1064827595287997

URL : http://glaros.dtc.umn.edu/gkhome/fetch/papers/mlSIAMSC99.pdf

K. Key, Marine Electromagnetic Studies of Seafloor Resources and Tectonics, Surveys in Geophysics, vol.33, issue.1, 2012.
DOI : 10.1007/s10712-011-9139-x

A. Kleen, A NUMA API for Linux, 2004.

J. Kurzak and J. Dongarra, Fully Dynamic Scheduler for Numerical Computing on Multicore Processors, LAPACK working note lawn220, 2009.

J.-Y. L'Excellent, Multifrontal methods for large sparse systems of linear equations: parallelism, memory usage, performance optimization and numerical issues, Habilitation, 2012.

J.-Y. L'Excellent and W. M. Sid-Lakhdar, A study of shared-memory parallelism in a multifrontal solver, Parallel Computing, vol.40, pp.34-46, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01060322

X. Lacoste, Scheduling and memory optimizations for sparse direct solver on multi-core/multi-GPU cluster systems, 2015.

X. S. Li and J. W. Demmel, Making Sparse Gaussian Elimination Scalable by Static Pivoting, Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (SC98), pp.34-34, 1998.

X. S. Li and J. W. Demmel, SuperLU_DIST: A Scalable Distributed-memory Sparse Direct Solver for Unsymmetric Linear Systems, ACM Trans. Math. Softw, vol.29, issue.2, pp.110-140, 2003.

E. Liberty, F. Woolfe, P. Martinsson, V. Rokhlin, and M. Tygert, Randomized algorithms for the low-rank approximation of matrices, Proceedings of the National Academy of Sciences, vol.104, pp.20167-20172, 2007.

J. W. Liu, A. George, and E. Ng, Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Computing, vol.10, pp.287-298, 1989.

J. W. Liu, An Application of Generalized Tree Pebbling to Sparse Matrix Factorization, SIAM J. Algebraic Discrete Methods, vol.8, issue.3, pp.375-395, 1987.
DOI : 10.1137/0608031

URL : http://graal.ens-lyon.fr/%7Elmarchal/scheduling/generalized_tree_pebbling_liu.pdf

J. W. Liu, Modification of the Minimum-degree Algorithm by Multiple Elimination, ACM Trans. Math. Softw, vol.11, issue.2, pp.141-153, 1985.
DOI : 10.1145/214392.214398

J. W. Liu, On the storage requirement in the out-of-core multifrontal method for sparse factorization, ACM Transactions on Mathematical Software, vol.12, pp.127-148, 1986.

J. W. Liu, The Multifrontal Method for Sparse Matrix Solution: Theory and Practice, SIAM Review, vol.34, pp.82-109, 1992.

F. Lopez, Task-based multifrontal QR solver for heterogeneous architectures, 2015.
URL : https://hal.archives-ouvertes.fr/tel-01386600

R. Luce and E. G. Ng, On the Minimum FLOPs Problem in the Sparse Cholesky Factorization, SIAM Journal on Matrix Analysis and Applications, vol.35, pp.1-21, 2014.

K. Marfurt, Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations, Geophysics, vol.49, pp.533-549, 1984.

P. G. Martinsson, A Fast Randomized Algorithm for Computing a Hierarchically Semiseparable Representation of a Matrix, SIAM Journal on Matrix Analysis and Applications, vol.32, pp.1251-1274, 2011.

P. G. Martinsson, Compressing Rank-Structured Matrices via Randomized Sampling, SIAM Journal on Scientific Computing, vol.38, 2016.

T. Mary, Block Low-Rank multifrontal solvers: complexity, performance, and scalability, 2017.
URL : https://hal.archives-ouvertes.fr/tel-01929478

P. Matstoms, Parallel Sparse QR factorization on shared memory architectures. Tech. rep. LiTH-MAT-R-1993-18. Department of Mathematics, 1993.

J. D. McCalpin, STREAM: Sustainable Memory Bandwidth in High Performance Computers, 1991.

P. J. Mucci, S. Browne, C. Deane, and G. Ho, PAPI: A Portable Interface to Hardware Performance Counters, Proceedings of Department of Defense HPCMP Users Group Conference, 1999.

W. Mulder, A multigrid solver for 3D electromagnetic diffusion, Geophysical Prospecting, vol.54, pp.633-649, 2006.

S. Operto, A. Miniussi, R. Brossier, L. Combe, L. Métivier et al., Efficient 3-D frequency-domain mono-parameter full-waveform inversion of ocean-bottom cable data: application to Valhall in the viscoacoustic vertical transverse isotropic approximation, Geophysical Journal International, vol.202, pp.1362-1391, 2015.
URL : https://hal.archives-ouvertes.fr/hal-02009486

S. Operto, J. Virieux, P. R. Amestoy, J.-Y. L'Excellent, L. Giraud et al., 3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study, Geophysics, vol.72, pp.195-211, 2007.
URL : https://hal.archives-ouvertes.fr/insu-00355256

J. Peiró and S. Sherwin, Finite difference, finite element and finite volume methods for partial differential equations, Handbook of materials modeling, pp.2415-2446, 2005.

F. Pellegrini and J. Roman, Scotch: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs, Proceedings of HPCN'96, Brussels, LNCS 1067, pp.493-498, 1996.

G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman, Sparse Supernodal Solver Using Block Low-Rank Compression, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.1138-1147, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01502215

D. J. Pierce and J. G. Lewis, Sparse Multifrontal Rank Revealing QR Factorization, SIAM J. Matrix Anal. Appl, vol.18, issue.1, pp.159-180, 1997.

A. Pothen and C. Sun, A mapping algorithm for parallel sparse Cholesky factorization, SIAM Journal on Scientific Computing, vol.14, pp.1253-1257, 1993.

S. G. Prasanna and B. R. Musicus, Generalized multiprocessor scheduling and applications to matrix computations, IEEE Transactions on Parallel and Distributed Systems, vol.7, pp.650-664, 1996.

J. Rigal and J. Gaches, On the compatibility of a given solution with the data of a linear system, J. ACM, vol.14, pp.543-548, 1967.

D. J. Rose, R. E. Tarjan, and G. S. Lueker, Algorithmic Aspects of Vertex Elimination on Graphs, SIAM Journal on Computing, vol.5, pp.266-283, 1976.
DOI : 10.1137/0205021

F. Rouet, Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides, 2012.
URL : https://hal.archives-ouvertes.fr/tel-00785748

O. Schenk, K. Gärtner, and W. Fichtner, Efficient Sparse LU Factorization with Left-Right Looking Strategy on Shared Memory Multiprocessors, BIT Numerical Mathematics, vol.40, pp.158-176, 2000.
DOI : 10.1007/bfb0100583

URL : http://www.iis.ee.ethz.ch/~oschenk/papers/oschenk-hpcn-procee-1999.ps.gz

E. Schmidt, Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten (in German), pp.53-77, 1908.

R. Schreiber and C. Van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput, vol.10, pp.52-57, 1989.

R. Schreiber, A new implementation of sparse Gaussian elimination, ACM Transactions on Mathematical Software, vol.8, pp.256-276, 1982.

M. Sergent, D. Goudin, S. Thibault, and O. Aumage, Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.318-327, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01284004

R. Sethi, Complete Register Allocation Problems, Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. STOC '73, pp.182-195, 1973.
DOI : 10.1145/800125.804049

URL : http://graal.ens-lyon.fr/%7Elmarchal/scheduling/sethi_complete_register_allocation.pdf

R. Sethi and J. D. Ullman, The Generation of Optimal Code for Arithmetic Expressions, J. ACM, vol.17, issue.4, pp.715-728, 1970.

W. M. Sid-Lakhdar, Scaling multifrontal methods for the solution of large sparse linear systems on hybrid shared-distributed memory architectures, 2014.

T. Slavova, Parallel triangular solution in the out-of-core multifrontal approach for solving large sparse linear systems, 2009.

E. Solomonik and J. Demmel, Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, Proceedings of the 17th International Conference on Parallel Processing, Volume Part II, Euro-Par'11, pp.90-109, 2011.

A. Tarantola, Inversion of seismic reflection data in the acoustic approximation, Geophysics, vol.49, issue.8, pp.1259-1266, 1984.

A. N. Tikhonov, Regularization of incorrectly posed problems, Soviet Math. Dokl, vol.4, pp.1624-1627, 1963.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, pp.260-274, 2002.

L. N. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, 1997.

H. A. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, vol.13, pp.631-644, 1992.

J. Virieux and S. Operto, An overview of full waveform inversion in exploration geophysics, Geophysics, vol.74, issue.6, pp.1-26, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00457989

S. Wang, X. S. Li, F. Rouet, J. Xia, and M. V. de Hoop, A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure, ACM Transactions on Mathematical Software, vol.42, issue.3, 2016.

S. Wang, X. S. Li, F. Rouet, J. Xia, and M. V. de Hoop, A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure, Submitted to ACM Trans. Math. Softw, 2013.
DOI : 10.1145/2830569

C. Weisbecker, Improving multifrontal solvers by means of algebraic Block Low-Rank representations, 2013.
URL : https://hal.archives-ouvertes.fr/tel-00934939

J. H. Wilkinson, Rounding Errors in Algebraic Processes, 1963.

S. Williams, A. Waterman, and D. Patterson, Roofline: An Insightful Visual Performance Model for Multicore Architectures, Commun. ACM, vol.52, pp.65-76, 2009.
DOI : 10.2172/1407078

URL : http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.pdf

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pp.156-165, 2015.
DOI : 10.1109/ipdps.2015.56

URL : https://hal.archives-ouvertes.fr/hal-01078359


J. Xia, Efficient Structured Multifrontal Factorization for General Large Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, 2013.
DOI : 10.1137/120867032

J. Xia, S. Chandrasekaran, M. Gu, and X. S. Li, Superfast Multifrontal Method for Large Structured Linear Systems of Equations, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.1382-1411, 2009.

M. Yannakakis, Computing the Minimum Fill-In is NP-Complete, SIAM Journal on Algebraic Discrete Methods, vol.2, issue.1, pp.77-79, 1981.

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

GHz and are equipped with Intel AVX SIMD units; the peak performance is

Gflop/s per core and thus 691.2 Gflop/s per node for real

• sirocco: a five-node cluster, part of the PlaFRIM center. Each node is equipped with two twelve-core Haswell Intel Xeon E5-2680 processors and 124 GB of memory. The cores are clocked at 2.5 GHz and are equipped with Intel AVX SIMD units. In addition, each node is accelerated with four Nvidia K40M GPUs; the peak performance is 40

• brunch: a shared-memory machine installed at the LIP laboratory of ENS-Lyon, equipped with 1.5 TB of memory and four 24-core Intel Broadwell E7-8890v4 processors running at a frequency varying between 2

• eos: the supercomputer of the Calcul en Midi-Pyrénées (CALMIP) center (grant P0989). Each of its 612 nodes is equipped with 64 GB of memory and two 10-core Intel Ivy Bridge processors running at 2.8 GHz. The nodes are interconnected with an Infiniband FDR network.

• licallo: the supercomputer of the SIGAMM mesocenter in Observatoire de la Côte d'Azur. Each of its 102 nodes is equipped with 64 GB of memory and two 10-core Intel Ivy Bridge processors running at 2.5 GHz. The nodes are interconnected with Infiniband FDR.

• farad: a shared-memory machine equipped with 264 GB of memory and two 16-core Intel Sandy Bridge processors running at 2.9 GHz.