Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations, SIAM Journal on Scientific Computing, vol.38, pp.256-279, 2016. ,
DOI : 10.1137/130938505
URL : https://hal.archives-ouvertes.fr/hal-00726644
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Trans. Math. Softw, vol.43, 2016. ,
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645
Fast 3D frequency-domain full waveform inversion with a parallel Block Low-Rank multifrontal direct solver: application to OBC data from the North Sea, Geophysics, vol.81, issue.6, pp.363-383, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01349119
Improving Multifrontal Methods by Means of Block Low-Rank Representations, SIAM Journal on Scientific Computing, vol.37, 2015. ,
DOI : 10.1137/120903476
URL : https://hal.archives-ouvertes.fr/hal-00776859
Shared-Memory Parallelism and Low-Rank Approximation Techniques Applied to Direct Solvers in FEM Simulation, IEEE Transactions on Magnetics, vol.50, issue.2, pp.517-520, 2014. ,
DOI : 10.1109/tmag.2013.2284024
URL : https://hal.archives-ouvertes.fr/hal-01123557
On the Complexity of the Block Low-Rank Multifrontal Factorization, SIAM Journal on Scientific Computing, vol.39, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01322230
Accelerating scientific computations with mixed precision algorithms, Computer Physics Communications, vol.180, pp.2526-2533, 2009. ,
DOI : 10.1016/j.cpc.2008.11.005
URL : http://arxiv.org/pdf/0808.2794
Simultaneous analysis of large INTEGRAL/SPI datasets: optimizing the computation of the solution and its variance using sparse matrix algorithms, Astronomy & Astrophysics A52, vol.1, pp.59-69, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01125193
Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, pp.323-345, 2013. ,
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471
2LEV-D2P4: a package of high-performance preconditioners for scientific and engineering applications, Appl. Algebra Eng. Commun. Comput, vol.18, 2007. ,
DOI : 10.1007/s00200-007-0035-z
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy, ACM Trans. Math. Softw, vol.34, pp.1-22, 2008. ,
DOI : 10.1145/1377596.1377597
URL : http://www.netlib.org/netlib/utk/people/JackDongarra/PAPERS/iterative-refine-toms-2007.pdf
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems, Int. J. High Perform. Comput. Appl, vol.21, pp.457-466, 2007. ,
Performance Optimization and Modeling of Blocked Sparse Kernels, Int. J. High Perform. Comput. Appl, vol.21, pp.467-484, 2007. ,
DOI : 10.1177/1094342007083801
URL : http://hpc.sagepub.com/cgi/reprint/21/4/467.pdf
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Comput, vol.35, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Parallel tiled QR factorization for multicore architectures, Concurr. Comput.: Pract. Exper, vol.20, 2008. ,
DOI : 10.1007/978-3-540-68111-3_67
URL : http://www.netlib.org/lapack/lawnspdf/lawn190.pdf
Object-Oriented Techniques for Sparse Matrix Computations in Fortran 2003, ACM Transactions on Mathematical Software, vol.38, p.20, 2012. ,
DOI : 10.1145/2331130.2331131
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, IEEE Trans. Parallel Distrib. Syst, vol.19, 2008. ,
The PlayStation 3 for High-Performance Scientific Computing, Computing in Science and Eng, vol.10, 2008. ,
DOI : 10.1109/mcse.2008.85
URL : http://www.cs.utk.edu/~library/TechReports/2008/ut-cs-08-608.pdf
Large-scale 3D EM modeling with a Block Low-Rank multifrontal direct solver, Geophysical Journal International, 2017. ,
Recent advances in sparse direct solvers, Conference on Structural Mechanics in Reactor Technology, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01060301
Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver, Euro-Par 2016: Parallel Processing Workshops: Euro-Par 2016 International Workshops, pp.175-186, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01337748
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, pp.521-532, 2013. ,
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures, HiPC, IEEE Computer Society, 2015. ,
DOI : 10.1109/hipc.2015.27
URL : https://hal.archives-ouvertes.fr/hal-01270145
3D frequency-domain seismic modeling with a Parallel BLR multifrontal direct solver, SEG Technical Program Expanded Abstracts 2015, Chap. 692, pp.3606-3611, 2015. ,
DOI : 10.1190/segam2015-5811693.1
URL : https://hal.archives-ouvertes.fr/hal-01237869
Efficient 3D frequency-domain full-waveform inversion of ocean-bottom cable data with sparse block low-rank direct solver: a real data case study from the North Sea, SEG Technical Program Expanded Abstracts, vol.251, pp.1303-1308, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01239896
Shared memory parallelism and low-rank approximation techniques applied to direct solvers in FEM simulation, IEEE International Conference on the Computation of Electromagnetic Fields (COMPUMAG), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-01123557
Towards exascale with the ANR-JST Japanese-French Project FP3C, Ninth International Conference on Computer Science and Information Technologies Revised Selected Papers, pp.1-10, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00922754
FAST-EVP: an Engine Simulation Tool, High Performance Computing and Communications. First International Conference, HPCC 2005, Proceedings, vol.3726 ,
INTEGRAL/SPI data segmentation to retrieve sources intensity variations (regular paper), An INTEGRAL view of the high-energy sky (the first 10 years), 2013. ,
Fine granularity sparse QR factorization for multicore based systems, Proceedings of the 10th international conference on Applied Parallel and Scientific Computing, vol.2, pp.226-236, 2012. ,
Extending PSBLAS to Build Parallel Schwarz Preconditioners, Applied Parallel Computing. State of the Art in Scientific Computing: 7th International Conference, vol.3732, pp.593-602, 2004. ,
Multithreading for Synchronization Tolerance in Matrix Factorization, Proceedings of the SciDAC 2007 Conference, 2007. ,
The impact of multicore on math software, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing. PARA'06, pp.1-10, 2007. ,
Parallel tiled QR factorization for multicore architectures, PPAM'07: Proceedings of the 7th international conference on Parallel processing and applied mathematics, pp.639-648, 2008. ,
Prospectus for the Next LAPACK and ScaLAPACK Libraries, PARA'06: State-of-the-Art in Scientific and Parallel Computing. High Performance Computing Center North (HPC2N) and the Department of Computing Science, 2006. ,
Pre-exascale Architectures: OpenPOWER Performance and Usability Assessment for French Scientific Community, High Performance Computing: ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P3MA, VHPC, Visualization at, pp.309-324, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-02129651
Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, p.113, 2006. ,
Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers, Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on, pp.481-490, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01180272
3D frequency-domain seismic modeling with a block low-rank algebraic multifrontal direct solver, SEG Technical Program Expanded Abstracts, vol.662, pp.3411-3416, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00924638
MUMPS, Encyclopedia of Parallel Computing, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00787042
The Multifrontal Method, Encyclopedia of Parallel Computing, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00787015
Parallel Dense Linear Algebra Software in the Multicore Era, Cyberinfrastructure Technologies and Applications, 2007. ,
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations, High Performance Computing and Grids in Action, 2007. ,
Prospectus for a Linear Algebra Software Library for Dense Matrix Problems, Handbook of Parallel Computing: Models, Algorithms and Applications, vol.17, 2007. ,
Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, Research Report, INPT-IRIT, submitted to ACM TOMS, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01505070
Bridging the gap between flat and hierarchical low-rank matrix formats: the multilevel BLR format, Tech. rep. Submitted to the SIAM Journal on Scientific Computing. IRIT, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01774642
Approximating minimum norm solutions of rank-deficient least squares problems, Numerical Linear Algebra with Applications, vol.5, pp.79-99, 1998. ,
On the out-of-core factorization of large sparse matrices, 2008. ,
URL : https://hal.archives-ouvertes.fr/tel-00563463
Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, p.27, 2016. ,
DOI : 10.1109/tpds.2017.2766064
URL : https://hal.archives-ouvertes.fr/hal-01618526
Task-based FMM for heterogeneous architectures, p.29, 2014. ,
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, p.12037, 2009. ,
Fully Empirical Autotuned QR Factorization For Multicore Architectures, 2011. ,
DOI : 10.1007/978-3-642-23397-5_19
URL : https://hal.archives-ouvertes.fr/hal-00726654
Hierarchical hybrid sparse linear solver for multicore platforms, p.25, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01379227
Reducing the I/O Volume in Sparse Out-of-core Multifrontal Methods, SIAM Journal on Scientific Computing, vol.31, pp.4774-4794, 2010. ,
Exploiting Data Sparsity for Large-Scale Matrix Computations, 2018. ,
DOI : 10.1007/978-3-319-96983-1_51
URL : https://repository.kaust.edu.sa/bitstream/10754/627403/1/hicma_tech.pdf
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002. ,
An Approximate Minimum Degree Ordering Algorithm, SIAM J. Matrix Anal. Appl, vol.17, pp.886-905, 1996. ,
DOI : 10.1137/s0895479894278952
Memory Management Issues in Sparse Multifrontal Methods On Multiprocessors, The International Journal of Supercomputing Applications, vol.7, issue.1, pp.64-82, 1993. ,
A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling, SIAM Journal on Matrix Analysis and Applications, vol.23, pp.15-41, 2001. ,
URL : https://hal.archives-ouvertes.fr/hal-00808293
On computing inverse entries of a sparse matrix in an out-of-core environment, SIAM Journal on Scientific Computing, vol.34, pp.1975-1999, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00763556
Multifrontal QR factorization in a multiprocessor environment, Int. Journal of Num. Linear Alg. and Appl, vol.3, issue.4, pp.275-300, 1996. ,
Hybrid scheduling for the parallel solution of linear systems, Parallel Computing, vol.32, issue.2, pp.136-156, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00070599
An unsymmetrized multifrontal LU factorization, SIAM Journal on Matrix Analysis and Applications, vol.24, pp.553-569, 2002. ,
DOI : 10.2172/776628
URL : https://digital.library.unt.edu/ark:/67531/metadc715385/m2/1/high_res_d/776628.pdf
On Exploiting Sparsity of Multiple Right-Hand Sides in Sparse Direct Solvers, pp.1-28, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01955659
A fast block low-rank dense solver with applications to finite-element matrices, Journal of Computational Physics, vol.304, pp.170-188, 2016. ,
LAPACK Users' Guide, Third edition, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999. ,
A Block Low-Rank multithreaded factorization for dense BEM operators, SIAM Conference on Parallel Processing (SIAM PP16), 2016. ,
Thread Scheduling for Multiprogrammed Multiprocessors, Theory Comput. Syst, vol.34, pp.115-144, 2001. ,
The Landscape of Parallel Computing Research: A View from Berkeley, TECHNICAL REPORT, 2006. ,
The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms, Graph Theory and Sparse Matrix Computation, 1993. ,
SPOOLES: An object oriented sparse matrix library, Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Using Perturbed QR Factorizations to Solve Linear Least-Squares Problems, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.674-693, 2009. ,
Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.21, pp.2438-2456, 2009. ,
Communication lower bounds and optimal algorithms for numerical linear algebra, Acta Numerica, vol.23, pp.1-155, 2014. ,
Task scheduling for parallel multifrontal methods, Euro-Par 2007 Parallel Processing, pp.758-766, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-00358626
Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, Lecture Notes in Computational Science and Engineering, 2008. ,
Approximation of boundary element matrices, Numerische Mathematik, vol.86, pp.565-589, 2000. ,
Efficient inversion of Galerkin matrices of general second-order elliptic differential operators with nonsmooth coefficients, Mathematics of Computation, vol.74, pp.1179-1199, 2005. ,
Why finite element discretizations can be factored by triangular hierarchical matrices, SIAM Journal on Numerical Analysis, vol.45, p.1472, 2007. ,
Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients, Numerische Mathematik, vol.95, pp.1-28, 2003. ,
Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996. ,
Introduction to hierarchical matrices with applications, Engineering analysis with boundary elements, vol.27, pp.405-422, 2003. ,
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science and Engineering, vol.15, pp.36-45, 2013. ,
Tiled QR Factorization Algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. SC '11, vol.7, 2011. ,
DOI : 10.1145/2063384.2063393
URL : https://hal.archives-ouvertes.fr/hal-00945074
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Application, Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference, pp.180-186, 2010. ,
DOI : 10.1109/pdp.2010.67
URL : https://hal.inria.fr/file/index/docid/429889/filename/main.pdf
Efficient Batch LU and QR Decomposition on GPU, Numerical Computations with GPUs, 2014. ,
DOI : 10.1007/978-3-319-06548-9_4
Combinatorial Matrix Theory, Encyclopedia of Mathematics and its Applications, 1991. ,
DOI : 10.1201/b16113-44
Strong Hall Matrices, SIAM J. Matrix Anal. Appl, vol.15, issue.2, pp.359-365, 1994. ,
DOI : 10.1137/s0895479892225142
Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems, Mathematics of Computation, vol.31, pp.162-179, 1977. ,
DOI : 10.2307/2005787
URL : https://www.ams.org/mcom/1977-31-137/S0025-5718-1977-0428694-0/S0025-5718-1977-0428694-0.pdf
Linear Least Squares Solutions by Householder Transformations, Numer. Math, vol.7, issue.3, pp.269-276, 1965. ,
DOI : 10.1007/bf01436084
On the Numerical Rank of the Off-Diagonal Blocks of Schur Complements of Discretized Elliptic PDEs, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.2261-2290, 2010. ,
A fast ULV decomposition solver for hierarchically semiseparable representations, SIAM Journal on Matrix Analysis and Applications, vol.28, issue.3, pp.603-622, 2006. ,
DOI : 10.1137/s0895479803436652
Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate, ACM Trans. Math. Softw, vol.35, issue.3, 2008. ,
On the Compression of Low Rank Matrices, SIAM Journal on Scientific Computing, vol.26, pp.1389-1404, 2005. ,
Predicting Fill for Sparse Orthogonal Factorization, J. ACM, vol.33, issue.3, pp.517-532, 1986. ,
DOI : 10.1145/5925.5932
URL : https://ecommons.cornell.edu/bitstream/1813/6418/1/83-578.pdf
Ten years of marine CSEM for hydrocarbon exploration, Geophysics, vol.75, pp.75-67, 2010. ,
DOI : 10.1190/1.3483451
The inverse fast multipole method: using a fast approximate direct solver as a preconditioner for dense linear systems, 2015. ,
A column approximate minimum degree ordering algorithm, ACM Trans. Math. Softw, vol.30, issue.3, pp.353-376, 2004. ,
DOI : 10.1145/1024074.1024079
Algorithm 832: UMFPACK V4.3-an unsymmetric-pattern multifrontal method, ACM Transactions On Mathematical Software, vol.30, pp.196-199, 2004. ,
DOI : 10.1145/992200.992206
Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization, ACM Trans. Math. Softw, vol.38, issue.1, 2011. ,
The university of Florida sparse matrix collection, ACM Trans. Math. Softw, vol.38, issue.1, 2011. ,
Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics, 1997. ,
DOI : 10.1137/1.9781611971446
Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM J. Sci. Comput, vol.34, issue.1, 2012. ,
DOI : 10.1137/080731992
URL : https://hal.archives-ouvertes.fr/hal-00870930
Een algorithme ter voorkoming van de dodelijke omarming, circulated privately, 1965. ,
The Mathematics Behind the Banker's Algorithm, Selected Writings on Computing: A Personal Perspective, Texts and Monographs in Computer Science, pp.308-312, 1982. ,
Benchmarking Optimization Software with Performance Profiles, Mathematical Programming, vol.91, pp.201-213, 2002. ,
An Introduction to Domain Decomposition Methods, Society for Industrial and Applied Mathematics, 2015. ,
URL : https://hal.archives-ouvertes.fr/cel-01100932
Van der Vorst. Numerical Linear Algebra for High-Performance Computers, 1998. ,
Hierarchical QR factorization algorithms for multi-core clusters, Parallel Computing, vol.39, pp.212-232, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00809770
The Augmented Block Cimmino Distributed Method, SIAM Journal on Scientific Computing, vol.37, 2015. ,
The multifrontal solution of indefinite sparse symmetric linear systems, ACM Transactions On Mathematical Software, vol.9, pp.302-325, 1983. ,
The approximation of one matrix by another of lower rank, Psychometrika, vol.1, issue.3, pp.211-218, 1936. ,
A tree-based dataflow model for the unsymmetric multifrontal method, Electronic Transactions on Numerical Analysis, vol.21, pp.1-19, 2005. ,
Algorithmic Aspects of Elimination Trees for Sparse Unsymmetric Matrices, SIAM Journal on Matrix Analysis and Applications, vol.29, pp.1363-1381, 2008. ,
Remote sensing of hydrocarbon layers by seabed logging (SBL): Results from a cruise offshore Angola, The Leading Edge, vol.21, pp.972-982, 2002. ,
Sweeping preconditioner for the Helmholtz equation: Hierarchical matrix representation, Communications on Pure and Applied Mathematics, vol.64, pp.697-735, 2011. ,
Parallel Scheduling of Task Trees with Limited Memory, ACM Trans. Parallel Comput, vol.2, issue.2, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01160118
Task scheduling for parallel sparse Cholesky factorization, Int J. Parallel Programming, vol.18, pp.291-314, 1989. ,
Nested dissection of a regular finite-element mesh, SIAM J. Numer. Anal, vol.10, pp.345-363, 1973. ,
Solution of Sparse Linear Least Squares Problems Using Givens Rotations, Linear Algebra and its Applications, vol.34, pp.69-83, 1980. ,
A Data Structure for Sparse $QR$ and $LU$ Factorizations, SIAM Journal on Scientific and Statistical Computing, vol.9, pp.100-121, 1988. ,
On the Complexity of Sparse $QR$ and $LU$ Factorization of Finite-Element Matrices, SIAM Journal on Scientific and Statistical Computing, vol.9, pp.849-861, 1988. ,
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling, SIAM Journal on Scientific Computing, vol.38, pp.358-384, 2016. ,
Predicting structure in nonsymmetric sparse matrix factorizations, Graph Theory and Sparse Matrix Computations, pp.107-140, 1993. ,
Computing Row and Column Counts for Sparse QR and LU Factorization, BIT Numerical Mathematics, vol.41, pp.693-710, 2001. ,
Elimination Structures for Unsymmetric Sparse $LU$ Factors, SIAM Journal on Matrix Analysis and Applications, vol.14, pp.334-352, 1993. ,
A direct solver with O(N) complexity for integral equations on one-dimensional domains, Frontiers of Mathematics in China, vol.7, 2012. ,
Parallel algebraic hybrid solvers for large 3D convection-diffusion problems, Numerical Algorithms, vol.51, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00441717
Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form, Journal of the Society for Industrial and Applied Mathematics, vol.6, 1958. ,
Numerical methods for solving linear least squares problems, Numerische Mathematik, vol.7, pp.206-216, 1965. ,
Matrix Computations, 2012. ,
Parallel black box H-LU preconditioning for elliptic boundary value problems, Computing and Visualization in Science, vol.11, 2008. ,
Constructing memory-minimizing schedules for multifrontal methods, ACM Transactions on Mathematical Software (TOMS), vol.32, issue.1, pp.17-32, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00358620
Impact of Reordering on the Memory of a Multifrontal Solver, Parallel Computing, vol.29, pp.1191-1218, 2003. ,
URL : https://hal.archives-ouvertes.fr/hal-00807378
A Shared- and distributed-memory parallel general sparse direct solver, Appl. Algebra Eng. Commun. Comput, vol.18, pp.263-277, 2007. ,
Variants of BICGSTAB for matrices with complex spectrum, SIAM Journal on Scientific Computing, vol.14, pp.1020-1033, 1993. ,
A general framework for constraint minimization for the inversion of electromagnetic measurements, Progress in electromagnetics Research, vol.46, pp.265-312, 2004. ,
Hierarchical Matrices Based on a Weak Admissibility Criterion, Computing, vol.73, pp.207-243, 2004. ,
A sparse matrix arithmetic based on H-matrices. Part I: introduction to H-matrices, Computing, vol.62, issue.2, pp.89-108, 1999. ,
Hierarchical Matrices: Algorithms and Analysis, Springer Series in Computational Mathematics, vol.49, p.511, 2015. ,
A Guide For Achieving High Performance With Very Small Matrices On GPU: A Case Study of Batched LU and Cholesky Factorizations, IEEE Transactions on Parallel and Distributed Systems, vol.PP, issue.99, pp.1-1, 2017. ,
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Review, vol.53, pp.217-288, 2011. ,
Some Extensions of an Algorithm for Sparse Linear Least Squares Problems, SIAM Journal on Scientific and Statistical Computing, vol.3, pp.223-237, 1982. ,
DOI : 10.1137/0903014
PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Definite Systems, Parallel Computing, vol.28, issue.2, pp.301-321, 2002. ,
Power Efficient Processor Architecture and The Cell Processor, Proceedings of the 11th International Symposium on High-Performance Computer Architecture. HPCA '05, pp.258-262, 2005. ,
DOI : 10.1109/hpca.2005.26
URL : http://www.hpcaconf.org/hpca11/papers/25_hofstee-cellprocessor_final.pdf
Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM J. Scientific Computing, vol.32, pp.3627-3649, 2010. ,
DOI : 10.1137/090757216
Unitary Triangularization of a Nonsymmetric Matrix, J. ACM, vol.5, pp.339-342, 1958. ,
DOI : 10.1145/320941.320947
URL : https://hal.archives-ouvertes.fr/hal-01316095
On Optimal Tree Traversals for Sparse Matrix Factorization, Proceedings of 25th International Parallel and Distributed Processing Symposium (IPDPS'11), 2011. ,
DOI : 10.1109/ipdps.2011.60
URL : https://hal.archives-ouvertes.fr/hal-00945078
Fast multimodel finite-difference controlled-source electromagnetic simulations based on a Schur complement approach, Geophysics, vol.79, issue.6, pp.315-327, 2014. ,
DOI : 10.1190/geo2014-0043.1
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM J. Sci. Comput, vol.20, 1998. ,
DOI : 10.1137/s1064827595287997
URL : http://glaros.dtc.umn.edu/gkhome/fetch/papers/mlSIAMSC99.pdf
Marine Electromagnetic Studies of Seafloor Resources and Tectonics, Surveys in Geophysics, vol.33, issue.1, 2012. ,
DOI : 10.1007/s10712-011-9139-x
An NUMA API for Linux, 2004. ,
Fully Dynamic Scheduler for Numerical Computing on Multicore Processors, LAPACK working note lawn220, 2009. ,
Multifrontal methods for large sparse systems of linear equations: parallelism, memory usage, performance optimization and numerical issues, Habilitation, 2012. ,
A study of shared-memory parallelism in a multifrontal solver, Parallel Computing, vol.40, pp.34-46, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01060322
Scheduling and memory optimizations for sparse direct solver on multi-core/multi-GPU cluster systems, 2015. ,
Making Sparse Gaussian Elimination Scalable by Static Pivoting, Supercomputing, 1998. SC98. IEEE/ACM Conference on, pp.34-34, 1998. ,
SuperLU_DIST: A Scalable Distributed-memory Sparse Direct Solver for Unsymmetric Linear Systems, ACM Trans. Math. Softw, vol.29, issue.2, pp.110-140, 2003. ,
Randomized algorithms for the low-rank approximation of matrices, Proceedings of the National Academy of Sciences, vol.104, pp.20167-20172, 2007. ,
Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Computing, vol.10, pp.287-298, 1989. ,
An Application of Generalized Tree Pebbling to Sparse Matrix Factorization, SIAM J. Algebraic Discrete Methods, vol.8, issue.3, 1987. ,
DOI : 10.1137/0608031
URL : http://graal.ens-lyon.fr/%7Elmarchal/scheduling/generalized_tree_pebbling_liu.pdf
Modification of the Minimum-degree Algorithm by Multiple Elimination, ACM Trans. Math. Softw, vol.11, issue.2, pp.141-153, 1985. ,
DOI : 10.1145/214392.214398
On the storage requirement in the out-of-core multifrontal method for sparse factorization, ACM Transactions On Mathematical Software, vol.12, pp.127-148, 1986. ,
The Multifrontal Method for Sparse Matrix Solution: Theory and Practice, SIAM Review, vol.34, pp.82-109, 1992. ,
Task-based multifrontal QR solver for heterogeneous architectures, 2015. ,
URL : https://hal.archives-ouvertes.fr/tel-01386600
On the Minimum FLOPs Problem in the Sparse Cholesky Factorization, SIAM Journal on Matrix Analysis and Applications, vol.35, pp.1-21, 2014. ,
Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations, Geophysics, vol.49, pp.533-549, 1984. ,
A Fast Randomized Algorithm for Computing a Hierarchically Semiseparable Representation of a Matrix, SIAM Journal on Matrix Analysis and Applications, vol.32, pp.1251-1274, 2011. ,
Compressing Rank-Structured Matrices via Randomized Sampling, SIAM Journal on Scientific Computing, vol.38, 2016. ,
Block Low-Rank multifrontal solvers: complexity, performance, and scalability, 2017. ,
URL : https://hal.archives-ouvertes.fr/tel-01929478
Parallel Sparse QR factorization on shared memory architectures, Tech. rep. LiTH-MAT-R-1993-18, Department of Mathematics, 1993. ,
STREAM: Sustainable Memory Bandwidth in High Performance Computers, 1991. ,
PAPI: A Portable Interface to Hardware Performance Counters, Proceedings of Department of Defense HPCMP Users Group Conference, 1999. ,
A multigrid solver for 3D electromagnetic diffusion, Geophysical prospecting, vol.54, pp.633-649, 2006. ,
Efficient 3-D frequency-domain mono-parameter full-waveform inversion of ocean-bottom cable data: application to Valhall in the viscoacoustic vertical transverse isotropic approximation, Geophysical Journal International, vol.202, pp.1362-1391, 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-02009486
3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study, Geophysics, vol.72, pp.195-211, 2007. ,
URL : https://hal.archives-ouvertes.fr/insu-00355256
Finite difference, finite element and finite volume methods for partial differential equations, Handbook of materials modeling, pp.2415-2446, 2005. ,
Scotch: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs, Proceedings of HPCN'96, Brussels, LNCS 1067, pp.493-498, 1996. ,
Sparse Supernodal Solver Using Block Low-Rank Compression, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.1138-1147, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01502215
Sparse Multifrontal Rank Revealing QR Factorization, SIAM J. Matrix Anal. Appl, vol.18, issue.1, 1997. ,
A mapping algorithm for parallel sparse Cholesky factorization, SIAM Journal on Scientific Computing, vol.14, pp.1253-1253, 1993. ,
Generalized multiprocessor scheduling and applications to matrix computations, IEEE Transactions on Parallel and Distributed Systems, vol.7, pp.650-664, 1996. ,
On the compatibility of a given solution with the data of a linear system, J. Assoc. Comput. Mach, vol.14, pp.526-543, 1967. ,
Algorithmic Aspects of Vertex Elimination on Graphs, SIAM Journal on Computing, vol.5, pp.266-283, 1976. ,
DOI : 10.1137/0205021
Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides, 2012. ,
URL : https://hal.archives-ouvertes.fr/tel-00785748
Efficient Sparse LU Factorization with Left-Right Looking Strategy on Shared Memory Multiprocessors, BIT Numerical Mathematics, vol.40, pp.158-176, 2000. ,
DOI : 10.1007/bfb0100583
URL : http://www.iis.ee.ethz.ch/~oschenk/papers/oschenk-hpcn-procee-1999.ps.gz
Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten, pp.53-77, 1908. ,
A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput, vol.10, pp.52-57, 1989. ,
A new implementation of sparse Gaussian elimination, ACM Transactions On Mathematical Software, vol.8, pp.256-276, 1982. ,
Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp.318-327, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01284004
Complete Register Allocation Problems, Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. STOC '73, pp.182-195, 1973. ,
DOI : 10.1145/800125.804049
URL : http://graal.ens-lyon.fr/%7Elmarchal/scheduling/sethi_complete_register_allocation.pdf
The Generation of Optimal Code for Arithmetic Expressions, J. ACM, vol.17, 1970. ,
Scaling multifrontal methods for the solution of large sparse linear systems on hybrid shared-distributed memory architectures, 2014. ,
Parallel triangular solution in the out-of-core multifrontal approach for solving large sparse linear systems, 2009. ,
Communication-optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, Proceedings of the 17th International Conference on Parallel Processing-Volume Part II. Euro-Par'11, 2011. ,
Inversion of seismic reflection data in the acoustic approximation, Geophysics, vol.49, issue.8, pp.1259-1266, 1984. ,
Regularization of incorrectly posed problems, Soviet Math. Dokl, vol.4, pp.1624-1627, 1963. ,
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, pp.260-274, 2002. ,
Numerical Linear Algebra, SIAM, 1997. ,
Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM Journal on scientific and Statistical Computing, vol.13, pp.631-644, 1992. ,
An overview of full waveform inversion in exploration geophysics, Geophysics, vol.74, issue.6, pp.1-26, 2009. ,
URL : https://hal.archives-ouvertes.fr/hal-00457989
A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure, ACM Transactions on Mathematical Software, vol.42, issue.3, 2016. ,
DOI : 10.1145/2830569
Improving multifrontal solvers by means of algebraic Block Low-Rank representations, 2013. ,
URL : https://hal.archives-ouvertes.fr/tel-00934939
Rounding Errors in Algebraic Processes, 1963. ,
Roofline: An Insightful Visual Performance Model for Multicore Architectures, Commun. ACM, vol.52, pp.65-76, 2009. ,
DOI : 10.2172/1407078
URL : http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-134.pdf
Hierarchical DAG scheduling for Hybrid Distributed Systems, 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), pp.156-165, 2015. ,
DOI : 10.1109/ipdps.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359
Efficient Structured Multifrontal Factorization for General Large Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, 2013. ,
DOI : 10.1137/120867032
Superfast Multifrontal Method for Large Structured Linear Systems of Equations, SIAM Journal on Matrix Analysis and Applications, vol.31, pp.1382-1411, 2009. ,
Computing the Minimum Fill-In is NP-Complete, SIAM Journal on Algebraic Discrete Methods, vol.2, issue.1, pp.77-79, 1981. ,
QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011. ,
[...] GHz and are equipped with Intel AVX SIMD units; the peak performance is of [...] Gflop/s per core and thus 691.2 Gflop/s per node for real [...]
• [...] This is a five-node cluster, part of the PlaFRIM center. Each node is equipped with two Haswell Intel Xeon E5-2680 (twelve cores) processors and 124 GB of memory. The cores are clocked at 2.5 GHz and are equipped with Intel AVX SIMD units. In addition, each node is accelerated with four Nvidia K40M GPUs; the peak performance is of 40 [...]
• brunch: a shared-memory machine installed at the LIP laboratory of ENS-Lyon, equipped with 1.5 TB of memory and four Intel 24-core Broadwell E7-8890v4 processors running at a frequency varying between 2 [...]
• eos: the supercomputer of the Calcul en Midi-Pyrénées (CALMIP) center (grant P0989, 2008). Each of its 612 nodes is equipped with 64 GB of memory and two Intel 10-core Ivy Bridge processors running at 2.8 GHz. The nodes are interconnected with an Infiniband FDR network.
• licallo: the supercomputer of the SIGAMM mesocenter in Observatoire de la Côte [...] Each of its 102 nodes is equipped with 64 GB of memory and two Intel 10-core Ivy Bridge processors running at 2.5 GHz. The nodes are interconnected with Infiniband FDR.
• farad: a shared-memory machine equipped with 264 GB of memory and two Intel 16-core Sandy Bridge processors running at 2.9 GHz.