Fine grain scheduling for sparse solver on manycore architectures, 15th SIAM Conference on Parallel Processing for Scientific Computing, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00769026
The Impact of Multicore on Math Software, 2006.
DOI : 10.1007/978-3-540-75755-9_1
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.12037, 2009.
DOI : 10.1088/1742-6596/180/1/012037
Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation Computer Systems, vol.20, issue.3, pp.475-487, 2004.
DOI : 10.1016/j.future.2003.07.011
Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM Journal on Scientific Computing, vol.32, issue.6, pp.3627-3649, 2010.
DOI : 10.1137/090757216
Evaluation of sparse LU factorization and triangular solution on multicore platforms, in VECPAR, ser. Lecture Notes in Computer Science, pp.287-300, 2008.
Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, vol.38, issue.1, p.8, 2011.
DOI : 10.1145/2049662.2049670
Fine-Grained Multithreading for the Multifrontal $QR$ Factorization of Sparse Matrices, SIAM SISC and APO technical report number RT-APO-11-6, 2013.
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1, 2012.
Dynamic task execution on shared and distributed memory architectures, 2012.
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2011.
DOI : 10.1109/IPDPS.2011.299
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014
Multifrontal Computations on GPUs and Their Multi-core Hosts, Proceedings of the 9th international conference on High performance computing for computational science, ser. VECPAR'10, pp.71-82, 2011.
DOI : 10.1016/0167-8191(86)90019-0
A CPU-GPU hybrid approach for the unsymmetric multifrontal method, Parallel Computing, vol.37, issue.12, pp.759-770, 2011.
DOI : 10.1016/j.parco.2011.09.002
The Role of Elimination Trees in Sparse Factorization, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.1, pp.134-172, 1990.
DOI : 10.1137/0611010
The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983.
DOI : 10.1145/356044.356047
Progress in Sparse Matrix Methods for Large Linear Systems On Vector Supercomputers, International Journal of High Performance Computing Applications, vol.1, issue.4, pp.10-30, 1987.
DOI : 10.1177/109434208700100403
Compact DAG representation and its symbolic scheduling, Journal of Parallel and Distributed Computing, vol.64, issue.8, pp.921-935, 2004.
DOI : 10.1016/j.jpdc.2004.05.001
URL : https://hal.archives-ouvertes.fr/inria-00099958
Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.31:1-31:11, 2008.
DOI : 10.1109/SC.2008.5214359
On finding approximate supernodes for an efficient block-ILU(k) factorization, Parallel Computing, vol.34, issue.6-8, pp.345-362, 2008.
DOI : 10.1016/j.parco.2007.12.003
The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, 2011.
DOI : 10.1145/2049662.2049663
One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators, Procedia Computer Science, vol.9, pp.37-46, 2012.
DOI : 10.1016/j.procs.2012.04.005
CUBLAS library, NVIDIA Corporation, vol.15, 2008.
Fast implementation of DGEMM on Fermi GPU, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC '11, pp.35:1-35:11, 2011.
An Improved MAGMA GEMM For Fermi Graphics Processing Units, The International Journal of High Performance Computing Applications, vol.27, issue.1, pp.511-515, 2010.
DOI : 10.1177/1094342010385729
Autotuning GEMM Kernels for the Fermi GPU, IEEE Transactions on Parallel and Distributed Systems, vol.23, issue.11, pp.2045-2057, 2012.
DOI : 10.1109/TPDS.2011.311