A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Fortran subroutines for out-of-core solutions of large complex linear systems, 1979.
CULA: hybrid GPU accelerated linear algebra routines, Modeling and Simulation for Defense Systems and Applications V, 2010.
DOI : 10.1117/12.850538
Retargeting PLAPACK to clusters with hardware accelerators, FLAME Working Note #42, 2010.
A unified HPC environment for hybrid manycore/GPU distributed systems, LAPACK Working Note, 2010.
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317
Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008.
DOI : 10.1145/1383422.1383447
Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55
Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.49
URL : http://charm.cs.illinois.edu/newPapers/10-16/paper.pdf
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654080
QR factorization for the CELL processor, Scientific Programming, Special Issue: High Performance Computing with the Cell Broadband Engine, vol.17, issue.1-2, pp.31-42, 2009.
Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.22, issue.1, pp.15-44, 2009.
DOI : 10.1002/cpe.1467
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294
LU, QR and Cholesky factorizations using vector capabilities of GPUs, 2008.
StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, INRIA, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00467677
Average-Case Stability of Gaussian Elimination, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.3, pp.335-360, 1990.
DOI : 10.1137/0611023
Updating an LU Factorization with Pivoting, ACM Transactions on Mathematical Software, vol.35, issue.2, pp.1-16, 2008.
DOI : 10.1145/1377612.1377615