A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

E. Chan, E. S. Quintana-Ortí, G. Quintana-Ortí, and R. van de Geijn, SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, SPAA '07, pp.116-125, 2007.
DOI : 10.1145/1248377.1248397

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

E. L. Yip, Fortran subroutines for out-of-core solutions of large complex linear systems, 1979.

J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L. Paolini, and E. J. Kelmelis, CULA: hybrid GPU accelerated linear algebra routines, Modeling and Simulation for Defense Systems and Applications V, 2010.
DOI : 10.1117/12.850538

M. Fogué, F. D. Igual, E. S. Quintana-Ortí, and R. van de Geijn, Retargeting PLAPACK to clusters with hardware accelerators, FLAME Working Note #42, 2010.

G. Bosilca, A. Bouteiller, T. Herault, P. Lemarinier, N. Saengpatsa et al., A unified HPC environment for hybrid manycore/GPU distributed systems, LAPACK Working Note, 2010.

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp.851-862, 2009.

G. F. Diamos and S. Yalamanchili, Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008.
DOI : 10.1145/1383422.1383447

K. Fatahalian, T. Knight, M. Houston, M. Erez, D. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kalé, and T. R. Quinn, Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.49
URL : http://charm.cs.illinois.edu/newPapers/10-16/paper.pdf

E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654080

J. Kurzak and J. J. Dongarra, QR factorization for the CELL processor, Scientific Programming, Special Issue: High Performance Computing with the Cell Broadband Engine, vol.17, issue.1-2, pp.31-42, 2009.

J. Kurzak, H. Ltaief, J. J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.22, issue.1, pp.15-44, 2010.
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.177.3294

V. Volkov and J. Demmel, LU, QR and Cholesky factorizations using vector capabilities of GPUs, 2008.

C. Augonnet, S. Thibault, and R. Namyst, StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, INRIA, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00467677

L. N. Trefethen and R. S. Schreiber, Average-Case Stability of Gaussian Elimination, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.3, pp.335-360, 1990.
DOI : 10.1137/0611023

E. S. Quintana-Ortí and R. A. van de Geijn, Updating an LU Factorization with Pivoting, ACM Transactions on Mathematical Software, vol.35, issue.2, pp.1-16, 2008.
DOI : 10.1145/1377612.1377615