N. J. Higham, Accuracy and Stability of Numerical Algorithms, 2002.
DOI : 10.1137/1.9780898718027

C. Bischof and C. Van-loan, The WY Representation for Products of Householder Matrices, SIAM Journal on Scientific and Statistical Computing, vol.8, issue.1, pp.2-13, 1987.
DOI : 10.1137/0908009

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, 2010.

B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
DOI : 10.1109/IPDPS.2010.5470443

URL : https://hal.archives-ouvertes.fr/inria-00548899

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Agullo, B. Hadri, H. Ltaief, and J. Dongarra, Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, 2009.
DOI : 10.1145/1654059.1654080

B. C. Gunter and R. A. Van-de-geijn, Parallel out-of-core computation and updating of the QR factorization, ACM Transactions on Mathematical Software, vol.31, issue.1, pp.60-78, 2005.
DOI : 10.1145/1055531.1055534

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, pp.1573-1590, 2008.

G. Quintana-ortí, E. S. Quintana-ortí, E. Chan, F. G. Zee, and R. A. Van-de-geijn, Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008.
DOI : 10.1109/PDP.2008.37

J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, 2008.
DOI : 10.1137/080731992

URL : https://hal.archives-ouvertes.fr/hal-00870930

J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L. Paolini, and E. J. Kelmelis, CULA: hybrid GPU accelerated linear algebra routines, Modeling and Simulation for Defense Systems and Applications V, 2010.
DOI : 10.1117/12.850538

M. Anderson and J. Demmel, Communication-avoiding QR decomposition for GPU, GPU Technology Conference, Research Poster A01, 2010.

M. Fogué, F. D. Igual, E. S. Quintana-ortí, and R. V. Geijn, Retargeting plapack to clusters with hardware accelerators flame working note #42, 2010.

G. Bosilca, A. Bouteiller, T. Herault, P. Lemarinier, N. Saengpatsa et al., A unified HPC environment for hybrid manycore/GPU distributed systems, LAPACK Working Note, 2010.

]. E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

G. F. Diamos and S. Yalamanchili, Harmony, Proceedings of the 17th international symposium on High performance distributed computing, HPDC '08, pp.197-200, 2008.
DOI : 10.1145/1383422.1383447

K. Fatahalian, T. Knight, M. Houston, M. Erez, D. Horn et al., Sequoia: Programming the Memory Hierarchy, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.55

P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kalé, and T. R. Quinn, Scaling Hierarchical N-body Simulations on GPU Clusters, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.49

S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010.
DOI : 10.1109/IPDPSW.2010.5470941

S. Tomov, R. Nath, and J. Dongarra, Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, Parallel Computing, vol.36, issue.12, 2010.
DOI : 10.1016/j.parco.2010.06.001

R. Nath, S. Tomov, and J. Dongarra, An Improved MAGMA GEMM for Fermi GPUs, 2010.

R. C. Whaley, A. Petitet, and J. Dongarra, Automated empirical optimizations of software and the ATLAS project, Parallel Computing, vol.27, issue.1-2, pp.3-35, 2001.
DOI : 10.1016/S0167-8191(00)00087-9

R. Vuduc, J. Demmel, and K. Yelick, OSKI: A library of automatically tuned sparse matrix kernels, Proc. of SciDAC'05, ser. Journal of Physics: Conference Series, 2005.
DOI : 10.1088/1742-6596/16/1/071

J. Kurzak and J. J. Dongarra, QR factorization for the CELL processor Scientific Programming, Special Issue: High Performance Computing with the, Cell Broadband Engine, vol.17, issue.12, pp.31-42, 2009.

J. Kurzak, H. Ltaief, J. J. Dongarra, and R. M. Badia, Scheduling dense linear algebra operations on multicore processors, Concurrency and Computation: Practice and Experience, vol.35, issue.2, pp.15-44, 2009.
DOI : 10.1145/1377612.1377615

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

H. Topcuoglu, S. Hariri, and M. Wu, Performanceeffective and low-complexity task scheduling for heterogeneous computing Parallel and Distributed Systems, IEEE Transactions on, vol.13, issue.3, pp.260-274, 2002.

C. Augonnet, J. Clet-ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.129

URL : https://hal.archives-ouvertes.fr/inria-00523937