Another possible evolution of our solver is towards distributed-memory parallel systems: modern runtime systems, such as StarPU, can handle this type of architecture by transparently managing the transfer of data between nodes through the network. A solver that implements all the above-mentioned features is our ultimate objective.

S. N. Yeralan, T. A. Davis, and S. Ranka, Sparse QR factorization on the GPU, 2015.

C. D. Yu, W. Wang, and D. Pierce, A CPU-GPU hybrid approach for the unsymmetric multifrontal method, Parallel Computing, vol.37, issue.12, pp.759-770, 2011.
DOI : 10.1016/j.parco.2011.09.002

K. Kim and V. Eijkhout, Scheduling a Parallel Sparse Direct Solver to Multiple GPUs, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1401-1408, 2013.
DOI : 10.1109/IPDPSW.2013.26

J. Hogg, E. Ovtchinnikov, and J. Scott, A Sparse Symmetric Indefinite Direct Solver for GPU Architectures, Tech. Rep. RAL-P, 2014.
DOI : 10.1145/2756548

X. Chen, L. Ren, Y. Wang, and H. Yang, GPU-accelerated sparse LU factorization for circuit simulation with performance modeling, IEEE Transactions on Parallel and Distributed Systems, vol.26, issue.3, pp.786-795, 2015.

P. Sao, R. W. Vuduc, and X. S. Li, A Distributed CPU-GPU Sparse Direct Solver, Euro-Par 2014 Parallel Processing, pp.487-498, 2014.
DOI : 10.1007/978-3-319-09873-9_41

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, p.012037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Luszczek et al., Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach, Scalable Computing and Communications: Theory and Practice, 2013.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, pp.187-198, 2009.
DOI : 10.1007/978-3-642-03869-3_80
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.5547

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-Ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, pp.2438-2456, 2009.
DOI : 10.1002/cpe.1463
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.3457

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, Euro-Par, p.235, 2010.
DOI : 10.1007/978-3-642-15291-7_23
URL : https://hal.archives-ouvertes.fr/inria-00502448

X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca, Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, p.5, 2014.
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017

K. Kim and V. Eijkhout, A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling, ACM Transactions on Mathematical Software, vol.41, issue.1, pp.1-3, 2014.
DOI : 10.1145/2629641

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645

T. A. Davis, Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1-22, 2011.
DOI : 10.1145/2049662.2049670

I. S. Duff and J. K. Reid, The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983.
DOI : 10.1145/356044.356047

R. Schreiber, A New Implementation of Sparse Gaussian Elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982.
DOI : 10.1145/356004.356006

P. R. Amestoy, I. S. Duff, and C. Puglisi, Multifrontal QR Factorization in a Multiprocessor Environment, Numerical Linear Algebra with Applications, vol.3, issue.4, pp.275-300, 1996.
DOI : 10.1002/(SICI)1099-1506(199607/08)3:4<275::AID-NLA83>3.0.CO;2-7

A. Buttari, Fine-Grained Multithreading for the Multifrontal $QR$ Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, pp.323-345, 2013.
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471

J. Kurzak and J. Dongarra, Fully dynamic scheduler for numerical computing on multicore processors, LAPACK working note, 2009.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359

A. Geist and E. G. Ng, Task scheduling for parallel sparse Cholesky factorization, International Journal of Parallel Programming, vol.27, issue.4, pp.291-314, 1989.
DOI : 10.1007/BF01407861

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, vol.7490, issue.9, 2014.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-00974674