G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, p.12037, 2009.
DOI : 10.1088/1742-6596/180/1/012037

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

M. Fogue, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Retargeting PLAPACK to clusters with hardware accelerators, 2010 International Conference on High Performance Computing & Simulation, pp.444-451, 2010.
DOI : 10.1109/HPCS.2010.5547094

G. Quintana-Ortí, F. D. Igual, E. S. Quintana-Ortí, and R. A. van de Geijn, Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009.
DOI : 10.1145/1594835.1504196

F. Song, S. Tomov, and J. Dongarra, Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM International Conference on Supercomputing, ser. ICS '12, pp.365-376, 2012.
DOI : 10.1145/2304576.2304625
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.5355

K. Kim, V. Eijkhout, and R. A. van de Geijn, Dense matrix computation on a heterogenous architecture: A block synchronous approach, FLAME Working Note, 2012.

J. Lima, F. Broquedis, T. Gautier, and B. Raffin, Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor, 2013 25th International Symposium on Computer Architecture and High Performance Computing, pp.105-112, 2013.
DOI : 10.1109/SBAC-PAD.2013.28
URL : https://hal.archives-ouvertes.fr/hal-00878325

G. Bosilca, A. Bouteiller, T. Herault, P. Lemarinier, N. O. Saengpatsa et al., Performance Portability of a GPU Enabled Factorization with the DAGuE Framework, 2011 IEEE International Conference on Cluster Computing, pp.395-402, 2011.
DOI : 10.1109/CLUSTER.2011.51

E. Chan, F. G. Van Zee, E. S. Quintana-Ortí, G. Quintana-Ortí, and R. van de Geijn, Satisfying your dependencies with SuperMatrix, 2007 IEEE International Conference on Cluster Computing, pp.91-99, 2007.
DOI : 10.1109/CLUSTR.2007.4629221

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614

G. Ballard, J. Demmel, L. Grigori, E. Solomonik, M. Jacquelin et al., Reconstructing Householder Vectors from Tall-Skinny QR, International Parallel & Distributed Processing Symposium, 2014.
DOI : 10.1109/ipdps.2014.120
URL : https://hal.archives-ouvertes.fr/hal-01241785

S. Tomov, R. Nath, H. Ltaief, and J. J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010.
DOI : 10.1109/IPDPSW.2010.5470941

Y. Sawa and R. Suda, Autotuning Method for Deciding Block Size Parameters in Dynamically Load-Balanced BLAS, pp.33-48, 2010.
DOI : 10.1007/978-1-4419-6935-4_3