E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, issue.2, pp.473-484, 2011.
DOI : 10.1016/B978-0-12-385963-1.00034-4

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, 2009.
DOI : 10.1088/1742-6596/180/1/012037

E. Agullo, O. Beaumont, L. Eyraud-Dubois, and S. Kumar, Are Static Schedules so Bad? A Case Study on Cholesky Factorization, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016.
DOI : 10.1109/IPDPS.2016.90
URL : https://hal.archives-ouvertes.fr/hal-01223573

C. Augonnet, S. Thibault, R. Namyst, and P. A. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, vol.23, issue.4, pp.187-198, 2011.
DOI : 10.1002/cpe.1631
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. Badia, F. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012.
DOI : 10.1016/j.parco.2011.10.003

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Luszczek et al., Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach, Scalable Computing and Communications: Theory and Practice, 2013.

F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889

E. Hermann, B. Raffin, F. Faure, T. Gautier, and J. Allard, Multi-GPU and multi-CPU parallelization for interactive physics simulations, Euro-Par 2010 - Parallel Processing, pp.235-246, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00502448

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, Composing multiple StarPU applications over heterogeneous machines: A supervised approach, IJHPCA, vol.28, issue.3, pp.285-300, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00824514

K. Kim, V. Eijkhout, and R. A. van de Geijn, Dense matrix computation on a heterogeneous architecture: A block synchronous approach, 2012.

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011.
DOI : 10.1155/2011/525717

H. Pan, B. Hindman, and K. Asanović, Composing parallel software efficiently with Lithe, SIGPLAN Not., pp.376-387, 2010.

G. Quintana-Ortí, E. S. Quintana-Ortí, R. A. van de Geijn, F. G. Van Zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

F. Song, S. Tomov, and J. Dongarra, Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM international conference on Supercomputing, ICS '12, pp.365-376, 2012.
DOI : 10.1145/2304576.2304625

H. Topcuoglu, S. Hariri, and M. Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359