C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, 2009.

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th ACM International Conference on Supercomputing, ICS '13, 2013.
DOI : 10.1145/2464996.2465017

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

C. Augonnet, O. Aumage, N. Furmento, S. Thibault, and R. Namyst, StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators, INRIA, Research Report RR-8538, 2014.

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Harnessing Supercomputers with a Sequential Task-based Runtime System, INRIA, Tech. Rep.

S. Sauter and C. Schwab, Boundary Element Methods, 2011.

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed dense numerical linear algebra algorithms on massively parallel architectures, 2010.

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, J. Dongarra et al., Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359

T. Beri, S. Bansal, and S. Kumar, A Scheduling and Runtime Framework for a Cluster of Heterogeneous Machines with Multiple Accelerators, 2015 IEEE International Parallel and Distributed Processing Symposium, 2015.
DOI : 10.1109/IPDPS.2015.12

C. Mei, G. Zheng, F. Gioachin, and L. V. Kale, Optimizing a parallel runtime system for multicore clusters, Proceedings of the 2010 TeraGrid Conference, TG '10, 2010.
DOI : 10.1145/1838574.1838586

L. Eyraud-Dubois, L. Marchal, O. Sinnen, and F. Vivien, Parallel Scheduling of Task Trees with Limited Memory, ACM Transactions on Parallel Computing, vol.2, issue.2, 2015.
DOI : 10.1145/2779052
URL : https://hal.archives-ouvertes.fr/hal-01160118

I. Dooley, C. Mei, J. Lifflander, and L. V. Kale, A study of memory-aware scheduling in message-driven parallel programs, International Conference on High Performance Computing, 2010.

D. Sbîrlea, Z. Budimlić, and V. Sarkar, Bounded memory scheduling of dynamic task graphs, International Conference on Parallel Architectures and Compilation, 2014.

M. Tillenius, E. Larsson, R. M. Badia, and X. Martorell, Resource-Aware Task Scheduling, Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2013.
DOI : 10.1145/2638554

P. Arras, D. Fuin, E. Jeannot, A. Stoutchinin, and S. Thibault, List scheduling in embedded systems under memory constraints, International Symposium on Computer Architecture and High Performance Computing, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00906117

K. Desnos, M. Pelcat, J. Nezan, and S. Aridhi, Memory Analysis and Optimized Allocation of Dataflow Applications on Shared-Memory MPSoCs, Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 2014.
DOI : 10.1007/s11265-014-0952-6
URL : https://hal.archives-ouvertes.fr/hal-01083576

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2, 2014.
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645

E. Solomonik and J. Demmel, Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms, International Conference on Parallel Processing (Euro-Par), 2011.

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, Euro-Par 2013 Parallel Processing, 2013.
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611

M. Bebendorf, Approximation of boundary element matrices, Numerische Mathematik, vol.86, issue.4, 2000.
DOI : 10.1007/PL00005410

J. Dongarra, R. van de Geijn, and D. Walker, A look at scalable dense linear algebra libraries, Scalable High Performance Computing Conference, 1992.