K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands et al., The landscape of parallel computing research: A view from Berkeley, EECS, 2006.

M. Frigo, C. E. Leiserson, and K. H. Randall, The implementation of the Cilk-5 multithreaded language, Conference on Programming Language Design and Implementation, 1998.

E. Ayguadé, N. Copty, A. Duran, J. Hoeflinger, Y. Lin et al., The Design of OpenMP Tasks, Transactions on Parallel and Distributed Systems, 2009.
DOI : 10.1109/TPDS.2008.105

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, 2009.
DOI : 10.1007/978-3-642-03869-3_80
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.5547

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, 2012.
DOI : 10.1109/ipdps.2011.281
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874

J. Bueno, X. Martorell, R. M. Badia, E. Ayguadé, and J. Labarta, Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, 2013.
DOI : 10.1145/2464996.2465017

T. Gautier, J. V. Lima, N. Maillard, and B. Raffin, Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011.
DOI : 10.1155/2011/525717
URL : http://doi.org/10.1155/2011/525717

E. Slaughter, W. Lee, S. Treichler, M. Bauer, and A. Aiken, Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.
DOI : 10.1145/2807591.2807629

C. Liao, D. J. Quinlan, T. Panas, and B. R. de Supinski, A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries, Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, 2010.

C. Addison, J. LaGrone, L. Huang, and B. Chapman, OpenMP 3.0 tasking implementation in OpenUH, 2009.

R. Ferrer, S. Royuela, D. Caballero, A. Duran, X. Martorell et al., Mercurium: Design decisions for a s2s compiler, Cetus Users and Compiler Infrastructure Workshop, 2011.

X. Teruel, P. Unnikrishnan, X. Martorell, E. Ayguadé, R. Silvera et al., OpenMP tasks in IBM XL compilers, Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, CASCON '08, 2008.
DOI : 10.1145/1463788.1463810
URL : http://hdl.handle.net/2117/15709

L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, 1987.

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for multicore architectures Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.

R. Yokota and L. A. Barba, A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems, International Journal of High Performance Computing Applications, vol.26, issue.4, 2012.
DOI : 10.1177/1094342011429952

B. Lange and P. Fortin, Parallel dual tree traversal on multicore and many-core architectures for astrophysical N-body simulations, 2014.
DOI : 10.1007/978-3-319-09873-9_60
URL : https://hal.archives-ouvertes.fr/hal-00947130

P. Blanchard, O. Coulaud, and E. Darve, Fast hierarchical algorithms for generating Gaussian random fields, Inria Bordeaux Sud-Ouest Research Report, vol.8811, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01228519

H. Casanova, A. Legrand, and Y. Robert, Parallel algorithms, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00789466

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, 2016.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-01359458

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, 2013.
DOI : 10.1002/cpe.3132
URL : http://arxiv.org/abs/1203.0889

J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, A CPU, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, 2014.
DOI : 10.1145/2588768.2576787