L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

F. Sullivan and J. Dongarra, Guest editors' introduction: The top 10 algorithms, Computing in Science & Engineering, vol.2, issue.1, pp.22-23, 2000.

T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori et al., 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proc. of the Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC '09, 2009.

A. Rahimian, I. Lashuk, S. K. Veerapaneni, A. Chandramowlishwaran, D. Malhotra et al., Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 2010.
DOI : 10.1109/SC.2010.42

J. Milthorpe, A. P. Rendell, and T. Huber, PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language, Concurrency and Computation: Practice and Experience, 2013.
DOI : 10.1002/cpe.3039

M. Abduljabbar, R. Yokota, and D. Keyes, Asynchronous Execution of the Fast Multipole Method Using Charm++, 2014.

H. Ltaief and R. Yokota, Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, pp.1935-1946, 2014.

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

E. Agullo, O. Aumage, B. Bramas, O. Coulaud, and S. Pitoiset, Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, Inria, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01372022

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, pp.2608-2629, 2016.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-01359458

E. Agullo, O. Aumage, M. Faverge, N. Furmento, F. Pruvost et al., Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2016.

L. Greengard and V. Rokhlin, A new version of the Fast Multipole Method for the Laplace equation in three dimensions, Acta Numerica, vol.6, pp.229-269, 1997.

M. S. Warren and J. K. Salmon, Astrophysical N-body simulations using hierarchical tree data structures, Proceedings Supercomputing '92, pp.570-576, 1992.
DOI : 10.1109/SUPERC.1992.236647
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.1580

S. Ogata, T. J. Campbell, R. K. Kalia, A. Nakano, P. Vashishta et al., Scalable and portable implementation of the fast multipole method on parallel computers, Computer Physics Communications, vol.153, issue.3, pp.445-461, 2003.
DOI : 10.1016/S0010-4655(03)00246-7

J. Kurzak and B. M. Pettitt, Massively parallel implementation of a fast multipole method for distributed memory machines, Journal of Parallel and Distributed Computing, vol.65, issue.7, pp.870-881, 2005.
DOI : 10.1016/j.jpdc.2005.02.001

F. A. Cruz, M. G. Knepley, and L. A. Barba, PetFMM: A dynamically load-balancing parallel fast multipole library, International Journal for Numerical Methods in Engineering, vol.19, issue.2, pp.403-428, 2011.
DOI : 10.1002/nme.2972
URL : http://arxiv.org/abs/0905.2637

O. Coulaud, P. Fortin, and J. Roman, Hybrid MPI-Thread Parallelization of the Fast Multipole Method, Sixth International Symposium on Parallel and Distributed Computing (ISPDC'07), p.52, 2007.
DOI : 10.1109/ISPDC.2007.29
URL : https://hal.archives-ouvertes.fr/inria-00131001

A. Chandramowlishwaran, S. Williams, L. Oliker, I. Lashuk, G. Biros et al., Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-12, 2010.
DOI : 10.1109/IPDPS.2010.5470415

D. Malhotra, A. Gholami, and G. Biros, A Volume Integral Equation Stokes Solver for Problems with Variable Coefficients, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, pp.92-102, 2014.
DOI : 10.1109/SC.2014.13

T. Ishiyama, K. Nitadori, and J. Makino, 4.45 Pflops astrophysical N-body simulation on K computer: The gravitational trillion-body problem, Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012.

A CPU: GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, GPGPU-7, Workshop on General Purpose Processing Using GPUs, pp.1-5.

B. Bramas, Optimization and parallelization of the boundary element method for the wave equation in time domain, PhD thesis.
URL : https://hal.archives-ouvertes.fr/tel-01306571

M. S. Warren and J. K. Salmon, A parallel hashed Oct-Tree N-body algorithm, Proceedings of the 1993 ACM/IEEE conference on Supercomputing, Supercomputing '93, pp.12-21, 1993.
DOI : 10.1145/169627.169640
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.5842

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

J. Yu and R. Buyya, A taxonomy of scientific workflow systems for grid computing, ACM SIGMOD Record, vol.34, issue.3, pp.44-49, 2005.
DOI : 10.1145/1084805.1084814

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proc. of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995.
DOI : 10.1109/hicss.1995.375471

Z. Budimlić, M. Burke, V. Cavé, K. Knobe, G. Lowney et al., Concurrent Collections, Scientific Programming, vol.18, issue.3-4, 2010.
DOI : 10.1155/2010/521797

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, A. Haidar et al., Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proc. of the 25th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011.
DOI : 10.1109/ipdps.2011.299
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.4744

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, 2009.
DOI : 10.1007/978-3-642-03869-3_80
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.5547

A. Duran, J. M. Perez, E. Ayguadé, R. M. Badia, and J. Labarta, Extending the OpenMP Tasking Model to Allow Dependent Tasks, OpenMP in a New Era of Parallelism, 4th International Workshop, pp.111-122, 2008.
DOI : 10.1007/978-3-540-79561-2_10
URL : http://hdl.handle.net/2117/28390

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK users' guide: QUeueing And Runtime for Kernels, 2011.

E. Tejedor, M. Farreras, D. Grove, R. M. Badia, G. Almasi et al., A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, 2012.

A. Yarkhan, Dynamic task execution on shared and distributed memory architectures, 2012.

C. Augonnet, J. Clet-Ortega, S. Thibault, and R. Namyst, Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010.
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937

W. Fong and E. Darve, The black-box fast multipole method, Journal of Computational Physics, vol.228, issue.23, pp.8712-8725, 2009.
DOI : 10.1016/j.jcp.2009.08.031