M. Abduljabbar, M. A. Farhan, R. Yokota, and D. Keyes, Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture, Euro-Par 2017: Parallel Processing, pp.553-564, 2017.

AMD, AMD APP SDK OpenCL Optimization Guide, 2015.

AMD, AMD APP SDK OpenCL User Guide, 2015.

J. E. Barnes and P. Hut, A hierarchical O(N log N) force-calculation algorithm, Nature, vol.324, issue.4, pp.446-449, 1986.

J. Bédorf, E. Gaburov, and S. Portegies Zwart, A sparse octree gravitational N-body code that runs entirely on the GPU processor, Journal of Computational Physics, vol.231, issue.7, pp.2825-2839, 2012.

J. Bédorf, E. Gaburov, M. S. Fujii, K. Nitadori, T. Ishiyama, and S. Portegies Zwart, 24.77 Pflops on a Gravitational Tree-code to Simulate the Milky Way Galaxy with 18600 GPUs, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '14, pp.54-65, 2014.

M. Burtscher and K. Pingali, An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, GPU Computing Gems Emerald Edition, pp.75-92, 2011.

H. Cheng, L. Greengard, and V. Rokhlin, A Fast Adaptive Multipole Algorithm in Three Dimensions, Journal of Computational Physics, vol.155, pp.468-498, 1999.

J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, vol.64, p.71, 2014.

J. P. Coles and M. Masella, The fast multipole method and point dipole moment polarizable force fields, The Journal of Chemical Physics, vol.142, issue.2, p.24109, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01449468

W. Dehnen, A Hierarchical O(N) Force Calculation Algorithm, Journal of Computational Physics, vol.179, pp.27-42, 2002.

W. Dehnen, A fast multipole method for stellar dynamics, Computational Astrophysics and Cosmology, vol.1, issue.1, 2014.

P. Fortin, E. Athanassoula, and J. Lambert, Comparisons of different codes for galactic N-body simulations, Astronomy & Astrophysics, vol.531, p.120, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01146505

N. A. Gumerov and R. Duraiswami, Fast multipole methods on graphics processors, Journal of Computational Physics, vol.227, pp.8290-8313, 2008.

T. Hamada, T. Narumi, R. Yokota, K. Yasuoka, K. Nitadori et al., 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC'09, vol.62, p.12, 2009.

Q. Hu, N. A. Gumerov, and R. Duraiswami, Scalable fast multipole methods on distributed heterogeneous architectures, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, vol.36, p.12, 2011.

H. Ibeid, R. Yokota, J. Pestana, and D. Keyes, Fast multipole preconditioners for sparse matrices arising from elliptic equations, Computing and Visualization in Science, 2017.

Intel, Developer Guide for Intel SDK for OpenCL Applications, Version 1.1, 2015.

B. Lange and P. Fortin, Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations, Euro-Par 2014 Parallel Processing, vol.8632, pp.716-727, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00947130

I. Lashuk, A. Chandramowlishwaran, H. Langston, T. Nguyen, R. Sampath et al., A massively parallel adaptive fast-multipole method on heterogeneous architectures, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, vol.58, p.12, 2009.

K. Lorenzen, M. Schwörer, P. Tröster, S. Mates, and P. Tavan, Optimizing the accuracy and efficiency of fast hierarchical multipole expansions for MD simulations, Journal of Chemical Theory and Computation, vol.8, issue.10, pp.3628-3636, 2012.

L. Nyland, M. Harris, and J. Prins, Fast N-Body Simulation with CUDA, GPU Gems 3, pp.677-695, 2007.

R. E. Overman, J. F. Prins, L. A. Miller, and M. L. Minion, Dynamic Load Balancing of the Adaptive Fast Multipole Method in Heterogeneous Systems, IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum, pp.1126-1135, 2013.

A. Rahimian, I. Lashuk, S. Veerapaneni, A. Chandramowlishwaran, D. Malhotra et al., Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, pp.1-11, 2010.

I. Said, P. Fortin, J. Lamotte, and H. Calandra, Leveraging the accelerated processing units for seismic imaging: A performance and power efficiency comparison against CPUs and GPUs, The International Journal of High Performance Computing Applications, vol.32, issue.6, pp.819-837, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01656841

V. Springel, The cosmological simulation code GADGET-2, Monthly Notices of the Royal Astronomical Society, vol.364, issue.4, pp.1105-1134, 2005.

K. Taura, J. Nakashima, R. Yokota, and N. Maruyama, A Task Parallel Implementation of Fast Multipole Methods, SC Companion, pp.617-625, 2012.

S. Teng, Provably good partitioning and load balancing algorithms for parallel adaptive N-body simulation, SIAM Journal on Scientific Computing, vol.19, issue.2, pp.635-656, 1998.

M. S. Warren and J. K. Salmon, A portable parallel particle program, Computer Physics Communications, vol.87, issue.1, pp.266-290, 1995.

R. Yokota, An FMM Based on Dual Tree Traversal for Many-Core Architectures, Journal of Algorithms & Computational Technology, vol.7, issue.3, pp.301-324, 2013.

R. Yokota and L. A. Barba, Chapter 9 - Treecode and fast multipole method for N-body simulation with CUDA, GPU Computing Gems Emerald Edition, pp.113-132, 2011.