Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture, Euro-Par 2017: Parallel Processing, pp.553-564, 2017.
APP SDK OpenCL Optimization Guide, 2015.
A hierarchical O(N log N) force-calculation algorithm, Nature, vol.324, issue.4, pp.446-449, 1986.
A sparse octree gravitational N-body code that runs entirely on the GPU processor, J. Comp. Phys, vol.231, issue.7, pp.2825-2839, 2012.
Tomoaki Ishiyama, and Simon Portegies Zwart. 24.77 Pflops on a Gravitational Tree-code to Simulate the Milky Way Galaxy with 18600 GPUs, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '14, pp.54-65, 2014.
An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-Body Algorithm, pp.75-92, 2011.
A Fast Adaptive Multipole Algorithm in Three Dimensions, Journal of Computational Physics, vol.155, pp.468-498, 1999.
A CPU-GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, vol.64, p.71, 2014.
The fast multipole method and point dipole moment polarizable force fields, The Journal of Chemical Physics, vol.142, issue.2, p.24109, 2015. URL: https://hal.archives-ouvertes.fr/hal-01449468
A Hierarchical O(N) Force Calculation Algorithm, J. Comp. Phys, vol.179, pp.27-42, 2002.
A fast multipole method for stellar dynamics, Computational Astrophysics and Cosmology, vol.1, issue.1, 2014.
Comparisons of different codes for galactic N-body simulations, Astronomy & Astrophysics, vol.531, p.120, 2011. URL: https://hal.archives-ouvertes.fr/hal-01146505
Fast multipole methods on graphics processors, Journal of Computational Physics, vol.227, pp.8290-8313, 2008.
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC'09, vol.62, p.12, 2009.
Scalable fast multipole methods on distributed heterogeneous architectures, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, vol.36, p.12, 2011.
Fast multipole preconditioners for sparse matrices arising from elliptic equations, Computing and Visualization in Science, 2017.
Intel. Developer Guide for Intel SDK for OpenCL Applications, Version 1.1, 2015.
Parallel dual tree traversal on multi-core and many-core architectures for astrophysical N-body simulations, Euro-Par 2014 Parallel Processing, vol.8632, pp.716-727, 2014. URL: https://hal.archives-ouvertes.fr/hal-00947130
A massively parallel adaptive fast-multipole method on heterogeneous architectures, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, vol.58, p.12, 2009.
Optimizing the accuracy and efficiency of fast hierarchical multipole expansions for MD simulations, Journal of Chemical Theory and Computation, vol.8, issue.10, pp.3628-3636, 2012.
Fast N-Body Simulation with CUDA, GPU Gems, vol.3, pp.677-695, 2007.
Dynamic Load Balancing of the Adaptive Fast Multipole Method in Heterogeneous Systems, IEEE International Symposium on Parallel Distributed Processing, Workshops and PhD Forum, pp.1126-1135, 2013.
Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, pp.1-11, 2010.
Leveraging the accelerated processing units for seismic imaging: A performance and power efficiency comparison against CPUs and GPUs, The International Journal of High Performance Computing Applications, vol.32, issue.6, pp.819-837, 2018. URL: https://hal.archives-ouvertes.fr/hal-01656841
The cosmological simulation code GADGET-2, Monthly Notices of the Royal Astronomical Society, vol.364, issue.4, pp.1105-1134, 2005.
A Task Parallel Implementation of Fast Multipole Methods, SC Companion, pp.617-625, 2012.
Provably good partitioning and load balancing algorithms for parallel adaptive N-body simulation, SIAM Journal on Scientific Computing, vol.19, issue.2, pp.635-656, 1998.
A portable parallel particle program, Computer Physics Communications, vol.87, issue.1, pp.266-290, 1995.
An FMM Based on Dual Tree Traversal for Many-Core Architectures, Journal of Algorithms & Computational Technology, vol.7, issue.3, pp.301-324, 2013.
Chapter 9: Treecode and fast multipole method for N-body simulation with CUDA, GPU Computing Gems Emerald Edition, pp.113-132, 2011.