E. Agullo, O. Aumage, B. Bramas, O. Coulaud, and S. Pitoiset, Bridging the gap between OpenMP 4.0 and native runtime systems for the fast multipole method, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01372022

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures. Concurrency and Computation: Practice and Experience, pp.2608-2629, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01359458

E. Agullo, B. Bramas, O. Coulaud, M. Khannouz, and L. Stanisic, Task-based fast multipole method for clusters of multicore processors, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01387482

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014.
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, vol.43, issue.2, 2016.
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00384363

L. A. Barba and R. Yokota, How will the fast multipole method fare in the exascale era?, SIAM News, vol.46, issue.6, pp.1-3, 2013.

B. Bramas, Optimization and parallelization of the boundary element method for the wave equation in time domain. Theses, 2016.
URL : https://hal.archives-ouvertes.fr/tel-01306571

A. Buttari, Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, pp.323-345, 2013.
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, p.74, 2014.
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319

J. Choi, A. Chandramowlishwaran, K. Madduri, and R. Vuduc, A CPU: GPU Hybrid Implementation and Model-Driven Scheduling of the Fast Multipole Method, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, pp.64-64, 2014.
DOI : 10.1145/2588768.2576787

P. Cicotti, X. S. Li, and S. B. Baden, Performance Modeling Tools for Parallel Sparse Linear Algebra Computations, in Parallel Computing: From Multicores and GPU's to Petascale, Proceedings of the conference ParCo, pp.83-90, 2009.

A. Duran, E. Ayguadé, R. M. Badia, J. Labarta, L. Martinell et al., OmpSs: A Proposal for Programming Heterogeneous Multi-Core Architectures, Parallel Processing Letters, vol.21, issue.02, 2011.
DOI : 10.1142/S0129626411000151

L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, Journal of Computational Physics, vol.73, issue.2, pp.325-348, 1987.
DOI : 10.1016/0021-9991(87)90140-9

W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface. Scientific And Engineering Computation Series, 1999.

B. Haugen, J. Kurzak, A. Yarkhan, P. Luszczek, and J. Dongarra, Parallel Simulation of Superscalar Scheduling, 2014 43rd International Conference on Parallel Processing, pp.121-130, 2014.
DOI : 10.1109/ICPP.2014.21

X. S. Li and J. W. Demmel, SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Softw., vol.29, issue.2, pp.110-140, 2003.

J. L. Lo, S. J. Eggers, J. S. Emer, H. M. Levy, R. L. Stamm et al., Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading, ACM Transactions on Computer Systems, vol.15, issue.3, pp.322-354, 1997.
DOI : 10.1145/263326.263382

H. Ltaief and R. Yokota, Data-Driven Execution of Fast Multipole Methods. CoRR, abs, 1203.

T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney, Producing Wrong Data Without Doing Anything Obviously Wrong, Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pp.265-276, 2009.
DOI : 10.1145/1508244.1508275
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.163.8395

A. Patel, F. Afram, S. Chen, and K. Ghose, MARSS: A full system simulator for multicore x86 CPUs, Proceedings of the 48th Design Automation Conference, DAC '11, pp.1050-1055, 2011.
DOI : 10.1145/2024724.2024954

R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, 2016.

A. Rico, A. Duran, F. Cabarcas, Y. Etsion, A. Ramírez et al., Trace-driven simulation of multithreaded applications, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp.87-96, 2011.
DOI : 10.1109/ISPASS.2011.5762718

A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield et al., The structural simulation toolkit, ACM SIGMETRICS Performance Evaluation Review, vol.38, issue.4, pp.37-42, 2011.
DOI : 10.1145/1964218.1964225

A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia et al., A Framework for Performance Modeling and Prediction, ACM/IEEE SC 2002 Conference (SC'02), pp.1-17, 2002.
DOI : 10.1109/SC.2002.10004

L. Stanisic, E. Agullo, A. Buttari, A. Guermouche, A. Legrand et al., Fast and Accurate Simulation of Multithreaded Sparse Linear Algebra Solvers, 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), 2015.
DOI : 10.1109/ICPADS.2015.67
URL : https://hal.archives-ouvertes.fr/hal-01180272

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Faithful Performance Prediction of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. Concurrency and Computation: Practice and Experience, p.16, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01147997

F. Sullivan and J. Dongarra, Guest editors' introduction: The top 10 algorithms, Computing in Science & Engineering, vol.2, issue.1, pp.22-23, 2000.

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

V. M. Weaver and S. A. McKee, Are Cycle Accurate Simulations a Waste of Time?, Proc. of the 7th Workshop on Duplicating, Deconstructing, and Debunking, 2008.

G. Zheng, G. Kakulapati, and L. Kalé, BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines, Proc. of the 18th International Parallel and Distributed Processing Symposium (IPDPS), 2004.