The landscape of parallel computing research: A view from Berkeley, EECS, 2006.
The implementation of the Cilk-5 multithreaded language, Conference on Programming Language Design and Implementation, 1998.
The Design of OpenMP Tasks, Transactions on Parallel and Distributed Systems, 2009.
DOI : 10.1109/TPDS.2008.105
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par, 2009.
DOI : 10.1007/978-3-642-03869-3_80
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.220.5547
DAGuE: A generic distributed DAG engine for high performance computing, Parallel Computing, 2012.
DOI : 10.1109/ipdps.2011.281
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874
Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th International ACM Conference on Supercomputing, ICS '13, 2013.
DOI : 10.1145/2464996.2465017
Locality-aware work stealing on multi-CPU and multi-GPU architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013.
URL : https://hal.archives-ouvertes.fr/hal-00780890
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011.
DOI : 10.1155/2011/525717
URL : http://doi.org/10.1155/2011/525717
Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, 2015.
DOI : 10.1145/2807591.2807629
A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries, Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, 2010.
OpenMP 3.0 tasking implementation in OpenUH, 2009.
Mercurium: Design decisions for a s2s compiler, Cetus Users and Compiler Infrastructure Workshop, 2011.
OpenMP tasks in IBM XL compilers, Proceedings of the 2008 Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, CASCON '08, 2008.
DOI : 10.1145/1463788.1463810
URL : http://hdl.handle.net/2117/15709
A fast algorithm for particle simulations, Journal of Computational Physics, 1987.
Task-based FMM for multicore architectures.
Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems, International Journal of High Performance Computing Applications, vol.26, issue.4, 2012.
DOI : 10.1177/1094342011429952
Parallel dual tree traversal on multicore and many-core architectures for astrophysical N-body simulations, 2014.
DOI : 10.1007/978-3-319-09873-9_60
URL : https://hal.archives-ouvertes.fr/hal-00947130
Fast hierarchical algorithms for generating Gaussian random fields, Inria Bordeaux Sud-Ouest Research Report, vol.8811, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01228519
Parallel algorithms, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00789466
Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice and Experience, 2016.
DOI : 10.1002/cpe.3723
URL : https://hal.archives-ouvertes.fr/hal-01359458
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, 2013.
DOI : 10.1002/cpe.3132
URL : http://arxiv.org/abs/1203.0889
A CPU, Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU-7, 2014.
DOI : 10.1145/2588768.2576787