A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Scalable Dense Linear Algebra on Heterogeneous Hardware HPC: Transition Towards Exascale Processing, the series Advances in Parallel Computing, pp.65-103, 2013. ,
FLAME: Formal Linear Algebra Methods Environment, ACM Transactions on Mathematical Software, vol.27, issue.4, pp.422-455, 2001. ,
DOI : 10.1145/504210.504213
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.118.7096
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, Jade Edition, pp.473-484, 2011. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, pp.29-38, 2014. ,
DOI : 10.1109/IPDPSW.2014.9
URL : https://hal.archives-ouvertes.fr/hal-00925017
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions On Mathematical Software. [Online]. Available, 2014. ,
DOI : 10.1109/71.993206
URL : https://hal.archives-ouvertes.fr/hal-01333645
Task-Based Conjugate Gradient: From Multi-GPU Towards Heterogeneous Architectures, Inria, Tech. Rep, vol.44, issue.4, 2016. ,
DOI : 10.1137/1.9780898718003
URL : https://hal.archives-ouvertes.fr/hal-01334734
Data-driven execution of fast multipole methods, Concurrency and Computation: Practice and Experience, vol.26, issue.11, 1203. ,
DOI : 10.1002/cpe.3132
Task-Based FMM for Multicore Architectures, SIAM Journal on Scientific Computing, vol.36, issue.1, pp.66-93, 2014. ,
DOI : 10.1137/130915662
URL : https://hal.archives-ouvertes.fr/hal-00807368
Résolution directe rapide pour les eléments finis de frontiere en electromagnétisme et acoustique: H-matrices. Parallélisme et applications industrielles, 2014. ,
Application of the ParalleX execution model to stencil-based problems, 2014 IEEE International Conference on High Performance Computing and Communications (HPCC), pp.253-261, 2013. ,
DOI : 10.1016/j.enconman.2010.02.024
A high-productivity task-based programming model for clusters, Concurrency and Computation: Practice and Experience, pp.2421-2448, 2012. ,
DOI : 10.1145/2020373.2020377
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.2467
Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pp.359-368, 2013. ,
DOI : 10.1145/2464996.2465017
Quark users' guide: Queueing and runtime for kernels, 2011. ,
Programming Models Based on Data Versioning for Dependency-aware Task-based Parallelisation, 2012 IEEE 15th International Conference on Computational Science and Engineering, pp.275-280, 2012. ,
DOI : 10.1109/ICCSE.2012.45
Legion: Expressing locality and independence with logical regions, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp.661-6611, 2012. ,
DOI : 10.1109/SC.2012.71
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.7715
LAPACK: a portable linear algebra library for highperformance computers, The 1990 ACM/IEEE conference on Supercomputing, pp.2-11, 1990. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, pp.187-198, 2011. ,
DOI : 10.1007/978-3-642-03869-3_80
URL : https://hal.archives-ouvertes.fr/inria-00384363
Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.291-298, 2010. ,
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937
Automatic task graph generation techniques, System Sciences Proceedings of the Twenty-Eighth Hawaii International Conference on, pp.113-122, 1995. ,
Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel & Distributed Processing Workshops and Phd Forum (IPDPSW'11), PDSEC 2011, pp.1432-1441, 2011. ,
DOI : 10.1109/ipdps.2011.299
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.228.4744
Analysis of Programs for Parallel Processing, IEEE Transactions on Electronic Computers, vol.15, issue.5, pp.757-763, 1966. ,
DOI : 10.1109/PGEC.1966.264565
Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012. ,
DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.1874
A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, Swedish Workshop on Multi-Core Computing, 2013. ,
Regent, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on, SC '15, pp.811-8112, 2015. ,
DOI : 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H
A scalable framework for heterogeneous GPU-based clusters, Proceedinbgs of the 24th ACM symposium on Parallelism in algorithms and architectures, SPAA '12, pp.91-100, 2012. ,
DOI : 10.1145/2312005.2312025
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.466.9268
Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, pp.6-7, 2012. ,
DOI : 10.1016/j.parco.2012.03.005
Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009. ,
DOI : 10.1145/1669112.1669121
Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures, Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG), 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00780890
Runtime support for object-based message-driven parallel applications on heterogeneous clusters, 2012. ,
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, 2011. ,
DOI : 10.1155/2011/525717
URL : https://doi.org/10.1155/2011/525717
Extending Unified Parallel C for GPU Computing, SIAM Conference on Parallel Processing for Scientific Computing (SIAMPP), 2010. ,
An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters, HeteroPar, 2011. ,
Automatic Calibration of Performance Models on Heterogeneous Multicore Architectures, Proceedings of the International Euro-Par Workshops, pp.56-65, 2009. ,
DOI : 10.1007/978-3-642-14122-5_9
URL : https://hal.archives-ouvertes.fr/inria-00421333
Performance-effective and low-complexity task scheduling for heterogeneous computing Parallel and Distributed Systems, IEEE Transactions on, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.122
Hierarchical DAG Scheduling for Hybrid Distributed Systems Available: https, 29th IEEE International Parallel & Distributed Processing Symposium, 2015. ,
DOI : 10.1109/ipdps.2015.56
Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, pp.1-5423, 2013. ,
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677
Controlling the Memory Subscription of Distributed Applications with a Task-Based Runtime System, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016. ,
DOI : 10.1109/IPDPSW.2016.105
URL : https://hal.archives-ouvertes.fr/hal-01380126