DAGuE: A generic distributed DAG engine for High Performance Computing, Parallel Computing, vol.38, issue.1-2, pp.37-51, 2012. ,
DOI : 10.1016/j.parco.2011.10.003
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, p.12037, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Retargeting PLAPACK to clusters with hardware accelerators, 2010 International Conference on High Performance Computing & Simulation, pp.444-451, 2010. ,
DOI : 10.1109/HPCS.2010.5547094
Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, vol.44, issue.4, pp.121-130, 2009. ,
DOI : 10.1145/1594835.1504196
Enabling and scaling matrix computations on heterogeneous multi-core and multigpu systems, Proceedings of the 26th ACM International Conference on Supercomputing, ser. ICS '12, pp.365-376, 2012. ,
DOI : 10.1145/2304576.2304625
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.259.5355
Dense matrix computation on a heterogenous architecture: A block synchronous approach, FLAME Working Note, 2012. ,
Preliminary Experiments with XKaapi on Intel Xeon Phi Coprocessor, 2013 25th International Symposium on Computer Architecture and High Performance Computing, pp.105-112, 2013. ,
DOI : 10.1109/SBAC-PAD.2013.28
URL : https://hal.archives-ouvertes.fr/hal-00878325
Performance Portability of a GPU Enabled Factorization with the DAGuE Framework, 2011 IEEE International Conference on Cluster Computing, pp.395-402, 2011. ,
DOI : 10.1109/CLUSTER.2011.51
Satisfying your dependencies with SuperMatrix, 2007 IEEE International Conference on Cluster Computing, pp.91-99, 2007. ,
DOI : 10.1109/CLUSTR.2007.4629221
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
Reconstructing Householder Vectors from Tall-Skinny QR, International Parallel & Distributed Processing Symposium, 2014. ,
DOI : 10.1109/ipdps.2014.120
URL : https://hal.archives-ouvertes.fr/hal-01241785
Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) ,
DOI : 10.1109/IPDPSW.2010.5470941
Autotuning Method for Deciding Block Size Parameters in Dynamically Load-Balanced BLAS, pp.33-48, 2010. ,
DOI : 10.1007/978-1-4419-6935-4_3