Automatically tuned linear algebra software, Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM). Supercomputing, pp.1-27, 1998. ,
High-performance implementation of the level-3 BLAS, ACM Trans. Math. Softw, vol.35, issue.1, pp.1-14, 2008. ,
Computing petaflops over terabytes of data: the case of genome-wide association studies, ACM Trans. Math. Softw, vol.40, issue.4, p.22, 2014. ,
Exascale computing study: technology challenges in achieving exascale systems, DARPA report, 2008. ,
Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs, NVIDIA, 2011. ,
Differences in floating-point arithmetic between Intel Xeon processors and the Intel Xeon Phi TM coprocessor, Intel, 2013. ,
Best known method: Avoid heterogeneous precision in control flow calculations, Intel, 2013. ,
The exact dot product as basic tool for long interval arithmetic, Computing, vol.91, issue.3, pp.307-313, 2011. ,
Fast reproducible floating-point summation, Proceedings of the 21st IEEE Symposium on Computer Arithmetic, pp.163-172, 2013. ,
Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi-and Many-Core Architectures, 2014. ,
, IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. IEEE Standard, pp.754-2008, 2008.
Accuracy and stability of numerical algorithms, Society for Industrial and Applied Mathematics, 2002. ,
Handbook of Floating-Point Arithmetic, 2010. ,
URL : https://hal.archives-ouvertes.fr/ensl-00379167
Design, implementation and testing of extended and mixed precision BLAS, ACM Trans. Math. Softw, vol.28, issue.2, pp.152-205, 2002. ,
Algorithms for quad-double precision floating point arithmetic, Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pp.155-162, 2001. ,
The Art of Computer Programming, Seminumerical Algorithms, vol.2, 1997. ,
Multi-level optimization of matrix multiplication for gpu-equipped systems, In: ICCS. Procedia Computer Science, vol.4, pp.342-351, 2011. ,