R. C. Whaley and J. J. Dongarra, Automatically tuned linear algebra software, Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM). Supercomputing, pp.1-27, 1998.

K. Goto and R. A. Van-de-geijn, High-performance implementation of the level-3 BLAS, ACM Trans. Math. Softw, vol.35, issue.1, pp.1-14, 2008.

D. Fabregat-traver and P. Bientinesi, Computing petaflops over terabytes of data: the case of genome-wide association studies, ACM Trans. Math. Softw, vol.40, issue.4, p.22, 2014.

K. Bergman, Exascale computing study: technology challenges in achieving exascale systems, DARPA report, 2008.

N. Whitehead and A. Fit-florea, Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs, NVIDIA, 2011.

M. Corden, Differences in floating-point arithmetic between Intel Xeon processors and the Intel Xeon Phi TM coprocessor, Intel, 2013.

K. Doertel, Best known method: Avoid heterogeneous precision in control flow calculations, Intel, 2013.

U. Kulisch and V. Snyder, The exact dot product as basic tool for long interval arithmetic, Computing, vol.91, issue.3, pp.307-313, 2011.

J. Demmel and H. D. Nguyen, Fast reproducible floating-point summation, Proceedings of the 21st IEEE Symposium on Computer Arithmetic, pp.163-172, 2013.

S. Collange, D. Defour, S. Graillat, and R. Iakymchuk, Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi-and Many-Core Architectures, 2014.

, IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. IEEE Standard, pp.754-2008, 2008.

N. J. Higham, Accuracy and stability of numerical algorithms, Society for Industrial and Applied Mathematics, 2002.

J. M. Muller, N. Brisebarre, F. De-dinechin, C. P. Jeannerod, V. Lefèvre et al., Handbook of Floating-Point Arithmetic, 2010.
URL : https://hal.archives-ouvertes.fr/ensl-00379167

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida et al., Design, implementation and testing of extended and mixed precision BLAS, ACM Trans. Math. Softw, vol.28, issue.2, pp.152-205, 2002.

Y. Hida, X. S. Li, and D. H. Bailey, Algorithms for quad-double precision floating point arithmetic, Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pp.155-162, 2001.

D. E. Knuth, The Art of Computer Programming, Seminumerical Algorithms, vol.2, 1997.

K. Matsumoto, N. Nakasato, T. Sakai, H. Yahagi, and S. G. Sedukhin, Multi-level optimization of matrix multiplication for gpu-equipped systems, In: ICCS. Procedia Computer Science, vol.4, pp.342-351, 2011.