J. Abel, K. Balasubramanian, M. Bargeron, T. Craver, and M. Phlipot, Applications tuning for streaming SIMD extensions, Intel Technology Journal, 1999.

R. Allen and K. Kennedy, Optimizing compilers for modern architectures: a dependence-based approach, 2002.

D. Beymer, P. McLauchlan, B. Coifman, and J. Malik, A real-time computer vision system for measuring traffic parameters, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.495-501, 1997.

T. Dong, A. Haidar, P. Luszczek, J. A. Harris, S. Tomov et al., LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp.157-160, 2014.
DOI : 10.1109/HPCC.2014.30

T. Dong, A. Haidar, S. Tomov, and J. Dongarra, A Fast Batched Cholesky Factorization on a GPU, 2014 43rd International Conference on Parallel Processing, pp.432-440, 2014.
DOI : 10.1109/ICPP.2014.52

A. Fog, Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs, 2016.

R. Frühwirth, Application of Kalman filtering to track and vertex fitting, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, pp.444-450, 1987.

N. J. Higham, Accuracy and stability of numerical algorithms, SIAM, 2002.
DOI : 10.1137/1.9780898718027

N. J. Higham, Cholesky factorization, Wiley Interdisciplinary Reviews: Computational Statistics, vol.1, issue.2, pp.251-254, 2009.
DOI : 10.1002/wics.18

J. Iliffe, The use of the genie system in numerical calculation, Annual Review in Automatic Programming, vol.2, pp.1-28, 1961.

L. Lacassagne, D. Etiemble, A. Hassan-Zahraee, A. Dominguez, and P. Vezolle, High level transforms for SIMD and low-level computer vision algorithms, Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp.49-56, 2014.
DOI : 10.1145/2568058.2568067
URL : https://hal.archives-ouvertes.fr/hal-01094906

A. Romero, L. Lacassagne, and M. Gouiffès, Real-time covariance tracking algorithm for embedded systems, IEEE International Conference on Design and Architectures for Signal and Image Processing, 2013.

J. Shin, M. W. Hall, J. Chame, C. Chen, and P. D. Hovland, Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Software Automatic Tuning, pp.353-370, 2011.
DOI : 10.1007/978-1-4419-6935-4_20