Magma, matrix algebra on gpu and multicore architectures ,
Eigen v3, 2016. ,
Real-time covariance tracking algorithm for embedded systems, Design and Architectures for Signal and Image Processing (DASIP), 2013 Conference on, pp.104-111, 2013. ,
Application of Kalman filtering to track and vertex fitting Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, pp.444-450, 1987. ,
A real-time computer vision system for measuring traffic parameters, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.495-501, 1997. ,
DOI : 10.1109/CVPR.1997.609371
Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, Software Automatic Tuning, pp.353-370, 2011. ,
DOI : 10.1007/978-1-4419-6935-4_20
Effective SIMD Vectorization for Intel Xeon Phi Coprocessors, Scientific Programming, pp.1-14, 2015. ,
DOI : 10.1007/978-3-642-30961-8_5
High-Performance Matrix-Matrix Multiplications of Very Small Matrices, European Conference on Parallel Processing, pp.659-671, 2016. ,
DOI : 10.1109/ICPPW.2012.39
URL : https://hal.archives-ouvertes.fr/hal-01409286
A Fast Batched Cholesky Factorization on a GPU, 2014 43rd International Conference on Parallel Processing, pp.432-440, 2014. ,
DOI : 10.1109/ICPP.2014.52
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.637.5351
Accuracy and stability of numerical algorithms, SIAM, 2002. ,
DOI : 10.1137/1.9780898718027
Cholesky factorization, Wiley Interdisciplinary Reviews: Computational Statistics, vol.103, issue.2, pp.251-254, 2009. ,
DOI : 10.1137/1.9781611971484
LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pp.157-160, 2014. ,
DOI : 10.1109/HPCC.2014.30
High level transforms for SIMD and low-level computer vision algorithms, Proceedings of the 2014 Workshop on Workshop on programming models for SIMD/Vector processing, WPMVP '14, pp.49-56, 2014. ,
DOI : 10.1145/2568058.2568067
URL : https://hal.archives-ouvertes.fr/hal-01094906
Metaprogramming Dense Linear Algebra Solvers Applications to Multi and Many-Core Architectures, 2015 IEEE Trustcom/BigDataSE/ISPA, pp.69-76, 2015. ,
DOI : 10.1109/Trustcom.2015.614
URL : https://hal.archives-ouvertes.fr/hal-01221358
Applications tuning for streaming SIMD extensions, Intel Technology Journal, vol.2, 1999. ,
The use of the genie system in numerical calculation, Annual Review in Automatic Programming, vol.2, pp.1-28, 1961. ,
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs, pp.2016-2017, 2016. ,
Area and performance tradeoffs in floating-point divide and square-root implementations, ACM Computing Surveys, vol.28, issue.3, pp.518-564, 1996. ,
DOI : 10.1145/243439.243481
METHODS OF COMPUTING VALUES OF POLYNOMIALS, Russian Mathematical Surveys, vol.21, issue.1, pp.105-136, 1966. ,
DOI : 10.1070/RM1966v021n01ABEH004147