Report, 2010

Performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem

Abstract

Scientific computation relies heavily on 64-bit arithmetic. The evolution of Graphics Processing Units into massively parallel vector units, together with their improved programmability, makes them powerful algebraic coprocessors for many classes of matrix computation. But on these processors, which inherit from architectures originally dedicated to video processing, support for double precision remains limited. One building block of dense linear algebra, the GEneral Matrix Multiply routine (GEMM), has been considerably accelerated on the GPU. In this paper we present detailed measurements of its speed and, first and foremost, its accuracy.
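As a minimal sketch of the kind of accuracy question the abstract raises (this setup is illustrative, not taken from the paper), one can compare a matrix product computed in single precision against a double-precision reference and measure the componentwise relative error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
a = rng.random((n, n))
b = rng.random((n, n))

# Reference product computed in 64-bit (double) precision.
c64 = a @ b

# Same product with inputs rounded to 32-bit (single) precision,
# then promoted back to double for comparison.
c32 = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float64)

# Maximum componentwise relative error of the single-precision result.
# Entries of c64 are strictly positive here, so the division is safe.
rel_err = np.max(np.abs(c32 - c64) / np.abs(c64))
print(f"max relative error (float32 vs float64): {rel_err:.2e}")
```

With positive random inputs there is no cancellation, so the single-precision error stays within a few orders of magnitude of the unit roundoff; the error profile of an actual GPU GEMM additionally depends on the accumulation order and blocking of the implementation, which is what the report examines.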
Main file: MatmulNumaccCUDA.pdf (196.76 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00699377, version 1 (21-05-2012)

Identifiers

  • HAL Id: hal-00699377, version 1

Cite

Philippe Estival, Luc Giraud. Performance and accuracy of the matrix multiplication routines: CUBLAS on Nvidia Tesla versus MKL and ATLAS on Intel Nehalem. 2010. ⟨hal-00699377⟩

Collections

LARA
175 views
3047 downloads
