Journal article in International Journal of High Performance Computing Applications, Year: 2019

Hierarchical approach for deriving a reproducible unblocked LU factorization

Abstract

We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we devise a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.
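
The bottom-up idea in the abstract can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation: it assumes exact rational accumulation (Python's `fractions.Fraction`) as a stand-in for their correctly rounded kernels, and the names `exact_dot`, `sub_dot`, and `lu_unblocked` are hypothetical. Each inner product is accumulated exactly and rounded once, so its result does not depend on summation order, and an unblocked LU factorization with partial pivoting is then built only from such products and single divisions.

```python
from fractions import Fraction

def exact_dot(x, y):
    """Dot product accumulated exactly, then rounded once to a float."""
    acc = sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0))
    return float(acc)  # single, correctly rounded conversion

def sub_dot(c, x, y):
    """Return c - x.y, computed exactly and rounded once (illustrative helper)."""
    acc = Fraction(c) - sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)),
                            Fraction(0))
    return float(acc)

def lu_unblocked(A):
    """Unblocked LU with partial pivoting on a list-of-lists matrix.

    Returns (LU, perm): LU holds the unit lower factor L below the diagonal
    and U on and above it; row i of the permuted matrix is A[perm[i]].
    """
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        # Update column k on and below the diagonal with exact inner products.
        col = [LU[t][k] for t in range(k)]
        for i in range(k, n):
            LU[i][k] = sub_dot(LU[i][k], LU[i][:k], col)
        # Partial pivoting: pick the entry of largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            LU[p], LU[k] = LU[k], LU[p]
            perm[p], perm[k] = perm[k], perm[p]
        # Update row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            LU[k][j] = sub_dot(LU[k][j], LU[k][:k], [LU[t][j] for t in range(k)])
        # Scale the column below the pivot (one correctly rounded division each).
        for i in range(k + 1, n):
            LU[i][k] = LU[i][k] / LU[k][k]
    return LU, perm
```

For instance, `exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0` for any ordering of the terms, whereas a left-to-right floating-point sum of the same products gives `0.0`. Exact rational arithmetic is far too slow for production use; it serves here only to make the "accumulate exactly, round once" property easy to check.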
Main file
reprolu.pdf (456.04 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01419813, version 1 (20-12-2016)
hal-01419813, version 2 (03-01-2017)
hal-01419813, version 3 (02-02-2017)
hal-01419813, version 4 (18-04-2017)

Cite

Roman Iakymchuk, Stef Graillat, David Defour, Enrique S Quintana-Ortí. Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications, 2019, pp.#1094342019832968. ⟨10.1177/1094342019832968⟩. ⟨hal-01419813v4⟩