Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs

Roman S Iakymchuk; Stef S Graillat; David Defour; Enrique S Quintana-Ortí

Conference paper, Year: 2016


Roman S Iakymchuk

Role: Author
PersonId: 966
IdHAL: roman-iakymchuk
IdRef: 253135079

KTH Royal Institute of Technology [Stockholm]

Stef S Graillat

Role: Author
PersonId: 5653
IdHAL: stef-graillat
IdRef: 104060735

Performance et Qualité des Algorithmes Numériques

David Defour

Role: Author
PersonId: 4651
IdHAL: david-defour
ORCID: 0000-0001-9923-2394
IdRef: 104542454

Digits, Architectures et Logiciels Informatiques

Enrique S Quintana-Ortí

Role: Author

Universitat Jaume I = Jaume I University

Abstract

We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we finally construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
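The bottom-up construction described in the abstract can be illustrated with a minimal CPU sketch in Python. It is not the authors' GPU implementation: exact rational arithmetic (`fractions.Fraction`) stands in for the long accumulator and error-free transformations named in the keywords, pivoting is omitted, and the function names `exact_dot` and `lu_unblocked` are hypothetical. The point it shows is that when every inner product is accumulated exactly and rounded only once, the result no longer depends on the order of summation, and an unblocked LU built from such inner products inherits that property.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly, then rounded to a double once.

    Assumption: Fraction is only a stand-in for a long accumulator.
    """
    acc = Fraction(0)
    for xi, yi in zip(x, y):
        # Converting a finite float to Fraction is exact, and so are
        # the rational product and sum.
        acc += Fraction(xi) * Fraction(yi)
    return float(acc)  # the only rounding step


def lu_unblocked(a):
    """Unblocked LU without pivoting; returns (L, U) with unit-diagonal L.

    Raises ZeroDivisionError if a zero pivot is encountered.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Row k of U: u_kj = a_kj - sum_{p<k} l_kp * u_pj, rounded once.
        for j in range(k, n):
            U[k][j] = exact_dot(
                [1.0] + [-L[k][p] for p in range(k)],
                [a[k][j]] + [U[p][j] for p in range(k)],
            )
        # Column k of L: one exact inner product, then one division.
        for i in range(k + 1, n):
            s = exact_dot(
                [1.0] + [-L[i][p] for p in range(k)],
                [a[i][k]] + [U[p][k] for p in range(k)],
            )
            L[i][k] = s / U[k][k]
    return L, U
```

For example, `exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns the same value for any permutation of the three terms, whereas a plain left-to-right floating-point sum of those terms depends on their order. Exact rational accumulation is far too slow for production use; it serves here only to make the "accumulate exactly, round once" idea concrete.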

Keywords

LU factorization, BLAS, reproducibility, accuracy, long accumulator, error-free transformation, GPUs

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Computer Arithmetic; Numerical Analysis [math.NA]

Main file

reprolu.abstract.pdf (162.28 KB)

Origin: Files produced by the author(s)

Contributor: Roman Iakymchuk

https://hal.science/hal-01382645

Submitted on: Monday, October 17, 2016, 13:41:58

Last modified on: Monday, April 8, 2024, 15:09:24

Dates and versions

hal-01382645, version 1 (17-10-2016)

Identifiers

HAL Id: hal-01382645, version 1

Cite

Roman S Iakymchuk, Stef S Graillat, David Defour, Enrique S Quintana-Ortí. Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs. The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States. ⟨hal-01382645⟩


Collections

UPMC CNRS UNIV-PERP LIP6 DALI LIRMM TDS-MACS MIPS UNIV-MONTPELLIER SORBONNE-UNIVERSITE SU-SCIENCES

319 views

108 downloads
