Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs

Roman Iakymchuk 1, Stef Graillat 2, David Defour 3, Enrique Quintana-Ortí 4
2 PEQUAN - Performance et Qualité des Algorithmes Numériques
LIP6 - Laboratoire d'Informatique de Paris 6
3 DALI - Digits, Architectures et Logiciels Informatiques
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, UPVD - Université de Perpignan Via Domitia
Abstract: We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we then construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
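The bottom-up idea in the abstract can be illustrated with a minimal CPU sketch in Python. It is not the authors' GPU implementation: the names `exact_dot` and `lu_unblocked` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) stands in for whatever accumulation technique the actual kernels use. The sketch assumes finite inputs and nonzero pivots, and it omits pivoting. It shows a dot product that is rounded only once, so its result does not depend on summation order, and an unblocked LU factorization built on top of that kernel.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly and rounded once to a double.

    Every finite float is a rational number, so the sum of products is
    exact and the result is independent of the order of the terms.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)  # the only rounding step


def lu_unblocked(A):
    """Unblocked LU (Doolittle form, no pivoting) built on exact_dot.

    Returns (L, U) as lists of lists, with a unit diagonal in L.
    Assumes every pivot U[k][k] is nonzero.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Row k of U: A[k][j] - L[k][:k] . U[:k][j], rounded once.
        for j in range(k, n):
            U[k][j] = exact_dot(
                [1.0] + L[k][:k],
                [A[k][j]] + [-U[p][j] for p in range(k)],
            )
        # Column k of L: same update, then scaling by the pivot.
        for i in range(k + 1, n):
            num = exact_dot(
                [1.0] + L[i][:k],
                [A[i][k]] + [-U[p][k] for p in range(k)],
            )
            L[i][k] = num / U[k][k]  # IEEE division is correctly rounded
    return L, U


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    ones = [1.0, 1.0, 1.0]
    print(sum(a * b for a, b in zip(x, ones)))  # naive left-to-right sum
    print(exact_dot(x, ones))                   # exact sum, rounded once
    print(lu_unblocked([[4.0, 3.0], [6.0, 3.0]]))
```

In the small demonstration, the naive left-to-right sum loses the middle term to rounding, while the exactly accumulated dot product returns the same value for every ordering of the terms. Because each entry of L and U is produced by deterministic, order-independent operations, the factors are bitwise identical from run to run in this sketch.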

https://hal.archives-ouvertes.fr/hal-01382645
Contributor: Roman Iakymchuk
Submitted on: Monday, October 17, 2016 - 1:41:58 PM
Last modification on: Thursday, March 21, 2019 - 2:39:12 PM

File

reprolu.abstract.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01382645, version 1

Citation

Roman Iakymchuk, Stef Graillat, David Defour, Enrique Quintana-Ortí. Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs. The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States. ⟨hal-01382645⟩
