Conference papers

Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs

Roman Iakymchuk 1 Stef Graillat 2 David Defour 3 Enrique Quintana-Ortí 4
2 PEQUAN - Performance et Qualité des Algorithmes Numériques
LIP6 - Laboratoire d'Informatique de Paris 6
3 DALI - Digits, Architectures et Logiciels Informatiques
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, UPVD - Université de Perpignan Via Domitia
Abstract : We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we provide Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via inexpensive iterative refinement. Following a bottom-up approach, we finally construct a reproducible implementation of the LU factorization for GPUs, which can easily accommodate partial pivoting for stability and can eventually be integrated into a (blocked) high-performance and stable algorithm for the LU factorization.
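The bottom-up idea in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: it assumes exact rational accumulation (Python's `fractions.Fraction`) as a stand-in for whatever accumulation scheme the paper's kernels use, and the names `exact_dot` and `lu_unblocked` are hypothetical. It shows a dot product that is rounded only once, so its result does not depend on summation order, and an unblocked LU factorization (without pivoting) built on top of that kernel.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly and rounded once (hypothetical helper).

    Every float converts to a Fraction without error, so the sum is exact and
    the final float() conversion is the only rounding step.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)


def lu_unblocked(A):
    """Unblocked (Doolittle) LU without pivoting, built on exact_dot.

    Returns one matrix holding U on and above the diagonal and the strictly
    lower part of L below it (unit diagonal of L is implicit).
    Assumes all pivots are nonzero; this is an illustration, not a stable solver.
    """
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        # Row k of U: u_kj = a_kj - sum_{p<k} l_kp * u_pj, rounded once.
        for j in range(k, n):
            left = [1.0] + [-LU[k][p] for p in range(k)]
            right = [A[k][j]] + [LU[p][j] for p in range(k)]
            LU[k][j] = exact_dot(left, right)
        # Column k of L: l_ik = (a_ik - sum_{p<k} l_ip * u_pk) / u_kk.
        for i in range(k + 1, n):
            left = [1.0] + [-LU[i][p] for p in range(k)]
            right = [A[i][k]] + [LU[p][k] for p in range(k)]
            LU[i][k] = exact_dot(left, right) / LU[k][k]
    return LU


# The same data in two different orders gives the same dot product.
ones = [1.0, 1.0, 1.0, 1.0]
d1 = exact_dot([1e16, 1.0, -1e16, 1.0], ones)
d2 = exact_dot([1.0, 1.0, 1e16, -1e16], ones)
factors = lu_unblocked([[4.0, 3.0], [6.0, 3.0]])
```

Exact rational arithmetic is far too slow for real use; it serves here only to make the "accumulate exactly, round once" property easy to check.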

Contributor : Roman Iakymchuk
Submitted on : Monday, October 17, 2016 - 1:41:58 PM
Last modification on : Tuesday, November 16, 2021 - 4:42:49 AM




  • HAL Id : hal-01382645, version 1


Roman Iakymchuk, Stef Graillat, David Defour, Enrique Quintana-Ortí. Hierarchical Approach for Deriving a Reproducible LU factorization on GPUs. The Numerical Reproducibility at Exascale (NRE16) workshop held as part of the Supercomputing Conference (SC16), Nov 2016, Salt Lake City, UT, United States. ⟨hal-01382645⟩


