Journal articles

Hierarchical approach for deriving a reproducible unblocked LU factorization

Abstract : We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). To this end, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.
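
To make the bottom-up idea concrete, the following is a minimal CPU sketch, not the authors' GPU implementation. It assumes exact rational accumulation (Python's `fractions.Fraction`) as a stand-in for whatever mechanism the paper's kernels use, so that each dot product is rounded only once and is independent of summation order. The names `exact_dot`, `naive_dot`, and `lu_partial_pivot` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product accumulated exactly in rationals and rounded once at the end.

    Assumption: Fraction is used here purely for illustration; it is far too
    slow for production use.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)  # single rounding, independent of summation order


def naive_dot(x, y):
    """Ordinary floating-point dot product; the result depends on the order."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def lu_partial_pivot(A):
    """Unblocked LU with partial pivoting, built only on exact_dot.

    Returns (LU, perm): LU packs the unit lower-triangular L below the
    diagonal and U on and above it; perm[i] is the row of A that ended up
    in row i, so that A[perm[i]] is approximately row i of L @ U.
    """
    n = len(A)
    LU = [row[:] for row in A]  # work on a copy
    perm = list(range(n))
    for j in range(n):
        # Update column j for rows j..n-1: a_ij - L[i, :j] . U[:j, j],
        # computed as one dot product so it is rounded only once.
        neg_ucol = [-LU[k][j] for k in range(j)]
        for i in range(j, n):
            LU[i][j] = exact_dot([LU[i][j]] + LU[i][:j], [1.0] + neg_ucol)
        # Partial pivoting: pick the entry of largest magnitude.
        p = max(range(j, n), key=lambda i: abs(LU[i][j]))
        if LU[p][j] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        LU[j], LU[p] = LU[p], LU[j]
        perm[j], perm[p] = perm[p], perm[j]
        # Scale the subdiagonal part of column j to obtain L.
        for i in range(j + 1, n):
            LU[i][j] /= LU[j][j]
        # Compute row j of U to the right of the diagonal.
        for k in range(j + 1, n):
            neg_col = [-LU[m][k] for m in range(j)]
            LU[j][k] = exact_dot([LU[j][k]] + LU[j][:j], [1.0] + neg_col)
    return LU, perm


if __name__ == "__main__":
    ones = [1.0, 1.0, 1.0]
    # The naive sum changes with the order of the terms; the exact one does not.
    print(naive_dot([1e16, 1.0, -1e16], ones), naive_dot([1e16, -1e16, 1.0], ones))
    print(exact_dot([1e16, 1.0, -1e16], ones), exact_dot([1e16, -1e16, 1.0], ones))
    print(lu_partial_pivot([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]))
```

Because every entry of L and U in this sketch comes from a single correctly rounded dot product followed by at most one division, the factors do not depend on the order in which the partial sums are accumulated. That order independence is the property a parallel implementation needs for reproducibility.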

Contributor : Roman Iakymchuk
Submitted on : Tuesday, April 18, 2017 - 7:49:53 AM
Last modification on : Friday, August 5, 2022 - 2:41:30 PM
Long-term archiving on : Wednesday, July 19, 2017 - 12:21:25 PM


Files produced by the author(s)



Roman Iakymchuk, Stef Graillat, David Defour, Enrique S Quintana-Ortí. Hierarchical approach for deriving a reproducible unblocked LU factorization. International Journal of High Performance Computing Applications, 2019, pp.#1094342019832968. ⟨10.1177/1094342019832968⟩. ⟨hal-01419813v4⟩


