Hierarchical approach for deriving a reproducible unblocked LU factorization
2019 ◽ Vol 33 (5) ◽ pp. 791-803
Keyword(s): Level 1
We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). To this end, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we then construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance, stable algorithm for the (blocked) LU factorization.
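The bottom-up idea in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation: it assumes a dot-product (Crout-style) ordering of the unblocked factorization, and it obtains a correctly rounded dot product by accumulating exactly in rational arithmetic, which is a stand-in chosen only for clarity. The names `exact_dot` and `lu_unblocked` are hypothetical.

```python
from fractions import Fraction


def exact_dot(x, y):
    """Dot product with a single rounding (illustrative stand-in).

    Every finite float is an exact rational, so the sum of products is
    accumulated without error and rounded to the nearest float once at
    the end. The result does not depend on the order of the terms.
    """
    total = sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0))
    return float(total)


def lu_unblocked(a):
    """Unblocked LU with partial pivoting, built on exact_dot.

    Returns (lu, perm): lu holds the unit lower factor L below the diagonal
    and U on and above it; row i of the permuted matrix is a[perm[i]].
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Update column k from row k down: a_ik - sum_m l_im * u_mk,
        # expressed as one dot product so that it is rounded only once.
        neg_u_col = [-lu[m][k] for m in range(k)]
        for i in range(k, n):
            lu[i][k] = exact_dot([lu[i][k]] + lu[i][:k], [1.0] + neg_u_col)

        # Partial pivoting: bring the largest candidate to the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("zero pivot: matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]

        # Update row k of U to the right of the diagonal.
        for j in range(k + 1, n):
            neg_col = [-lu[m][j] for m in range(k)]
            lu[k][j] = exact_dot([lu[k][j]] + lu[k][:k], [1.0] + neg_col)

        # Scale the column below the pivot; IEEE division rounds correctly.
        for i in range(k + 1, n):
            lu[i][k] = lu[i][k] / lu[k][k]
    return lu, perm
```

Because each entry of L and U comes from a single correctly rounded operation, the factors are bitwise identical regardless of how the terms inside each dot product are ordered, which is the property a parallel reduction on a GPU would otherwise lose. Exact rational accumulation is far too slow for production use; it only makes the rounding behaviour easy to see.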