Exploiting Kant and Kimura’s Matrix Inversion Algorithm on FPGA

Author(s):  
Andre Bannwart Perina ◽  
Paulo Matias ◽  
Eduardo Marques ◽  
Vanderlei Bonato ◽  
Joao Miguel Gago Pontes de Brito Lima
2019 ◽  
Vol 8 (2S11) ◽  
pp. 2834-2840

This paper deals with low-complexity algorithms for the higher-order matrix inversion involved in massive MIMO precoder design. The performance of massive MIMO systems is optimized by precoding, which is divided into linear and nonlinear techniques. Nonlinear precoding techniques are the most complex, irrespective of their performance. Hence, linear precoding is generally preferred, and its complexity is dominated by the matrix inversion algorithm. To address this, Krylov subspace methods such as the Conjugate Gradient (CG) algorithm have been considered the best replacement for exact matrix inversion. However, CG requires the matrix to be Symmetric Positive Definite (SPD); if the matrix to be inverted is asymmetric, CG fails to converge. Hence, this paper proposes a novel low-complexity approach to inverting asymmetric matrices by applying two variants of the CG algorithm: Conjugate Gradient Squared (CGS) and Bi-Conjugate Gradient (Bi-CG). The convergence behavior and BER performance of these two algorithms are compared with the existing CG algorithm. The results show that both outperform CG in terms of convergence speed and relative residual.
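As a minimal illustration of the symmetry issue described above (not the authors' implementation), the following Python/SciPy sketch solves Ax = b for a nonsymmetric matrix with CG, CGS, and Bi-CG; the matrix, size, and iteration limit are arbitrary assumptions chosen only to contrast the solvers' behavior.

```python
# Hypothetical demonstration: CG vs. CGS vs. Bi-CG on a nonsymmetric system.
# Not the paper's code; matrix size and conditioning are arbitrary assumptions.
import numpy as np
from scipy.sparse.linalg import cg, cgs, bicg

rng = np.random.default_rng(0)
n = 64
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # nonsymmetric test matrix
b = rng.standard_normal(n)

for name, solver in [("CG", cg), ("CGS", cgs), ("Bi-CG", bicg)]:
    # CG's convergence guarantees require A to be SPD, so its result here
    # is not reliable; CGS and Bi-CG are designed for nonsymmetric systems.
    x, info = solver(A, b, maxiter=200)
    residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    # info == 0 means the solver reported convergence within maxiter iterations
    print(f"{name:6s} info={info:4d} relative residual={residual:.2e}")
```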


2018 ◽  
Vol 10 (1) ◽  
pp. 71-92 ◽  
Author(s):  
M. Varalakshmi ◽  
Amit Parashuram Kesarkar ◽  
Daphne Lopez

Attempts to harness the big climate data that comes from high-resolution model output and advanced sensors, in order to provide more accurate and rapidly updated weather prediction, call for innovations in existing data assimilation systems. Matrix inversion is a key operation in the majority of data assimilation techniques. Hence, this article presents an out-of-core CUDA implementation of an iterative matrix inversion method. The results show significant speed-up even for square matrices of size 1024 × 1024 and larger, without sacrificing accuracy. In the same test environment, a comparison with a direct method, the Gauss-Jordan approach modified to process matrices too large to handle within a single kernel call, shows that the iterative method is twice as efficient. This acceleration is attributed to the division-free design and the embarrassingly parallel nature of every sub-task of the algorithm. The parallel algorithm is designed to scale across multiple GPUs for handling large matrices.
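The abstract does not name the iterative scheme; one common division-free, embarrassingly parallel iteration for approximating A⁻¹ is the Newton-Schulz iteration, sketched below in NumPy purely as an assumed illustration. The starting guess, test matrix, and iteration count are not from the article.

```python
# Assumed illustration: Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k),
# a division-free iterative approximation of A^{-1}. Not the article's code.
import numpy as np

def newton_schulz_inverse(A, iterations=30):
    n = A.shape[0]
    # Classical convergent starting guess: X0 = A^T / (||A||_1 * ||A||_inf)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I2 = 2.0 * np.eye(n)
    for _ in range(iterations):
        # Each step is only matrix multiplication, so it maps naturally onto GPUs
        X = X @ (I2 - A @ X)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 256
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
    X = newton_schulz_inverse(A)
    print("||AX - I||_F =", np.linalg.norm(A @ X - np.eye(n)))
```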


1992 ◽  
Vol 16 (12) ◽  
pp. 133-141 ◽  
Author(s):  
I.Ž. Milovanović ◽  
E.I. Milovanović ◽  
M.K. Stojčev
