VLSI Architecture for Matrix Inversion using Modified Gram-Schmidt based QR Decomposition

Author(s): Chitranjan Singh, Sushma Prasad, Poras Balsara
2011, Vol 59 (4), pp. 1858-1867
Author(s): Lei Ma, Kevin Dickson, John McAllister, John McCanny

2018, Vol 27 (14), pp. 1850220
Author(s): Wei-Yang Chen, Chung-An Shen

This paper presents the VLSI architecture of a low-latency, high-throughput sorted-QR decomposition (SQRD) engine for multiple-input multiple-output (MIMO) communication systems. To achieve a high processing throughput, the proposed design is built on a novel pipelined Givens rotation (GR) structure comprising multi-dimension COordinate Rotation DIgital Computer (MD-CORDIC) processing elements (PEs). Moreover, the design delivers the vector norm and performs the sorting operation as a by-product of the vectoring operation within the execution flow of the CORDIC process. The separate overheads for norm calculation and sorting are therefore eliminated, which greatly reduces latency and increases throughput. In addition, the proposed SQRD engine operates directly on the complex-valued channel matrix, avoiding the matrix augmentation caused by real-valued decomposition of the channel matrix. The design has been synthesized, placed, and routed, and post-layout estimation results show that the processing throughput of the proposed SQRD architecture is approximately 2× that of the prior art.
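The claim that the vector norm comes out as a by-product of CORDIC vectoring can be illustrated with a minimal real-valued, two-dimensional sketch. This is not the paper's MD-CORDIC design: the function name `cordic_vectoring`, the floating-point arithmetic, and the iteration count are assumptions made for illustration only.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Drive y toward zero with micro-rotations by atan(2**-i).

    Returns (norm, angle) of the input vector (x, y). Hardware would use
    shifts and adds; floats are used here only to keep the sketch short.
    """
    angle = 0.0
    # Pre-rotate by 180 degrees so the iterations start in the right half-plane.
    if x < 0:
        angle = math.pi if y >= 0 else -math.pi
        x, y = -x, -y
    gain = 1.0
    for i in range(iterations):
        t = 2.0 ** -i                      # tan of the micro-rotation angle
        sigma = 1.0 if y >= 0 else -1.0    # rotate toward the x-axis
        x, y = x + sigma * y * t, y - sigma * x * t
        angle += sigma * math.atan(t)
        gain *= math.sqrt(1.0 + t * t)     # each step scales the vector
    # After vectoring, x holds gain * norm, so the norm needs no extra unit.
    return x / gain, angle

norm, angle = cordic_vectoring(3.0, 4.0)
print(norm, angle)
```

Once `y` has been rotated to (nearly) zero, the remaining `x` component is the vector norm up to the known constant CORDIC gain, which is why no separate norm computation is required in this sketch.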


Author(s): Siavash Amin-Nejad, Katayoon Basharkhah, Tayyebeh Asgari Gashteroodkhani

A wide variety of digital communication systems face computationally intensive tasks. QR decomposition is one such algorithm, and it can be implemented on FPGAs to solve large complex matrix inversion problems. A flexible vector processing architecture for fixed-point and floating-point implementations of QR decomposition is presented. The design is implemented on a Stratix IV device with 230K logic elements and verified with the SignalTap II built-in logic analyzer. For 4×4 matrices, the fixed-point and floating-point designs achieve throughputs of 2.4M and 2.11M decompositions per second at maximum clock frequencies of 340 MHz and 360 MHz, respectively. The FPGA resource utilization of the two data-type implementations is also compared for different matrix sizes on Stratix IV and Arria 10 devices.
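How QR decomposition leads to a matrix inverse can be sketched in software as follows. The abstract does not say which QR algorithm the architecture uses; modified Gram-Schmidt is assumed here purely for illustration, and the helper names `mgs_qr` and `invert_via_qr` are hypothetical. The sketch relies on A = QR implying A⁻¹ = R⁻¹Qᴴ, with R⁻¹ applied by back substitution.

```python
def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square matrix (list of rows).

    Returns (Q, R) where Q is stored as a list of columns and R is
    upper triangular, stored as a list of rows.
    """
    n = len(A)
    V = [[complex(A[i][j]) for i in range(n)] for j in range(n)]  # columns of A
    Q = [None] * n
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        norm = sum(abs(v) ** 2 for v in V[k]) ** 0.5
        R[k][k] = norm
        Q[k] = [v / norm for v in V[k]]
        # Remove the q_k component from every remaining column right away.
        for j in range(k + 1, n):
            r = sum(q.conjugate() * v for q, v in zip(Q[k], V[j]))
            R[k][j] = r
            V[j] = [v - r * q for v, q in zip(V[j], Q[k])]
    return Q, R

def invert_via_qr(A):
    """Invert A using A^-1 = R^-1 Q^H and back substitution."""
    n = len(A)
    Q, R = mgs_qr(A)
    inv = [[0j] * n for _ in range(n)]
    for c in range(n):
        b = [Q[k][c].conjugate() for k in range(n)]  # column c of Q^H
        x = [0j] * n
        for i in range(n - 1, -1, -1):
            s = b[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
            x[i] = s / R[i][i]
        for i in range(n):
            inv[i][c] = x[i]
    return inv

A = [[4, 3], [6, 3]]
print(invert_via_qr(A))
```

The sketch assumes a nonsingular input; a singular matrix would produce a zero diagonal entry in R and a division by zero.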


2013, Vol 10 (9), pp. 20130210-20130210
Author(s): Hong Liang, He Weifeng, Zhu Hui, Mao Zhigang