An Efficient and High-Speed Implementation of QRD-MGS Algorithm for STAP Application Based on Floating Point FPGAs

Space-Time Adaptive Processing (STAP) can significantly mitigate the effects of interference and clutter. Calculating the STAP weights involves solving linear equations, which is computationally very intensive. In this paper, the QR decomposition (QRD) using the modified Gram-Schmidt (MGS) algorithm is parameterized by vector size to create a trade-off between hardware resource utilization and computation time. To achieve an efficient floating-point structure, the proposed QRD-MGS architecture is simulated and implemented in two modes: single-vector and multi-vector. Results show that the multi-vector method can lead to a high-performance design with a higher operating frequency, lower power consumption, and lower resource utilization than the single-vector method. For example, ModelSim simulations show that the decomposition of a [Formula: see text] matrix with a vector size of 17 takes 7.86[Formula: see text][Formula: see text]s at a maximum clock frequency of 282 MHz, for implementation on the Arria 10 FPGA. In real STAP applications, the matrices are too large to fit on FPGAs and the update rate of the weights is high. Therefore, this method can fit any matrix in contemporary FPGAs with an acceptable update rate.
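For readers unfamiliar with the QRD-MGS step named in this abstract, the following is a minimal software sketch of the textbook modified Gram-Schmidt QR decomposition. It is not the paper's hardware architecture: it assumes a real-valued matrix with full column rank, uses double-precision Python floats, and does not model the vector-size parameter or the single-vector and multi-vector modes. The function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes real entries and full column rank (an assumption of this sketch).
    Returns (q, r): q is a list of n orthonormal columns, r is n x n upper
    triangular, so that a[i][j] == sum(q[k][i] * r[k][j] for k in range(n)).
    """
    m, n = len(a), len(a[0])
    # Work column by column: v[j] starts as column j of the input.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q = [None] * n
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalise the current column to obtain q_j.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        q[j] = [x / r[j][j] for x in v[j]]
        # Immediately remove the q_j component from every remaining column.
        # Updating the remaining columns in place is what distinguishes the
        # modified algorithm from classical Gram-Schmidt.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(q[j], v[k]))
            v[k] = [vi - r[j][k] * qi for qi, vi in zip(q[j], v[k])]
    return q, r


a = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
q, r = mgs_qr(a)
```

The inner products and column updates in the inner loop are the operations a hardware design would process in chunks; how the paper partitions them is not stated in the abstract.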

Design and Implementation of FPGA-Based High-Performance Floating Point Arithmetic Unit

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.599-601.1465 ◽

2014 ◽

Vol 599-601 ◽

pp. 1465-1469

Author(s):

Xia Qing Tang ◽

Xiang Liu ◽

Jun Qiang Gao ◽

Bo Lin

Keyword(s):

High Performance ◽

Computation Time ◽

Floating Point ◽

Program Optimization ◽

Clock Frequency ◽

IP Core ◽

Processing Accuracy ◽

Floating Point Unit ◽

Comparative Results ◽

Time Required

When FPGAs process data, fixed-point arithmetic offers limited accuracy, and the IP Core floating-point units carry some problems and design risk in use. Based on an improved floating-point unit and program optimization, operators for single-precision floating-point addition/subtraction, multiplication, and division are designed. A comparison between the designed floating-point unit and the IP Core provided by the FPGA development software shows that the maximum clock frequency and latency are essentially unchanged, while the former occupies fewer hardware resources; the computation time required to complete one add/subtract, multiply, and divide operation is reduced by 46%, 37%, and 57%, respectively, relative to the latter. The program was downloaded to the FPGA chip and produced the same results as the simulation, verifying the correctness and feasibility of the design.

Towards A Multi-FPGA Infrared Simulator

The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology ◽

10.1177/154851290700400404 ◽

2007 ◽

Vol 4 (4) ◽

pp. 343-355 ◽

Cited By ~ 1

Author(s):

Vinay Sriram ◽

David Kearney

Keyword(s):

Homeland Security ◽

Reconfigurable Computing ◽

High Speed ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

CCD Camera ◽

Hardware Acceleration ◽

Limiting Factor ◽

Scene Simulation

High-speed infrared (IR) scene simulation is used extensively in defense and homeland security to test the sensitivity of IR cameras and the accuracy of the IR threat detection and tracking algorithms commonly used in IR missile approach warning systems (MAWS). A typical MAWS requires an input scene rate of over 100 scenes/second. On a Pentium 4 (2.8 GHz) dual-core processor, simulating a single IR scene that accounts for the effects of atmospheric turbulence, refraction, optical blurring, and charge-coupled device (CCD) camera electronic noise typically takes 32 minutes [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper we report our research to accelerate IR scene simulation using high-performance reconfigurable computing. We constructed a multi-FPGA (Field Programmable Gate Array) hardware acceleration platform and accelerated a key computationally intensive IR algorithm on it. We were successful in reducing the computation time of IR scene simulation by over 36%. This research acts as a unique case study for accelerating large-scale defense simulations using a high-performance multi-FPGA reconfigurable computer.

VLSI ARCHITECTURE OF PARALLEL MULTIPLIER– ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

International Journal of Electronics and Electrical Engineering ◽

10.47893/ijeee.2012.1009 ◽

2012 ◽

pp. 40-46

Author(s):

Mr. M.V. Sathish ◽

Mrs. Sailaja

Keyword(s):

Signal Processing ◽

High Speed ◽

High Performance ◽

VLSI Architecture ◽

Clock Frequency ◽

Parallel Multiplier ◽

Hybrid Type ◽

Standard Design ◽

Overall Performance ◽

And Performance

This paper proposes a new architecture of a multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses a 1's-complement-based radix-2 modified Booth's algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The proposed MAC showed properties superior to the standard design in many ways, with performance twice that of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
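To illustrate the idea of Booth recoding inside a multiply-accumulate step, here is a minimal sketch under stated assumptions. It uses textbook radix-2 Booth recoding on a two's-complement multiplier and a plain Python integer as the accumulator; it does not reproduce the 1's-complement-based variant, the hybrid CSA tree, or the sign-extension array described in the abstract. The function names and the 8-bit default width are hypothetical.

```python
def booth_radix2_digits(multiplier, bits):
    """Recode a signed multiplier into radix-2 Booth digits in {-1, 0, +1}.

    The multiplier must fit in `bits` bits as a two's-complement value.
    Digits are returned least significant first, with d_i = y[i-1] - y[i]
    and an implicit y[-1] = 0.
    """
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0
    for i in range(bits):
        cur = (y >> i) & 1
        digits.append(prev - cur)
        prev = cur
    return digits


def booth_mac(acc, multiplicand, multiplier, bits=8):
    """Return acc + multiplicand * multiplier using Booth partial products."""
    for i, digit in enumerate(booth_radix2_digits(multiplier, bits)):
        # Each digit selects +M, -M, or 0, shifted to its bit position.
        # Adding it straight into acc mimics merging accumulation with
        # the partial-product summation.
        acc += digit * (multiplicand << i)
    return acc


result = booth_mac(10, 7, -3)  # 10 + 7 * (-3)
```

Folding the accumulator into the same summation as the partial products is the software analogue of the merge the abstract describes; in hardware that summation would be done by the carry save adder tree rather than sequential additions.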

Implementation of Embedded Floating Point Arithmetic Units on FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.550.126 ◽

2014 ◽

Vol 550 ◽

pp. 126-136

Author(s):

N. Ramya Rani

Keyword(s):

High Speed ◽

High Performance ◽

Floating Point ◽

Double Precision ◽

Embedded Computing ◽

Floating Point Arithmetic ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Arithmetic Units ◽

Point Arithmetic

Floating point arithmetic plays a major role in scientific and embedded computing applications. But the performance of field programmable gate arrays (FPGAs) used for floating point applications is poor due to the complexity of floating point arithmetic. The implementation of floating point units on FPGAs consumes a large amount of resources, which has led to the development of embedded floating point units in FPGAs. Embedded applications such as multimedia, communication, and DSP algorithms use floating point arithmetic in processing graphics, Fourier transformation, coding, etc. In this paper, methodologies are presented for the implementation of embedded floating point units on FPGA. The work aims to achieve high computation speed and to reduce the power needed for evaluating expressions. An application that demands high-performance floating point computation can achieve better speed and density by incorporating embedded floating point units. Additionally, this paper describes a comparative study of the design of single precision and double precision pipelined floating point arithmetic units for evaluating expressions. The modules are designed in VHDL, simulated in Xilinx software, and implemented on VIRTEX and SPARTAN FPGAs.

Hardware Implementation Study of Particle Tracking Algorithm on FPGAs

Electronics ◽

10.3390/electronics10202546 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2546

Author(s):

Alessandro Gabrielli ◽

Fabrizio Alfonsi ◽

Alberto Annovi ◽

Alessandra Camplani ◽

Alessandro Cerri

Keyword(s):

High Speed ◽

High Performance ◽

Hardware Implementation ◽

High Energy Physics ◽

Medical Image Analysis ◽

Computation Time ◽

High Energy ◽

Spatial Transformation ◽

Flip Flop ◽

Energy Physics

In recent years, the technological node used to implement FPGA devices has led to very high performance in terms of computational capacity and in some applications these can be much more efficient than CPUs or other programmable devices. The clock managers and the enormous versatility of communication technology through digital transceivers place FPGAs in a prime position for many applications. For example, from real-time medical image analysis to high energy physics particle trajectory recognition, where computation time can be crucial, the benefits of using frontier FPGA capabilities are even more relevant. This paper shows an example of FPGA hardware implementation, via a firmware design, of a complex analytical algorithm: The Hough transform. This is a mathematical spatial transformation used here to facilitate on-the-fly recognition of the trajectories of ionising particles as they pass through the so-called tracker apparatus within high-energy physics detectors. This is a general study to demonstrate that this technique is not only implementable via software-based systems, but can also be exploited using consumer hardware devices. In this context the latter are known as hardware accelerators. In this article in particular, the Xilinx UltraScale+ FPGA is investigated as it belongs to one of the frontier family devices on the market. These FPGAs make it possible to reach high-speed clock frequencies at the expense of acceptable energy consumption thanks to the 14 nm technological node used by the vendor. These devices feature a huge number of gates, high-bandwidth memories, transceivers and other high-performance electronics in a single chip, enabling the design of large, complex and scalable architectures. In particular the Xilinx Alveo U250 has been investigated. 
A target frequency of 250 MHz and a total latency of 30 clock periods have been achieved using only 17% to 53% of the LUTs, 8% to 12% of the DSPs, 1% to 3% of the Block RAMs, and 9% to 28% of the flip-flops.
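As a rough illustration of the voting principle behind the Hough transform mentioned in this abstract, the sketch below detects a straight line through a set of 2D points. It is a generic software example, assuming the classic normal-form parameterization rho = x cos(theta) + y sin(theta); the track parameterization, binning, and firmware structure used in the paper are not given in the abstract and are not reproduced here. The function name and bin sizes are hypothetical.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate Hough votes for straight lines through 2D points.

    Each point votes once per angle bin for the (theta, rho) cell of the
    line through it with that normal angle. Returns a Counter keyed by
    (theta_index, rho_index).
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes


# Ten collinear hits on the line y = x.
hits = [(10 * i, 10 * i) for i in range(10)]
(theta_index, rho_index), count = hough_lines(hits).most_common(1)[0]
# The peak lands at theta_index 135 (normal at 135 degrees) and rho_index 0,
# with one vote per hit.
```

Every point votes independently, which is why the method maps well to parallel hardware: the accumulator cells can be updated concurrently and the peak search reduces to a comparison over the array.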

IMPLEMENTATION OF A REDUCED COMPLEXITY HIGH PERFORMANCE DATA ACQUISITION CHIP USING 0.18 MICRON TECHNOLOGY

SYNCHROINFO JOURNAL ◽

10.36724/2664-066x-2021-7-3-22-26 ◽

2021 ◽

Vol 7 (3) ◽

pp. 22-26

Author(s):

Hai P. Le ◽

Aladin Azyegh ◽

Jugdutt Singh ◽

...

Keyword(s):

Low Power ◽

Data Acquisition ◽

High Speed ◽

High Performance ◽

Modern Science ◽

Digital Data ◽

Clock Frequency ◽

Flash ADC ◽

Analog Signals ◽

Wide Range

Data acquisition (DAQ) in the general sense is the process of collecting information from the real world. For engineers and scientists, this data is mostly numerical and is usually collected, stored, and analysed using computers. However, most input signals cannot be read directly by digital computers, because they are generally analog signals with continuous values, while computers can only recognise digital signals containing only on/off levels. DAQ systems are therefore necessary, as they perform the translation from analog signals to digital data. For this reason, they have become significant in a wide range of applications in modern science and technology [1]. The paper presents the design of a 12-bit high-speed low-power Data Acquisition (DAQ) chip. In this paper, the designs of the building block components are aimed at high accuracy along with high speed and low power dissipation. A modified flash Analog-to-Digital converter (ADC) was used instead of the traditional flash ADC. The proposed DAQ chip operates at a 1 GHz master clock frequency and achieves a sampling speed of 125 MS/s. It dissipates only 64.9 mW of power, compared to 97.2 mW when the traditional flash ADC was used.

A modified Fresnel-based algorithm for 3D microwave imaging of metal objects

International Journal of Microwave and Wireless Technologies ◽

10.1017/s175907871800123x ◽

2018 ◽

Vol 11 (4) ◽

pp. 313-325

Author(s):

Farshad Zamiri ◽

Abdolreza Nabavi

Keyword(s):

High Speed ◽

Fourier Transforms ◽

Low Cost ◽

Three Dimensional ◽

Computation Time ◽

Microwave Imaging ◽

Reconstruction Algorithms ◽

Clock Frequency ◽

Pipeline Architecture ◽

The Impact

Microwave holography reconstructs a target image from the recorded amplitudes and phases of the signals reflected from the target, using Fast Fourier Transform (FFT)-based algorithms. The reconstruction algorithms have two or more steps of two- and three-dimensional Fourier transforms, which have a high computational load. In this paper, by neglecting the impact of target depth on image reconstruction, an efficient Fresnel-based algorithm is proposed that involves only a one-step FFT for both single- and multi-frequency microwave imaging. Numerous tests on planar and non-planar targets have been performed to show the effectiveness of the proposed algorithm, using raw data gathered by a scanner operating in X-band. Finally, a low-cost and high-speed hardware architecture based on fixed-point arithmetic is introduced, which reconstructs the planar targets. This pipeline architecture was tested on field programmable gate arrays operating at a 200 MHz clock frequency, showing a more than 30-fold improvement in computation time compared with a computer.

AFBV: A High-Performance Network Flow Classification Method for Multi-Dimensional Fields and FPGA Implementation

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126619502372 ◽

2019 ◽

Vol 28 (14) ◽

pp. 1950237

Author(s):

Ling Zheng ◽

Zhiliang Qiu ◽

Weina Wang ◽

Weitao Pan ◽

Shiyong Sun ◽

...

Keyword(s):

High Throughput ◽

Network Flow ◽

High Speed ◽

High Performance ◽

Clock Frequency ◽

Pipeline Architecture ◽

Exact Matching ◽

Flow Classification ◽

Rule Sets ◽

Bit Vector

Network flow classification is a key function in high-speed switches and routers. It directly determines the performance of network devices. With the development of the Internet and various kinds of applications, flow classification needs to support multi-dimensional fields and large rule sets while sustaining a high throughput. Software-based classification cannot meet performance requirements as high as 100 Gbps. FPGA-based flow classification methods can achieve a very high throughput. However, range matching is still challenging. To address this, this paper proposes a range supported bit vector (RSBV) method. First, the characteristics of range matching are analyzed, then the rules are pre-encoded and stored in memory. Second, the fields of an input packet header are used as addresses to read the memory, and the result of range matching is derived through pipelined Boolean operations. On this basis, a bit vector for any type of field (AFBV) is further proposed, which efficiently supports flow classification for multi-dimensional fields, including exact matching, longest prefix matching, range matching, and arbitrary wildcard matching. The proposed methods are implemented on an FPGA platform. Through a two-dimensional pipeline architecture, the AFBV can operate at a high clock frequency and can achieve a processing speed of more than 100 Gbps. Simulation results show that for a rule set of 512-bit width and 1 k rules, the AFBV can achieve a throughput of 520 million packets per second (MPPS). The performance is improved by 44% compared with FSBV and 30% compared with Stride BV. The power consumption is reduced by about 43% compared with a TCAM solution.
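The general bit-vector idea described above (pre-encode the rules into memory, use header field values as addresses, combine the results with Boolean operations) can be sketched as follows. This is a simplified illustration, not the RSBV or AFBV encoding itself: it assumes each whole field value addresses one table, that fields are only 4 bits wide, that rules are inclusive ranges, and that the lowest rule index has the highest priority. All names and the example rule set are hypothetical.

```python
def build_table(ranges, width):
    """Pre-encode one field: table[v] has bit r set iff v is in rule r's range.

    `ranges` holds one inclusive (lo, hi) pair per rule for this field.
    """
    table = [0] * (1 << width)
    for rule, (lo, hi) in enumerate(ranges):
        for value in range(lo, hi + 1):
            table[value] |= 1 << rule
    return table


def classify(tables, header):
    """Return the index of the first rule matching every field, or None.

    Assumes at least one field. Each field value is used as a memory
    address, and the per-field bit vectors are ANDed together.
    """
    match = ~0  # all rules are candidates before any field is checked
    for table, value in zip(tables, header):
        match &= table[value]
    if match == 0:
        return None
    return (match & -match).bit_length() - 1  # lowest set bit


# Hypothetical rule set over two 4-bit fields.
RULES = [
    ((2, 5), (0, 3)),    # rule 0
    ((4, 9), (2, 15)),   # rule 1
    ((0, 15), (0, 15)),  # rule 2: wildcard on both fields
]
TABLES = [build_table([rule[f] for rule in RULES], width=4) for f in range(2)]

first_match = classify(TABLES, (8, 3))  # only rules 1 and 2 match
```

Addressing a table with the full field value is only practical for narrow fields, since the table grows as 2^width; this is one reason practical designs split fields into smaller pieces and pipeline the AND stages.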

Design of Low Power CMOS Comparator using 180nm Technology for ADC Application

Circulation in Computer Science ◽

10.22632/ccs-2017-mcsp027 ◽

2017 ◽

Vol MCSP2017 (01) ◽

pp. 11-13

Author(s):

Truptimayee Behera ◽

Ritisnigdha Das

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

High Performance ◽

Input Voltage ◽

Clock Frequency ◽

NMOS Transistor ◽

Body Effect ◽

Low Power Dissipation ◽

Low Power CMOS

In our design of a high-performance CMOS comparator using GPDK 180 nm technology, we optimize these parameters. We analyse the transient response of the schematic design, calculate the gain in AC analysis, and measure the power dissipation. The circuit is built using PMOS and NMOS transistors with a body effect. A plot of phase and gain is also discussed in the paper. Finally, a test schematic is built, and the transient analysis for an input voltage of 2 V is measured using Cadence Virtuoso. Simulation results are presented and show that this design can work at a high clock frequency of 200 MHz. The design has low power dissipation.

A FPGA-Based Design of Floating-Point FFT Processor with Dual-Core

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.811.441 ◽

2013 ◽

Vol 811 ◽

pp. 441-446

Author(s):

Jun Ding ◽

Na Li

Keyword(s):

Fourier Transform ◽

High Speed ◽

Complex Multiplication ◽

Floating Point ◽

System Throughput ◽

CORDIC Algorithm ◽

Clock Frequency ◽

Sample Number ◽

FFT Processor ◽

Dual Core

This paper presents a dual-core floating-point FFT processor design based on the CORDIC algorithm, enabling high-speed real-time floating-point FFT computation with a time complexity of (N/4) log(N/2). The design unifies the floating-point complex multiplication and the evaluation of twiddle factors into a single iteration, which not only reduces the complexity of complex multiplication but also reduces the difficulty of handling floating-point values in the butterfly unit of the fast Fourier transform. The butterfly unit is unaffected by the size of the external memory and can handle Fourier transforms with a high sample number, offering both a wide handling range and high handling precision. The design uses two logical cores and pipelining to improve overall system throughput, with a simple hardware structure and good system stability. Finally, post-simulation is performed on the Altera chip EP2C35F672C6, and the timing simulation runs properly at a 50 MHz clock frequency.
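To show why CORDIC can replace an explicit twiddle-factor multiplication, here is a minimal rotation-mode CORDIC sketch. It rotates a complex sample (x, y) by a given angle using only add and scale-by-power-of-two steps, which is equivalent to multiplying by the twiddle factor for that angle. This is the textbook algorithm evaluated with Python floats, not the paper's floating-point butterfly design; the function name, the iteration count of 24, and the final gain correction by division are assumptions of this sketch.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians with rotation-mode CORDIC.

    Converges for |angle| up to about 1.74 rad (the sum of all micro-rotation
    angles). Multiplications by 2**-i stand in for hardware right shifts.
    """
    # Each micro-rotation stretches the vector; the total gain is constant
    # for a fixed iteration count and would be a precomputed constant.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # angle still left to rotate
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / gain, y / gain


# Twiddle factor W_8^1 = exp(-j * 2 * pi / 8) applied to the sample 1 + 0j.
re, im = cordic_rotate(1.0, 0.0, -2.0 * math.pi / 8)
```

Because the rotation angle is driven to zero by the same iterations that update x and y, the twiddle factor never has to be computed or stored as a separate sine and cosine pair, which is the property the abstract exploits when it merges the two operations.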
