Hardware Acceleration for Finite Element Electromagnetics: Efficient Sparse Matrix Floating-Point Computations with Field Programmable Gate Arrays

Due to the exponential increase in the number of electronic devices connected to the Internet, the amount of data they produce has grown to the same extent. To process these data, automatic learning algorithms, also known as machine learning, have become widespread; the most popular of these are neural networks. These algorithms need a great deal of resources to compute all their operations and, because of that, they have traditionally been implemented in application-specific integrated circuits. Recently, however, there has been a boom in implementations on field programmable gate arrays (FPGAs), which allow greater parallelism in the implementation of the algorithms. This paper proposes an FPGA-based feature extraction method for offline handwritten digit recognition. Classification relies on a simple two-layer multilayer perceptron (MLP). The feature extraction approach is well suited to FPGA execution because it uses only addition and subtraction operations. Features are extracted with the proposed method from handwritten digit images taken from a standard database and normalized to 40×40 pixels. Experimental results show that the proposed system achieves 85% accuracy and that, compared with other systems, it is simpler and more accurate.

In addition, this project describes the FPGA implementation of an IEEE-754 single-precision floating-point multiply-accumulate (MAC) unit, which computes the weighted inputs fed to the neurons of an artificial neural network. Using floating-point numbers widens the range of representable values, which is highly recommended for artificial neural networks.
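The MAC behaviour described above can be modelled in software. The Python below is a minimal sketch, not the authors' HDL: it assumes a non-fused MAC (a separate multiplier followed by an adder) and emulates IEEE-754 single precision by rounding every product and every partial sum to binary32 with the standard `struct` module. The helper names `to_f32`, `f32_fields`, and `mac_f32` are hypothetical.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value.

    Raises OverflowError if x is finite but outside the binary32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to binary32, into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def mac_f32(weights: list[float], inputs: list[float], bias: float = 0.0) -> float:
    """Weighted input of one neuron, accumulated as a non-fused binary32 MAC.

    Each product and each partial sum is rounded to single precision.
    A product of two binary32 values is exact in binary64, and for addition
    binary64 has enough extra precision that rounding twice matches a
    single correct binary32 rounding.
    """
    acc = to_f32(bias)
    for w, x in zip(weights, inputs):
        prod = to_f32(to_f32(w) * to_f32(x))  # multiplier stage
        acc = to_f32(acc + prod)              # accumulator stage
    return acc

print(f32_fields(-2.5))                   # (1, 128, 2097152)
print(mac_f32([0.5, 0.25], [2.0, 4.0]))   # 2.0
```

Rounding only once per multiply-add step, instead of once after the multiply and once after the add, would model a fused multiply-add unit instead.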
The code is written in HDL and simulated, and synthesis results are obtained with Xilinx synthesis tools. To validate the computational accuracy of the FFT, a MATLAB validation script compares the HDL output against a standard reference model.
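A validation step of this kind can be sketched in Python as a stand-in for the MATLAB script. Everything here is an assumption for illustration: the simulation dump format (pairs of 8-digit hexadecimal binary32 words for the real and imaginary parts), the naive DFT used as the golden reference model, the normalised error metric, and the tolerance. The names `hex_to_f32`, `reference_dft`, and `validate` are hypothetical.

```python
import cmath
import struct

def hex_to_f32(word: str) -> float:
    """Decode one 8-hex-digit IEEE-754 binary32 word, e.g. '3f800000' -> 1.0."""
    return struct.unpack(">f", bytes.fromhex(word))[0]

def reference_dft(x: list[complex]) -> list[complex]:
    """O(n^2) DFT used as a golden reference model for an FFT core."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def validate(hdl_words: list[tuple[str, str]], x: list[complex],
             rel_tol: float = 1e-5) -> tuple[bool, float]:
    """Compare simulated HDL output with the reference; return (passed, worst error).

    The error is normalised by the largest reference magnitude so that the
    tolerance does not depend on the input scale.
    """
    hdl = [complex(hex_to_f32(re), hex_to_f32(im)) for re, im in hdl_words]
    ref = reference_dft(x)
    if len(hdl) != len(ref):
        raise ValueError("HDL output length does not match the reference")
    scale = max(abs(v) for v in ref) or 1.0
    worst = max(abs(h - r) for h, r in zip(hdl, ref)) / scale
    return worst <= rel_tol, worst

# Impulse input: every DFT bin should equal 1 + 0j.
dump = [("3f800000", "00000000")] * 4
print(validate(dump, [1, 0, 0, 0]))  # (True, 0.0)
```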

Low‐precision DSP‐based floating‐point multiply‐add fused for Field Programmable Gate Arrays

IET Computers & Digital Techniques, Vol 8 (4), pp. 187-197, 2014. DOI: 10.1049/iet-cdt.2013.0128. Cited by ~4.

Author(s): Alexandru Amaricai, Oana Boncalo, Constantina‐Elena Gavriliu

Keyword(s): Field Programmable Gate Arrays, Floating Point, Gate Arrays, Field Programmable, Programmable Gate Arrays

RTN: Reparameterized Ternary Network

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (04), pp. 4780-4787, 2020. DOI: 10.1609/aaai.v34i04.5912.

Author(s): Yuhang Li, Xin Dong, Sai Qian Zhang, Haoli Bai, Yuanpeng Chen, ...

Keyword(s): Deep Neural Networks, State Of The Art, Hardware Acceleration, Field Programmable Gate Arrays, Accuracy Improvement, Gate Arrays, Resource Limited, Field Programmable, Programmable Gate Arrays, Speed Up

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings through quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashed range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset applied to a fixed ternary vector, we decouple range and magnitude from direction to alleviate the above problems. The learnable scale and offset automatically adjust the range of quantized values and the sparsity without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computation for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN achieves a much better trade-off between bitwidth and accuracy, with up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGAs), where it brings 46.46× and 89.17× savings in power and area, respectively, compared with full-precision convolution.
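The reparameterization idea can be illustrated with a small sketch. The Python below assumes each weight is represented as `alpha * t + beta`, with `t` a fixed ternary code in {-1, 0, +1} and `alpha`, `beta` a full-precision scale and offset; for simplicity it keeps activations in full precision, whereas RTN also quantizes activations. The thresholding rule in `ternarize` and all function names are hypothetical, and this is not the paper's encoding or computation pattern. It only shows why a ternary dot product needs additions and subtractions plus two multiplications, which is what makes it cheap in hardware.

```python
def ternarize(w: list[float], delta: float) -> list[int]:
    """Map weights to {-1, 0, +1} with a symmetric threshold (hypothetical rule)."""
    return [1 if v > delta else -1 if v < -delta else 0 for v in w]

def reparam_weights(t: list[int], alpha: float, beta: float) -> list[float]:
    """Full-precision view of the reparameterized weights: alpha * t + beta."""
    return [alpha * ti + beta for ti in t]

def ternary_dot(t: list[int], x: list[float], alpha: float, beta: float) -> float:
    """Compute sum_i (alpha * t_i + beta) * x_i without per-element multiplies.

    sum_i (alpha*t_i + beta) * x_i = alpha * (S_plus - S_minus) + beta * S_all,
    where S_plus / S_minus sum the inputs whose ternary code is +1 / -1.
    """
    s_plus = sum(xi for ti, xi in zip(t, x) if ti == 1)
    s_minus = sum(xi for ti, xi in zip(t, x) if ti == -1)
    return alpha * (s_plus - s_minus) + beta * sum(x)

t = ternarize([0.8, 0.05, -0.6], delta=0.3)  # [1, 0, -1]
x = [2.0, 3.0, 5.0]
direct = sum(wi * xi for wi, xi in zip(reparam_weights(t, 0.5, 0.1), x))
print(t, ternary_dot(t, x, 0.5, 0.1), direct)
```

Note that the offset `beta` gives zero-coded positions a nonzero weight, which is one way a scale-plus-offset form can move the quantized range away from a symmetric grid.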

The Use of Field-Programmable Gate Arrays for the Hardware Acceleration of Design Automation Tasks

VLSI Design, Vol 4 (2), pp. 135-139, 1996. DOI: 10.1155/1996/17505. Cited by ~2.

Author(s): Neil J. Howard, Andrew M. Tyrrell, Nigel M. Allinson

Keyword(s): Design Process, Low Cost, VLSI Design, Hardware Acceleration, Field Programmable Gate Arrays, Design Rule, Gate Arrays, Field Programmable, Design Rule Checking, Programmable Gate Arrays

This paper investigates the possibility of using Field-Programmable Gate Arrays (FPGAs) as reconfigurable co-processors for workstations to produce moderate speedups for most tasks in the design process, resulting in a worthwhile overall design-process speedup at low cost and allowing algorithm upgrades with no hardware modification. The use of FPGAs as hardware accelerators is reviewed, and achievable speedups are then predicted for logic simulation and VLSI design rule checking tasks under various FPGA co-processor arrangements.
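How moderate per-task speedups combine into an overall design-process speedup follows Amdahl's law: if task i takes a fraction f_i of the original design time and is accelerated by a factor s_i, the overall speedup is S = 1 / Σ_i (f_i / s_i). The sketch below applies this formula; the task breakdown in the example is made up for illustration and is not taken from the paper, and `overall_speedup` is a hypothetical name.

```python
def overall_speedup(fractions: list[float], speedups: list[float]) -> float:
    """Amdahl's law for several independently accelerated tasks.

    fractions[i] is the share of the original run time spent in task i
    (the shares must sum to 1); speedups[i] is that task's acceleration
    factor (1.0 means the task stays on the workstation CPU).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))

# Hypothetical breakdown: logic simulation, design rule checking, everything else.
print(overall_speedup([0.4, 0.4, 0.2], [5.0, 5.0, 1.0]))  # about 2.78
```

In this example the unaccelerated 20% caps the overall gain at 5× no matter how fast the other two tasks become, which is why moderate speedups spread across most tasks matter more than a large speedup on one.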

Field programmable gate arrays and floating point arithmetic

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol 2 (3), pp. 365-367, 1994. DOI: 10.1109/92.311646. Cited by ~36.

Author(s): B. Fagin, C. Renard

Keyword(s): Field Programmable Gate Arrays, Floating Point, Gate Arrays, Floating Point Arithmetic, Field Programmable, Programmable Gate Arrays, Point Arithmetic

Flexible multi-mode embedded floating-point unit for field programmable gate arrays

Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '09, 2009. DOI: 10.1145/1508128.1508155. Cited by ~11.

Author(s): Yee Jern Chong, Sri Parameswaran

Keyword(s): Field Programmable Gate Arrays, Floating Point, Gate Arrays, Field Programmable, Programmable Gate Arrays, Floating Point Unit, Multi Mode

Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-Enabled Field-Programmable Gate Arrays

ACM Transactions on Reconfigurable Technology and Systems, Vol 15 (2), pp. 1-35, 2022. DOI: 10.1145/3476229.

Author(s): Tom Hogervorst, Răzvan Nane, Giacomo Marchiori, Tong Dong Qiu, Markus Blatt, ...

Keyword(s): High Performance, Scientific Computing, Hardware Acceleration, Field Programmable Gate Arrays, Gate Arrays, Computational Flow Dynamics, Field Programmable, Programmable Gate Arrays, High Bandwidth, Reservoir Simulator

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because simulating increasingly large computational models is of utmost importance, hardware acceleration is receiving increased attention for its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays could accelerate scientific computing because they make it possible to fully customize the memory hierarchy, which is important in irregular applications such as iterative linear solvers. In this article, we study the potential of using Field-Programmable Gate Arrays in High-Performance Computing, motivated by rapid advances in reconfigurable hardware such as the increase in on-chip memory size, the increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the developed preconditioned iterative solver in Flow from the Open Porous Media project, a state-of-the-art open-source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel, both in stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world reservoir model with a grid of more than 10^5 cells and three unknowns per cell.
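The solver structure named in this abstract, a sparse matrix-vector product inside an ILU0-preconditioned BiCGStab iteration, can be sketched in plain Python to show the data flow a hardware kernel has to implement. This is a textbook CSR formulation with right preconditioning, not the paper's SpMV unit or FPGA design. It assumes column indices are sorted within each row and every diagonal entry is stored, and it omits breakdown checks; all function names are hypothetical.

```python
import math

# CSR matrix = (indptr, indices, data). Assumes column indices are sorted
# within each row and every diagonal entry is explicitly stored.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spmv(indptr, indices, data, x):
    """y = A @ x for a CSR matrix (the kernel an SpMV unit accelerates)."""
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]

def ilu0(indptr, indices, data):
    """ILU(0), IKJ variant: L and U share A's sparsity pattern, fill-in is dropped.

    Returns the combined factor values (L has an implicit unit diagonal) and
    the position of each row's diagonal entry in the CSR arrays.
    """
    n = len(indptr) - 1
    lu = list(data)
    pos = [{indices[p]: p for p in range(indptr[i], indptr[i + 1])} for i in range(n)]
    diag = [pos[i][i] for i in range(n)]
    for i in range(1, n):
        for p in range(indptr[i], diag[i]):              # entries (i, k), k < i
            k = indices[p]
            lu[p] /= lu[diag[k]]
            for q in range(diag[k] + 1, indptr[k + 1]):  # entries (k, j), j > k
                j = indices[q]
                if j in pos[i]:                          # skip fill-in
                    lu[pos[i][j]] -= lu[p] * lu[q]
    return lu, diag

def ilu0_apply(indptr, indices, lu, diag, r):
    """Return z with (L U) z = r: forward then backward substitution."""
    z = list(r)
    for i in range(len(z)):                              # L z = r
        for p in range(indptr[i], diag[i]):
            z[i] -= lu[p] * z[indices[p]]
    for i in reversed(range(len(z))):                    # U z = L^-1 r
        for p in range(diag[i] + 1, indptr[i + 1]):
            z[i] -= lu[p] * z[indices[p]]
        z[i] /= lu[diag[i]]
    return z

def bicgstab_ilu0(indptr, indices, data, b, tol=1e-10, maxiter=200):
    """Right-preconditioned BiCGStab from x0 = 0; returns (x, iterations).

    Breakdown checks (rho or r_hat . v becoming zero) are omitted.
    """
    n = len(b)
    lu, diag = ilu0(indptr, indices, data)
    precond = lambda u: ilu0_apply(indptr, indices, lu, diag, u)
    matvec = lambda u: spmv(indptr, indices, data, u)
    x, r = [0.0] * n, list(b)                            # r0 = b - A x0
    r_hat, v, p = list(r), [0.0] * n, [0.0] * n
    rho_old = alpha = omega = 1.0
    stop = tol * (math.sqrt(dot(b, b)) or 1.0)
    for it in range(1, maxiter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [r[i] + beta * (p[i] - omega * v[i]) for i in range(n)]
        y = precond(p)
        v = matvec(y)
        alpha = rho / dot(r_hat, v)
        x = [x[i] + alpha * y[i] for i in range(n)]
        s = [r[i] - alpha * v[i] for i in range(n)]
        if math.sqrt(dot(s, s)) <= stop:
            return x, it
        z = precond(s)
        t = matvec(z)
        omega = dot(t, s) / dot(t, t)
        x = [x[i] + omega * z[i] for i in range(n)]
        r = [s[i] - omega * t[i] for i in range(n)]
        if math.sqrt(dot(r, r)) <= stop:
            return x, it
        rho_old = rho
    return x, maxiter
```

Each BiCGStab iteration performs two sparse matrix-vector products and two preconditioner applications (a forward and a backward triangular solve each), so these irregular, memory-bound kernels dominate the run time; that is why they are the natural candidates for a customized memory hierarchy.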