FPGA Implementation of Multiplier for Floating-Point Numbers Based on IEEE 754-2008 Standard

Author(s):  
Muhammad Ibn Ibrahimy

This paper illustrates designing and implementation process of floating point multiplier on Field Programmable Gate Array (FPGA). Floating-point operations are used in many fields like, digital signal processing, digital image processing, multimedia data analysis etc. Implementation of floating-point multiplication is handy and easy for high level language. However it is a challenging task to implement a floating-point multiplication in hardware level/low level language due to the complexity of algorithm. A top-down approach has been applied for the prototyping of IEEE 754-2008 standard floating-point multiplier module using Verilog Hardware Description Language (HDL). Electronic Design Automation (EDA) tool of Altera Quartus II has been used for floating-point multiplier. The hardware implementation has been done by downloading the Verilog code onto Altera DE2 FPGA development board and found a satisfactory performance.

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Anitha Juliette Albert ◽  
Seshasayanan Ramachandran

Floating point multiplication is a critical part in high dynamic range and computational intensive digital signal processing applications which require high precision and low power. This paper presents the design of an IEEE 754 single precision floating point multiplier using asynchronous NULL convention logic paradigm. Rounding has not been implemented to suit high precision applications. The novelty of the research is that it is the first ever NULL convention logic multiplier, designed to perform floating point multiplication. The proposed multiplier offers substantial decrease in power consumption when compared with its synchronous version. Performance attributes of the NULL convention logic floating point multiplier, obtained from Xilinx simulation and Cadence, are compared with its equivalent synchronous implementation.


VLSI Design ◽  
1995 ◽  
Vol 3 (1) ◽  
pp. 67-80
Author(s):  
Uwe Vehlies

A formal approach for the transformation of computation intensive digital signal processing algorithms into suitable array processor architectures is presented. It covers the complete design flow from algorithmic specifications in a high-level programming language to architecture descriptions in a hardware description language. The transformation itself is divided into manageable design steps and implemented in the CAD-tool DECOMP which allows the exploration of different architectures in a short time. With the presented approach data independent algorithms can be mapped onto array processor architectures. To allow this, a known mapping methodology for array processor design is extended to handle inhomogeneous dependence graphs with nonregular data dependences. The implementation of the formal approach in the DECOMP is an important step towards design automation for massively parallel systems.


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 332 ◽  
Author(s):  
Rappy Saha ◽  
Partha Banik ◽  
Ki-Doo Kim

Hardware suitability of an algorithm can only be verified when the algorithm is actually implemented in the hardware. By hardware, we indicate system on chip (SoC) where both processor and field-programmable gate array (FPGA) are available. Our goal is to develop a simple algorithm that can be implemented on hardware where high-level synthesis (HLS) will reduce the tiresome work of manual hardware description language (HDL) optimization. We propose an algorithm to achieve high dynamic range (HDR) image from a single low dynamic range (LDR) image. We use highlight removal technique for this purpose. Our target is to develop parameter free simple algorithm that can be easily implemented on hardware. For this purpose, we use statistical information of the image. While software development is verified with state of the art, the HLS approach confirms that the proposed algorithm is implementable to hardware. The performance of the algorithm is measured using four no-reference metrics. According to the measurement of the structural similarity (SSIM) index metric and peak signal-to-noise ratio (PSNR), hardware simulated output is at least 98.87 percent and 39.90 dB similar to the software simulated output. Our approach is novel and effective in the development of hardware implementable HDR algorithm from a single LDR image using the HLS tool.


Currently, each CPU has one or additional Floating Point Units (FPUs) integrated inside it. It is usually utilized in math wide-ranging applications, such as digital signal processing. It is found in places be established in engineering, medical and military fields in adding along to in different fields requiring audio, image or video handling. A high-speed and energy-efficient floating point unit is naturally needed in the electronics diligence as an arithmetic unit in microprocessors. The most operations accounting 95% of conformist FPU are multiplication and addition. Many applications need the speedy execution of arithmetic operations. In the existing system, the FPM(Floating Point Multiplication) and FPA(Floating Point Addition) have more delay and fewer speed and fewer throughput. The demand for high speed and throughput intended to design the multiplier and adder blocks within the FPM (Floating point multiplication)and FPA(Floating Point Addition) in a format of single precision floating point and double-precision floating point operation is internally pipelined to achieve high throughput and these are supported by the IEEE 754 standard floating point representations. This is designed with the Verilog code using Xilinx ISE 14.5 software tool is employed to code and verify the ensuing waveforms of the designed code


2016 ◽  
Vol 67 (3) ◽  
pp. 150-159 ◽  
Author(s):  
Kyriakos M. Deliparaschos ◽  
Konstantinos Michail ◽  
Argyrios C. Zolotas ◽  
Spyros G. Tzafestas

Abstract This work presents a field programmable gate array (FPGA)-based embedded software platform coupled with a software-based plant, forming a hardware-in-the-loop (HIL) that is used to validate a systematic sensor selection framework. The systematic sensor selection framework combines multi-objective optimization, linear-quadratic-Gaussian (LQG)-type control, and the nonlinear model of a maglev suspension. A robustness analysis of the closed-loop is followed (prior to implementation) supporting the appropriateness of the solution under parametric variation. The analysis also shows that quantization is robust under different controller gains. While the LQG controller is implemented on an FPGA, the physical process is realized in a high-level system modeling environment. FPGA technology enables rapid evaluation of the algorithms and test designs under realistic scenarios avoiding heavy time penalty associated with hardware description language (HDL) simulators. The HIL technique facilitates significant speed-up in the required execution time when compared to its software-based counterpart model.


2011 ◽  
Vol 20 (02) ◽  
pp. 263-282 ◽  
Author(s):  
DAVIDE ANGUITA ◽  
LUCA CARLINO ◽  
ALESSANDRO GHIO ◽  
SANDRO RIDELLA

We describe in this work a Core Generator for Pattern Recognition tasks. This tool is able to generate, according to user requirements, the hardware description of a digital architecture, which implements a Support Vector Machine, one of the current state-of-the-art algorithms for Pattern Recognition. The output of the Core Generator consists of a high-level language hardware core description, suitable to be mapped on a reconfigurable device, like a Field Programmable Gate Array (FPGA). As an example of the use of our tool, we compare different solutions, by targeting several reconfigurable devices, and implement the recognition part of a machine vision system for automotive applications.


Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 926
Author(s):  
Elyas Zamiri ◽  
Alberto Sanchez ◽  
Marina Yushkova ◽  
Maria Sofia Martínez-García ◽  
Angel de Castro

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2698
Author(s):  
Muhammad Rashid ◽  
Mohammad Mazyad Hazzazi  ◽  
Sikandar Zulqarnain Khan ◽  
Adel R. Alharbi  ◽  
Asher Sajid  ◽  
...  

This paper presents a Point Multiplication (PM) architecture of Elliptic-Curve Cryptography (ECC) over GF(2163) with a focus on the optimization of hardware resources and latency at the same time. The hardware resources are reduced with the use of a bit-serial (traditional schoolbook) multiplication method. Similarly, the latency is optimized with the reduction in a critical path using pipeline registers. To cope with the pipelining, we propose to reschedule point addition and double instructions, required for the computation of a PM operation in ECC. Subsequently, the proposed architecture over GF(2163) is modeled in Verilog Hardware Description Language (HDL) using Vivado Design Suite. To provide a fair performance evaluation, we synthesize our design on various FPGA (field-programmable gate array) devices. These FPGA devices are Virtex-4, Virtex-5, Virtex-6, Virtex-7, Spartan-7, Artix-7, and Kintex-7. The lowest area (433 FPGA slices) is achieved on Spartan-7. The highest speed is realized on Virtex-7, where our design achieves 391 MHz clock frequency and requires 416 μs for one PM computation (latency). For power, the lowest values are achieved on the Artix-7 (56 μW) and Kintex-7 (61 μW) devices. A ratio of throughput over area value of 4.89 is reached for Virtex-7. Our design outperforms most recent state-of-the-art solutions (in terms of area) with an overhead of latency.


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2859
Author(s):  
Mannhee Cho ◽  
Youngmin Kim

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented, using a high-level synthesis tool on a Xilinx FPGA. The proposed accelerator applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% reduced network latency, compared to a floating point design and its resource utilization is optimized to use 78% fewer DSP blocks, compared to general fixed-point designs.


Author(s):  
Fabio Garzia ◽  
Roberto Airoldi ◽  
Jari Nurmi

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of a FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution but the MP-SoC provides an easier programming interface that is completely based on C language. The authors’ approach proves that implementing a programmable architecture on FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it on FPGA.


Sign in / Sign up

Export Citation Format

Share Document