FPGA Implementation of Multiplier for Floating-Point Numbers Based on IEEE 754-2008 Standard

This paper illustrates the design and implementation of a floating-point multiplier on a Field Programmable Gate Array (FPGA). Floating-point operations are used in many fields, such as digital signal processing, digital image processing, and multimedia data analysis. Implementing floating-point multiplication is straightforward in a high-level language. However, implementing it at the hardware level in a low-level language is challenging because of the complexity of the algorithm. A top-down approach has been applied to prototype an IEEE 754-2008 standard floating-point multiplier module in the Verilog Hardware Description Language (HDL). The Altera Quartus II Electronic Design Automation (EDA) tool has been used for the floating-point multiplier. The hardware implementation was carried out by downloading the Verilog code onto an Altera DE2 FPGA development board, and its performance was found to be satisfactory.
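
As a rough illustration of the algorithm such a multiplier implements, the following Python sketch models single-precision multiplication at the bit level: XOR the signs, add the biased exponents and subtract the bias, multiply the 24-bit significands, then normalise and round. It is not the paper's Verilog design. The function names (`fp32_mul`, `float_to_bits`, `bits_to_float`) are hypothetical, round-to-nearest-even is an assumption, and zeros, subnormals, infinities, and NaNs are deliberately left out.

```python
import struct

BIAS = 127        # exponent bias of IEEE 754 binary32
FRAC_BITS = 23    # stored fraction bits of binary32


def float_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x (x is rounded to single precision)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp32_mul(a_bits: int, b_bits: int) -> int:
    """Multiply two NORMAL binary32 values given as bit patterns.

    Assumption for this sketch: inputs and result are normal numbers.
    """
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1
    exp_a = (a_bits >> FRAC_BITS) & 0xFF
    exp_b = (b_bits >> FRAC_BITS) & 0xFF
    if not (0 < exp_a < 255 and 0 < exp_b < 255):
        raise ValueError("only normal operands are handled in this sketch")

    # Restore the hidden leading 1 to get 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | (1 << FRAC_BITS)
    man_b = (b_bits & 0x7FFFFF) | (1 << FRAC_BITS)

    exp = exp_a + exp_b - BIAS      # both exponents carry the bias once
    prod = man_a * man_b            # 48-bit product in [2^46, 2^48)

    # Normalise: a product in [2, 4) needs one extra right shift.
    if prod >> 47:
        shift = 24
        exp += 1
    else:
        shift = 23
    man = prod >> shift
    remainder = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and (man & 1)):
        man += 1
        if man >> 24:               # rounding carried out of the significand
            man >>= 1
            exp += 1

    if not 0 < exp < 255:
        raise OverflowError("result is outside the normal range")
    return (sign << 31) | (exp << FRAC_BITS) | (man & 0x7FFFFF)
```

For example, `bits_to_float(fp32_mul(float_to_bits(1.5), float_to_bits(-2.25)))` evaluates to `-3.375`. A hardware design would map the same three steps onto an XOR gate, an adder, and a 24x24 multiplier followed by a normalisation stage.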

NULL Convention Floating Point Multiplier

The Scientific World JOURNAL ◽ 10.1155/2015/749569 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-10 ◽ Cited By ~ 1

Author(s): Anitha Juliette Albert ◽ Seshasayanan Ramachandran

Keyword(s): High Precision ◽ Dynamic Range ◽ Digital Signal ◽ High Dynamic Range ◽ Floating Point ◽ Single Precision ◽ Critical Part ◽ Point Multiplication ◽ Null Convention Logic ◽ High Dynamic

Floating point multiplication is a critical part of high dynamic range and computationally intensive digital signal processing applications, which require high precision and low power. This paper presents the design of an IEEE 754 single precision floating point multiplier using the asynchronous NULL convention logic paradigm. Rounding has not been implemented, to suit high precision applications. The novelty of the research is that it is the first NULL convention logic multiplier designed to perform floating point multiplication. The proposed multiplier offers a substantial decrease in power consumption compared with its synchronous version. Performance attributes of the NULL convention logic floating point multiplier, obtained from Xilinx simulation and Cadence, are compared with those of its equivalent synchronous implementation.

Stepwise Transformation of Algorithms into Array Processor Architectures by the DECOMP

VLSI Design ◽ 10.1155/1995/76861 ◽ 1995 ◽ Vol 3 (1) ◽ pp. 67-80

Author(s): Uwe Vehlies

Keyword(s): Digital Signal ◽ Design Flow ◽ Array Processor ◽ Formal Approach ◽ Processor Architectures ◽ Massively Parallel Systems ◽ Signal Processing Algorithms ◽ Hardware Description ◽ High Level ◽ Short Time

A formal approach for transforming computation-intensive digital signal processing algorithms into suitable array processor architectures is presented. It covers the complete design flow from algorithmic specifications in a high-level programming language to architecture descriptions in a hardware description language. The transformation itself is divided into manageable design steps and implemented in the CAD tool DECOMP, which allows different architectures to be explored in a short time. With the presented approach, data-independent algorithms can be mapped onto array processor architectures. To allow this, a known mapping methodology for array processor design is extended to handle inhomogeneous dependence graphs with nonregular data dependences. The implementation of the formal approach in DECOMP is an important step towards design automation for massively parallel systems.

HLS Based Approach to Develop an Implementable HDR Algorithm

Electronics ◽ 10.3390/electronics7110332 ◽ 2018 ◽ Vol 7 (11) ◽ pp. 332 ◽ Cited By ~ 1

Author(s): Rappy Saha ◽ Partha Banik ◽ Ki-Doo Kim

Keyword(s): Dynamic Range ◽ Signal To Noise Ratio ◽ Simple Algorithm ◽ Structural Similarity ◽ High Dynamic Range ◽ Field Programmable ◽ Hardware Description ◽ On Chip ◽ High Level ◽ Removal Technique

The hardware suitability of an algorithm can only be verified when the algorithm is actually implemented in hardware. By hardware, we mean a system on chip (SoC) in which both a processor and a field-programmable gate array (FPGA) are available. Our goal is to develop a simple algorithm that can be implemented in hardware, where high-level synthesis (HLS) reduces the tiresome work of manual hardware description language (HDL) optimization. We propose an algorithm to obtain a high dynamic range (HDR) image from a single low dynamic range (LDR) image, using a highlight removal technique for this purpose. Our target is a simple, parameter-free algorithm that can be easily implemented in hardware; to this end, we use statistical information of the image. While the software development is verified against the state of the art, the HLS approach confirms that the proposed algorithm is implementable in hardware. The performance of the algorithm is measured using four no-reference metrics. According to the structural similarity (SSIM) index and the peak signal-to-noise ratio (PSNR), the hardware-simulated output matches the software-simulated output with an SSIM of at least 98.87 percent and a PSNR of at least 39.90 dB. Our approach is novel and effective in the development of a hardware-implementable HDR algorithm from a single LDR image using the HLS tool.
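
To make the PSNR comparison above concrete, here is a minimal Python sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the representation of images as flat lists of pixel values, and the 8-bit peak value of 255 are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumption: pixels are plain numbers in [0, max_value], e.g. 8-bit values.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf          # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Under these assumptions, a hardware output that differs from the software output by one grey level at every pixel has an MSE of 1 and a PSNR of about 48.13 dB, so the reported 39.90 dB corresponds to a somewhat larger average deviation.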

Design and Implementation of FPU for Optimised Speed

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.c6444.029320 ◽ 2020 ◽ Vol 9 (3) ◽ pp. 3922-3933

Keyword(s): Energy Efficient ◽ High Speed ◽ Software Tool ◽ Digital Signal ◽ Floating Point ◽ Double Precision ◽ Arithmetic Unit ◽ Single Precision ◽ Point Multiplication ◽ Floating Point Unit

Currently, each CPU has one or more Floating Point Units (FPUs) integrated in it. FPUs are typically used in math-intensive applications such as digital signal processing. They are found in engineering, medical, and military fields, as well as in other fields requiring audio, image, or video handling. A high-speed and energy-efficient floating point unit is needed in the electronics industry as an arithmetic unit in microprocessors. Multiplication and addition account for 95% of the operations of a conventional FPU. Many applications need fast execution of arithmetic operations. In the existing system, Floating Point Multiplication (FPM) and Floating Point Addition (FPA) have more delay, lower speed, and lower throughput. To meet the demand for high speed and throughput, the multiplier and adder blocks within the FPM and FPA are designed for single-precision and double-precision floating point operation and are internally pipelined to achieve high throughput; both formats follow the IEEE 754 standard floating point representations. The design is written in Verilog, and the Xilinx ISE 14.5 software tool is used to code it and verify the resulting waveforms.
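
For reference, the IEEE 754 single- and double-precision formats mentioned above differ only in field widths: 1 sign bit, 8 or 11 exponent bits, and 23 or 52 fraction bits. The Python sketch below splits a value into those fields. The helper name `unpack_fields` and the `FORMATS` table are hypothetical, and the snippet illustrates the number formats only, not the pipelined Verilog design.

```python
import struct

# (exponent bits, fraction bits, float pack code, integer pack code)
FORMATS = {
    "single": (8, 23, "<f", "<I"),
    "double": (11, 52, "<d", "<Q"),
}


def unpack_fields(x, precision="single"):
    """Return (sign, biased_exponent, fraction) of x in the chosen format."""
    exp_bits, frac_bits, float_code, int_code = FORMATS[precision]
    # Reinterpret the float's bytes as an unsigned integer of the same width.
    bits = struct.unpack(int_code, struct.pack(float_code, x))[0]
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction
```

For instance, `unpack_fields(1.0, "single")` gives a biased exponent of 127, while `unpack_fields(-2.0, "double")` gives a sign of 1 and a biased exponent of 1024, reflecting the biases of 127 and 1023.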

FPGA–Based Efficient Hardware/Software Co–Design for Industrial Systems with Consideration of Output Selection

Journal of Electrical Engineering ◽ 10.1515/jee-2016-0022 ◽ 2016 ◽ Vol 67 (3) ◽ pp. 150-159 ◽ Cited By ~ 1

Author(s): Kyriakos M. Deliparaschos ◽ Konstantinos Michail ◽ Argyrios C. Zolotas ◽ Spyros G. Tzafestas

Keyword(s): System Modeling ◽ Robustness Analysis ◽ Sensor Selection ◽ Linear Quadratic ◽ Industrial Systems ◽ Field Programmable ◽ Speed Up ◽ Hardware Description ◽ Selection Framework ◽ High Level

This work presents a field programmable gate array (FPGA)-based embedded software platform coupled with a software-based plant, forming a hardware-in-the-loop (HIL) setup that is used to validate a systematic sensor selection framework. The systematic sensor selection framework combines multi-objective optimization, linear-quadratic-Gaussian (LQG)-type control, and the nonlinear model of a maglev suspension. A robustness analysis of the closed loop follows (prior to implementation), supporting the appropriateness of the solution under parametric variation. The analysis also shows that quantization is robust under different controller gains. While the LQG controller is implemented on an FPGA, the physical process is realized in a high-level system modeling environment. FPGA technology enables rapid evaluation of the algorithms and test designs under realistic scenarios, avoiding the heavy time penalty associated with hardware description language (HDL) simulators. The HIL technique facilitates a significant speed-up in the required execution time when compared to its software-based counterpart model.

A FPGA CORE GENERATOR FOR EMBEDDED CLASSIFICATION SYSTEMS

Journal of Circuits System and Computers ◽ 10.1142/s0218126611007244 ◽ 2011 ◽ Vol 20 (02) ◽ pp. 263-282 ◽ Cited By ~ 21

Author(s): DAVIDE ANGUITA ◽ LUCA CARLINO ◽ ALESSANDRO GHIO ◽ SANDRO RIDELLA

Keyword(s): Pattern Recognition ◽ Vision System ◽ Classification Systems ◽ Support Vector ◽ Digital Architecture ◽ Current State ◽ Field Programmable ◽ Hardware Description ◽ High Level ◽ Core Description

We describe in this work a Core Generator for Pattern Recognition tasks. This tool is able to generate, according to user requirements, the hardware description of a digital architecture, which implements a Support Vector Machine, one of the current state-of-the-art algorithms for Pattern Recognition. The output of the Core Generator consists of a high-level language hardware core description, suitable to be mapped on a reconfigurable device, like a Field Programmable Gate Array (FPGA). As an example of the use of our tool, we compare different solutions, by targeting several reconfigurable devices, and implement the recognition part of a machine vision system for automotive applications.

Comparison of Different Design Alternatives for Hardware-in-the-Loop of Power Converters

Electronics ◽ 10.3390/electronics10080926 ◽ 2021 ◽ Vol 10 (8) ◽ pp. 926

Author(s): Elyas Zamiri ◽ Alberto Sanchez ◽ Marina Yushkova ◽ Maria Sofia Martínez-García ◽ Angel de Castro

Keyword(s): Ad Hoc ◽ Power Converters ◽ General Purpose ◽ Hardware In The Loop ◽ Gate Arrays ◽ Design Alternatives ◽ Field Programmable ◽ Programmable Gate Arrays ◽ Hardware Description ◽ High Level

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.

A Novel Low-Area Point Multiplication Architecture for Elliptic-Curve Cryptography

Electronics ◽ 10.3390/electronics10212698 ◽ 2021 ◽ Vol 10 (21) ◽ pp. 2698

Author(s): Muhammad Rashid ◽ Mohammad Mazyad Hazzazi ◽ Sikandar Zulqarnain Khan ◽ Adel R. Alharbi ◽ Asher Sajid ◽ ...

Keyword(s): Elliptic Curve ◽ Elliptic Curve Cryptography ◽ State Of The Art ◽ Critical Path ◽ Clock Frequency ◽ Description Language ◽ Point Multiplication ◽ Low Area ◽ Field Programmable ◽ Hardware Description

This paper presents a Point Multiplication (PM) architecture of Elliptic-Curve Cryptography (ECC) over GF(2^163) with a focus on the optimization of hardware resources and latency at the same time. The hardware resources are reduced with the use of a bit-serial (traditional schoolbook) multiplication method. Similarly, the latency is optimized with the reduction in a critical path using pipeline registers. To cope with the pipelining, we propose to reschedule point addition and double instructions, required for the computation of a PM operation in ECC. Subsequently, the proposed architecture over GF(2^163) is modeled in Verilog Hardware Description Language (HDL) using Vivado Design Suite. To provide a fair performance evaluation, we synthesize our design on various FPGA (field-programmable gate array) devices. These FPGA devices are Virtex-4, Virtex-5, Virtex-6, Virtex-7, Spartan-7, Artix-7, and Kintex-7. The lowest area (433 FPGA slices) is achieved on Spartan-7. The highest speed is realized on Virtex-7, where our design achieves 391 MHz clock frequency and requires 416 μs for one PM computation (latency). For power, the lowest values are achieved on the Artix-7 (56 μW) and Kintex-7 (61 μW) devices. A ratio of throughput over area value of 4.89 is reached for Virtex-7. Our design outperforms most recent state-of-the-art solutions (in terms of area) with an overhead of latency.
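
The bit-serial multiplication mentioned above can be sketched in Python as a shift-and-add loop over a binary field: one bit of the second operand is consumed per iteration, which is what keeps the hardware small at the cost of m clock cycles per product. The function name `gf2m_mul` is hypothetical, and the reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (the one used by the NIST 163-bit binary curves) is an assumption, since the abstract does not state which polynomial the design uses.

```python
# Assumed reduction polynomial for GF(2^163): x^163 + x^7 + x^6 + x^3 + 1.
POLY_163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf2m_mul(a, b, m=163, poly=POLY_163):
    """Bit-serial (MSB-first) multiplication in GF(2^m).

    a and b are field elements encoded as integers below 2**m;
    poly is the irreducible polynomial including its x^m term.
    """
    if a >> m or b >> m:
        raise ValueError("operands must be below 2**m")
    result = 0
    for i in reversed(range(m)):       # one multiplier bit per "clock cycle"
        result <<= 1                   # multiply the partial result by x
        if result >> m:                # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:               # add (XOR) a when the current bit is 1
            result ^= a
    return result
```

A quick sanity check with the small AES field shows the routine at work: `gf2m_mul(0x57, 0x83, m=8, poly=0x11B)` returns `0xC1`, the worked example from the AES specification. A full point multiplication would call such a multiplier many times inside point addition and doubling, which this sketch does not attempt.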

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit

Electronics ◽ 10.3390/electronics10222859 ◽ 2021 ◽ Vol 10 (22) ◽ pp. 2859

Author(s): Mannhee Cho ◽ Youngmin Kim

Keyword(s): Fixed Point ◽ High Performance ◽ Rapid Development ◽ Digital Signal ◽ Data Type ◽ Data Types ◽ Precision Data ◽ Field Programmable ◽ Point Data ◽ High Level

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered to be suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which classifies handwritten digits from the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. It applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% lower network latency than a floating point design, and its resource utilization is optimized to use 78% fewer DSP blocks than general fixed-point designs.
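
To show what a fixed-point data type means for the multiply-accumulate step at the heart of a CNN layer, here is a small Python sketch. It performs an exact fixed-point MAC and does not reproduce the paper's approximate units; the helper names (`to_fixed`, `to_float`, `fixed_mac`) and the choice of 8 fractional bits in the usage note are assumptions for illustration.

```python
def to_fixed(x, frac_bits):
    """Quantise a real number to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a Python float."""
    return q / (1 << frac_bits)


def fixed_mac(weights, inputs, frac_bits):
    """Multiply-accumulate over fixed-point operands.

    Each product carries 2 * frac_bits fractional bits, so the accumulator
    is shifted right once at the end to return to frac_bits.
    """
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc >> frac_bits
```

With 8 fractional bits, the weights 0.5 and 0.25 become 128 and 64, the inputs 2.0 and 4.0 become 512 and 1024, and `fixed_mac` returns 512, which `to_float` maps back to 2.0. Narrow integer operands like these are what allow the multipliers to be built from logic resources rather than DSP blocks.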

Implementation of FFT on General-Purpose Architectures for FPGA

International Journal of Embedded and Real-Time Communication Systems ◽ 10.4018/jertcs.2010070102 ◽ 2010 ◽ Vol 1 (3) ◽ pp. 24-43

Author(s): Fabio Garzia ◽ Roberto Airoldi ◽ Jari Nurmi

Keyword(s): General Purpose ◽ Reference Architecture ◽ Processor Core ◽ General Purpose Processor ◽ Programmable Architecture ◽ Field Programmable ◽ Speed Up ◽ Hardware Description ◽ On Chip ◽ High Level

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is completely based on the C language. The authors’ approach proves that implementing a programmable architecture on an FPGA and then programming it using a high-level software language is a viable alternative to designing a dedicated hardware block with a hardware description language (HDL) and mapping it onto an FPGA.
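
For readers unfamiliar with the benchmark kernel, the following is a minimal recursive radix-2 decimation-in-time FFT in Python. It only illustrates the algorithm whose butterflies such platforms parallelise; the abstract does not say which FFT variant or size the authors mapped, so the radix-2 choice and the function name `fft` are assumptions.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])                # FFT of even-indexed samples
    odd = fft(x[1::2])                 # FFT of odd-indexed samples
    out = [0] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The two recursive calls are independent of each other, and so are the butterflies within one stage, which is the parallelism that a reconfigurable array or a multi-processor system can exploit.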
