VLSI Design and Comparative Analysis of Several Types of Fixed and Simple Precision Floating Point Multipliers

Abimael Jiménez Pérez; Marco Antonio Gurrola Navarro; Víctor Manuel Valenzuela De la Cruz; José Antonio Muñoz Góme; Omar Aguilar Loreto

doi:10.20983/culcyt.2021.1.2.4

VLSI Design and Comparative Analysis of Several Types of Fixed and Simple Precision Floating Point Multipliers

Cultura Científica y Tecnológica ◽

10.20983/culcyt.2021.1.2.4 ◽

2021 ◽

Vol 18 (1) ◽

pp. 1-9

Author(s):

Abimael Jiménez Pérez ◽

Marco Antonio Gurrola Navarro ◽

Víctor Manuel Valenzuela De la Cruz ◽

José Antonio Muñoz Góme ◽

Omar Aguilar Loreto

Keyword(s):

Fixed Point ◽

Integrated Circuits ◽

Arithmetic Operation ◽

Real Life ◽

VLSI Design ◽

Digital Signal ◽

CMOS Technology ◽

Floating Point ◽

Wallace Tree ◽

Hardware Description

Multiplication is an arithmetic operation that has a meaningful impact on the performance of several real-life applications, such as digital signal and image processing. This work analyzes and compares several types of fixed-point multipliers, namely Wallace tree, array, and Booth-2, in truncated and non-truncated versions. The fixed-point multipliers were then used to design floating-point multipliers in a hardware description language, and the resulting area and speed values were analyzed. The Booth-2 fixed-point multiplier with truncation and RCA adders presented both the longest delay and the largest area consumption, while the Wallace tree floating-point multiplier required the smallest area and the shortest delay. The 8-bit versions of the fixed-point multipliers were physically synthesized, using the Alliance tools, to obtain the layout of the circuits. The integrated circuits were successfully fabricated in a 0.5-μm CMOS technology.
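The abstract states that floating-point multipliers were built on top of fixed-point multipliers but does not show how. The following is a minimal behavioral sketch in Python, not the authors' HDL design: it assumes IEEE 754 single-precision operands, normal inputs and a normal result, and truncation instead of proper rounding. The function names (`f32_bits`, `bits_f32`, `fp32_mul`) are hypothetical.

```python
import struct


def f32_bits(x):
    """Return the 32-bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Return the float encoded by a 32-bit IEEE 754 single pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def fp32_mul(a_bits, b_bits):
    """Multiply two single-precision bit patterns.

    Assumptions of this sketch: both inputs and the result are normal
    numbers (no zeros, subnormals, infinities or NaNs), and the
    mantissa is truncated rather than rounded.
    """
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    if exp_a in (0, 255) or exp_b in (0, 255):
        raise ValueError("only normal inputs are handled in this sketch")

    # Restore the hidden leading 1 to get 24-bit integer mantissas.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    # This 24x24 -> 48-bit product is where a fixed-point multiplier
    # (array, Booth, Wallace tree, ...) would be instantiated.
    prod = man_a * man_b
    exp = exp_a + exp_b - 127  # remove one copy of the bias

    # The product lies in [1, 4); renormalize to [1, 2) if needed.
    if prod & (1 << 47):
        prod >>= 1
        exp += 1
    if not 0 < exp < 255:
        raise ValueError("result is not a normal number")

    mantissa = (prod >> 23) & 0x7FFFFF  # drop hidden bit, truncate
    return (sign << 31) | (exp << 23) | mantissa
```

For example, `bits_f32(fp32_mul(f32_bits(1.5), f32_bits(2.5)))` evaluates to `3.75`, since that product is exactly representable and truncation loses nothing.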


Design of delay efficient Booth multiplier using pipelining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.16.11423 ◽

2018 ◽

Vol 7 (2.16) ◽

pp. 94

Author(s):

Abhishek Choubey ◽

SPV Subbarao ◽

Shruti B. Choubey

Keyword(s):

Critical Path ◽

Arithmetic Operation ◽

VLSI Design ◽

Digital Signal ◽

Path Delay ◽

Large Area ◽

Booth Multiplier ◽

Critical Path Delay ◽

Long Latency ◽

Comparison Results

Multiplication is one of the most essential arithmetic operations, used in numerous applications in digital signal processing and communications. These applications need transformations, convolutions and dot products that involve an enormous number of multiplications of an operand by a constant. Typical examples include wavelets and digital filters such as FIR or IIR. However, multiplier structures have a relatively large area-delay product, long latency and significantly higher power consumption than other arithmetic structures. Therefore, low-power multiplier design has always been a significant part of DSP structures for VLSI design. The Booth multiplier is promising as the most efficient among multipliers, as it reduces complexity considerably more than the others. In this paper, we propose a Booth multiplier using seamless pipelining. Theoretical comparison results show that the proposed Booth multiplier has a shorter critical path delay than the traditional Booth multiplier. ASIC simulation results show that the proposed radix-16 Booth multiplier has 13% less critical path delay for word width n=16 and 17% less critical path delay for word width n=32 than the best existing radix-16 Booth multiplier.
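The abstract does not spell out the recoding itself. As a rough illustration of how Booth recoding reduces the number of partial products, here is a minimal Python sketch of the simpler radix-4 variant, not the pipelined radix-16 design the paper proposes. The function names are hypothetical, and the multiplier is assumed to be an n-bit two's-complement value with n even.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier into n/2 digits.

    Each digit is in {-2, -1, 0, 1, 2} and has weight 4**i, so only
    n/2 partial products are needed instead of n.
    """
    if n % 2:
        raise ValueError("n must be even in this sketch")
    y &= (1 << n) - 1          # view y as an n-bit pattern
    digits = []
    prev = 0                   # implicit bit to the right of bit 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, n):
    """Multiply x by the n-bit signed value y via radix-4 Booth digits."""
    return sum(d * x * 4 ** i
               for i, d in enumerate(booth_radix4_digits(y, n)))
```

With `n = 4`, the multiplier 7 is recoded as the digits `[-1, 2]`, i.e. 7 = -1 + 2·4, so two partial products replace four.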


Design and Implementation of Compressor based 32-bit Multipliers for MAC Architecture

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8517.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 2007-2014

Keyword(s):

Circuit Design ◽

High Performance ◽

Arithmetic Operation ◽

Digital Signal ◽

CMOS Technology ◽

Digital Circuit ◽

Digital Circuit Design ◽

Unit Design ◽

High Performance Systems ◽

Signal Processors

Arithmetic circuits such as adders and multipliers play a major role in digital circuit design. Multiplication is an important fundamental arithmetic operation in high-performance systems such as microprocessors and digital signal processors. Implementing multipliers with compressor circuits instead of conventional adders reduces the number of levels of addition, which in turn reduces the latency of the multiplier. The multiplier module is most likely the essential part of MAC (Multiplier-Accumulator) unit design, and compressor-based multipliers in a MAC architecture yield high performance. FPGA and ASIC implementations of 4:2 compressor-based 32-bit Wallace and Dadda multipliers can be done using Xilinx Vivado and Cadence CMOS technology tools. The results are compared with other multiplier designs with respect to area, latency and power dissipation.
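To make the role of the 4:2 compressor concrete, the sketch below models one common textbook construction in Python: two chained full adders that take four input bits plus a carry-in and produce a sum bit and two carry bits of double weight. This is an assumption for illustration; the paper's actual compressor circuit may be built differently. The function names are hypothetical.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two full adders.

    Invariant: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it does not ripple through a row
    of compressors when it feeds the next column's cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout
```

Because each compressor absorbs four partial-product bits per column at once, a tree of them needs fewer levels than a tree of 3:2 full adders, which is the latency reduction the abstract refers to.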


A high-speed fixed width floating-point multiplier using residue logarithmic number system algorithm

International Journal of Electrical Engineering Education ◽

10.1177/0020720918813836 ◽

2018 ◽

Vol 57 (4) ◽

pp. 361-375 ◽

Cited By ~ 2

Author(s):

J Jency Rubia ◽

GA Sathish Kumar

Keyword(s):

Integrated Circuits ◽

High Speed ◽

Large Scale ◽

Digital Signal ◽

Number System ◽

Residue Number System ◽

Floating Point ◽

Hardware Complexity ◽

Logarithmic Number System ◽

Logarithmic Number

The Residue Logarithmic Number System (RLNS) in digital arithmetic allows multiplication and division to be performed considerably faster and more precisely than the widely used floating-point number formats. RLNS applications in the field of large-scale integrated circuits, digital signal processing, multimedia, scientific computing and artificial neural networks have the fixed-width property, meaning that the input and output bit widths are equal; hence, these applications need a fixed-width multiplier. In this paper, a fixed-width floating-point multiplier based on RLNS is proposed to increase the processing speed. The truncation errors are reduced by using a Taylor series. RLNS is the combination of the residue number system and the logarithmic number system, and uses a table lookup including all bits for expansion. The proposed scheme is effective with regard to speed, area and power utilization in contrast to conventional floating-point arithmetic designs. Synthesis results were obtained using the Xilinx ISE 14.7 simulator. The area is 16,668 µm², power is 37 mW, delay is 6.160 ns, and the truncation error is reduced by 89% compared with the direct-truncated multiplier. The proposed fixed-width RLNS multiplier operates with lower compensation error and minimal hardware complexity, particularly as the number of multiplier input bits increases.
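The reason a logarithmic number system speeds up multiplication is that a product becomes a sum of logarithms. The Python sketch below illustrates only that logarithmic half of the idea, under stated assumptions: positive operands, a hypothetical fixed-point log format with `FRAC_BITS` fractional bits, and no residue arithmetic, table lookup or Taylor-series error compensation as described in the paper.

```python
import math

FRAC_BITS = 12  # assumed number of fractional bits in the stored log


def to_lns(x):
    """Encode a positive real as a fixed-point base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return round(math.log2(x) * (1 << FRAC_BITS))


def from_lns(code):
    """Decode a fixed-point base-2 logarithm back to a real value."""
    return 2.0 ** (code / (1 << FRAC_BITS))


def lns_mul(a_code, b_code):
    """Multiplication in the log domain is a single integer addition."""
    return a_code + b_code


def lns_div(a_code, b_code):
    """Division in the log domain is a single integer subtraction."""
    return a_code - b_code
```

The price of this shortcut is quantization of the logarithm: `from_lns(lns_mul(to_lns(6.0), to_lns(7.0)))` is close to, but in general not exactly, 42, which is why error-compensation schemes matter in such designs.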


A floating-point to integer C converter with shift reduction for fixed-point digital signal processors

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258) ◽

10.1109/icassp.1999.758363 ◽

1999 ◽

Cited By ~ 2

Author(s):

Ki-Il Kum ◽

Jiyang Kang ◽

Wonyong Sung

Keyword(s):

Fixed Point ◽

Digital Signal ◽

Digital Signal Processors ◽

Floating Point ◽

Signal Processors


Early Output Quasi-Delay-Insensitive Array Multipliers

Electronics ◽

10.3390/electronics8040444 ◽

2019 ◽

Vol 8 (4) ◽

pp. 444 ◽

Cited By ~ 1

Author(s):

Balasubramanian ◽

Maskell ◽

Naayagi ◽

Mastorakis

Keyword(s):

Cycle Time ◽

Arithmetic Operation ◽

Data Communication ◽

Digital Signal ◽

Complementary Metal Oxide Semiconductor ◽

CMOS Technology ◽

Data Representation ◽

Oxide Semiconductor ◽

Array Multiplier ◽

Array Multipliers

Multiplication is a widely used arithmetic operation in microprocessing and digital signal processing applications, and multiplication is realized using a multiplier. This article presents the quasi-delay-insensitive (QDI) early output versions of recently reported indicating asynchronous array multipliers. Delay-insensitive dual-rail encoding is used for data representation and processing, and 4-phase return-to-zero (RTZ) and return-to-one (RTO) handshake protocols are used for data communication. Many QDI array multipliers were realized using a 32/28 nm complementary metal oxide semiconductor (CMOS) technology. Compared to the optimum indicating array multiplier, the proposed optimum early output array multiplier achieves a 6.2% reduction in cycle time and a 7.4% reduction in power-cycle time product (PCTP) with respect to RTZ handshaking, and a 7.6% reduction in cycle time and an 8.8% reduction in PCTP with respect to RTO handshaking without an increase in the area. The simulation results also convey that the RTO handshaking is preferable to the RTZ handshaking for the optimum implementation of QDI array multipliers.


An Area Efficient Wallace Tree Multiplier using Modified Full Adder

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f8814.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 3383-3386

Keyword(s):

Integrated Circuits ◽

High Speed ◽

Digital Signal ◽

Full Adder ◽

NAND Gate ◽

Silicon Area ◽

Wallace Tree ◽

Multiplication Process ◽

Wallace Tree Multiplier ◽

Area Efficient

Multipliers play a significant role in digital signal processing applications and application-specific integrated circuits. Wallace tree multipliers provide a high-speed multiplication process with an area-efficient strategy and are realized in hardware using full adders and half adders. Optimizing the adders can further improve the performance of the multiplier. A Wallace tree multiplier with a modified full adder built from NAND gates is proposed to achieve reduced silicon area, high speed and low power consumption. The conventional full adder implemented with XOR, AND and OR gates is replaced by the modified full adder realized using NAND gates. The proposed Wallace tree multiplier includes 544 transistors, while the conventional Wallace tree multiplier has 584 transistors for 4-bit multiplication.
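The abstract does not give the gate-level netlist of the modified full adder. As an illustration of what a NAND-only full adder can look like, the Python sketch below models the classic nine-gate textbook construction; it is an assumption for illustration and may differ from the paper's circuit. The function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on single bits."""
    return 1 - (a & b)


def nand_full_adder(a, b, cin):
    """Full adder built from nine two-input NAND gates.

    Returns (sum, carry_out).
    """
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    x = nand(n2, n3)          # x == a XOR b
    n4 = nand(x, cin)
    n5 = nand(x, n4)
    n6 = nand(cin, n4)
    s = nand(n5, n6)          # s == x XOR cin
    cout = nand(n1, n4)       # cout == (a AND b) OR (x AND cin)
    return s, cout
```

The intermediate signals `n1` and `n4` are shared between the sum and carry paths, which is where a NAND-only realization saves gates compared with separate XOR, AND and OR blocks.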


Area- and energy-efficient CORDIC accelerators in deep sub-micron CMOS technologies

Advances in Radio Science ◽

10.5194/ars-10-207-2012 ◽

2012 ◽

Vol 10 ◽

pp. 207-213 ◽

Cited By ~ 3

Author(s):

U. Vishnoi ◽

T. G. Noll

Keyword(s):

Fixed Point ◽

Circuit Simulation ◽

CMOS Technology ◽

Floating Point ◽

Clock Frequency ◽

Worst Case ◽

Silicon Area ◽

Static Power ◽

Baseband Processing ◽

Digital Baseband

Abstract. The COordinate Rotate DIgital Computer (CORDIC) algorithm is a well-known, versatile approach that is widely applied in today's SoCs, especially but not exclusively for digital communications. Dedicated CORDIC blocks can be implemented in deep sub-micron CMOS technologies at very low area and energy costs and are attractive as hardware accelerators for Application Specific Instruction Processors (ASIPs), thereby overcoming the well-known energy vs. flexibility conflict. Optimizing Global Navigation Satellite System (GNSS) receivers to reduce the hardware complexity is an important research topic at present. In such receivers CORDIC accelerators can be used for digital baseband processing (fixed-point) and in Position-Velocity-Time estimation (floating-point). A micro architecture well suited to such applications is presented. This architecture is parameterized according to the wordlengths as well as the number of iterations and can be easily extended for floating point data format. Moreover, area can be traded for throughput by partially or even fully unrolling the iterations, whereby the degree of pipelining is organized with one CORDIC iteration per cycle. From the architectural description, the macro layout can be generated fully automatically using an in-house datapath generator tool. Since the adders and shifters play an important role in optimizing the CORDIC block, they must be carefully optimized for high area and energy efficiency in the underlying technology. For this purpose, carry-select adders and logarithmic shifters have been chosen. Device dimensioning was automatically optimized with respect to dynamic and static power, area and performance using the in-house tool. The fully sequential CORDIC block for fixed-point digital baseband processing features a wordlength of 16 bits, requires 5232 transistors, is implemented in a 40-nm CMOS technology and occupies a silicon area of only 1560 μm². Maximum clock frequency from circuit simulation of the extracted netlist is 768 MHz under typical, and 463 MHz under worst-case technology and application corner conditions. Simulated dynamic power dissipation is 0.24 µW MHz⁻¹ at 0.9 V; static power is 38 µW in the slow corner, 65 µW in the typical corner and 518 µW in the fast corner. The latter can be reduced by 43% in a 40-nm CMOS technology using 0.5 V reverse-backbias. These features are compared with the results from different design styles as well as with an implementation in 28-nm CMOS technology. It is interesting that in the latter case area scales as expected, but worst-case performance and energy do not scale well anymore.
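The emphasis on adders and shifters follows from the structure of the CORDIC iteration, which uses only shifts, additions and a small table of angles. The Python sketch below shows a fully sequential, rotation-mode fixed-point CORDIC computing sine and cosine. It is a behavioral illustration, not the authors' micro architecture: the Q2.14 scaling (`FRAC_BITS`), the iteration count and the function name are assumptions chosen to resemble a 16-bit wordlength.

```python
import math

FRAC_BITS = 14                  # assumed Q2.14 format in a 16-bit word
ITERATIONS = 14                 # assumed number of CORDIC iterations
ONE = 1 << FRAC_BITS

# Precomputed rotation angles atan(2**-i) in fixed point.
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The iterations scale the vector by a constant gain; start from 1/gain.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_INV = round(ONE / GAIN)


def cordic_sin_cos(theta):
    """Return (sin, cos) of theta for theta in [-pi/2, pi/2].

    One loop pass corresponds to one cycle of a fully sequential
    block: two shifts, three add/subtract operations, one table read.
    """
    x, y, z = K_INV, 0, round(theta * ONE)
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return y / ONE, x / ONE
```

Unrolling this loop into several hardware stages is the area-for-throughput trade the abstract mentions; the arithmetic per iteration stays the same.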


Design of Low Power and High Speed 4X4 Multiplier using Modified Column Bypassing Scheme for DSP Applications

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1118.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 643-647

Keyword(s):

Low Power ◽

Digital Signal Processor ◽

High Speed ◽

VLSI Design ◽

Digital Signal ◽

CMOS Technology ◽

Optimization Techniques ◽

Operating Voltage ◽

Booth Encoder ◽

DSP Applications

In this paper, a low-power and high-speed 4X4 multiplier is designed using CMOS technology. The important factors in VLSI design are power, area, speed and design time. Nowadays, power and speed have become crucial factors in Digital Signal Processor (DSP) applications, and different optimization techniques are available in the digital electronics world. The proposed approach is a low-power and high-speed multiplier design based on a modified column bypassing technique, which is mainly used to reduce switching power activity. While this technique offers great dynamic power savings, due to their interconnection. In this work, a low-power and high-speed multiplier with a hybridization scheme is presented. This scheme, a combination of the Booth encoder algorithm and the column bypass technique, is called the modified column bypassing scheme. The simulations are performed in 0.18-µm CMOS technology using Cadence Virtuoso tools with an operating voltage of ±1.8 V.


An Efficient VLSI Design of 32X32 bit Multiplier using Wallace Tree Algorithm in Vivado HLS and Xilinx ISE Software using VHDL

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.g5299.059720 ◽

2020 ◽

Vol 9 (7) ◽

pp. 490-495

Keyword(s):

Power Dissipation ◽

Circuit Complexity ◽

VLSI Design ◽

Digital Signal ◽

Major Drawback ◽

Tree Algorithm ◽

Wallace Tree ◽

Look Ahead ◽

Ripple Carry Adder ◽

Carry Select Adder

The multiplier is the most basic component present in any digital system. Multipliers are mainly used in digital signal and image processing applications. In applications like image detection, recent sophisticated algorithms such as CNNs are used, which contain MAC units in their design. The multiplier used in a MAC unit requires a large amount of memory, has high latency and consumes more power. There are many algorithms, such as combinational, sequential and array multiplication algorithms, that help in designing a multiplier. The major drawback in all of these designs is circuit complexity; latency and power dissipation are also problems. Considering these drawbacks, this paper proposes the use of the Wallace tree algorithm, which consumes less power and has low latency. There are also many ways to perform the final-stage addition of the partial products, such as the carry look-ahead adder, the carry select adder, etc. This paper uses both a carry select adder and a ripple carry adder for the final addition of partial products. All earlier partial products are added using half adders and full adders. The multiplier is designed using VHDL on the Xilinx ISE and Vivado platforms.
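The flow described above (partial products reduced with full and half adders, then a final two-operand addition) can be modeled behaviorally. The Python sketch below is an illustration under stated assumptions, not the paper's VHDL: it handles unsigned n-bit operands, reduces each column greedily, and uses a ripple carry adder for the final stage. All function names are hypothetical.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def ripple_carry_add(x, y, width):
    """Add two unsigned values bit by bit with a rippling carry."""
    carry, result = 0, 0
    for k in range(width):
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        result |= s << k
    return result | (carry << width)


def wallace_multiply(a, b, n):
    """Multiply two unsigned n-bit integers with a Wallace-style tree."""
    # Column k holds every partial-product bit of weight 2**k.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Reduce until no column holds more than two bits.
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, bits in enumerate(cols):
            while len(bits) >= 3:
                s, c = full_adder(bits.pop(), bits.pop(), bits.pop())
                nxt[k].append(s)
                nxt[k + 1].append(c)
            if len(bits) == 2:
                s, c = half_adder(bits.pop(), bits.pop())
                nxt[k].append(s)
                nxt[k + 1].append(c)
            nxt[k].extend(bits)       # a single leftover bit passes through
        cols = nxt

    # Two rows remain; a carry-propagate adder produces the product.
    row0 = sum(c[0] << k for k, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << k for k, c in enumerate(cols) if len(c) > 1)
    return ripple_carry_add(row0, row1, len(cols))
```

Swapping `ripple_carry_add` for a carry select or carry look-ahead model changes only the last line, which mirrors the paper's point that the final adder is an independent design choice.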


Performance of Improved speed pipelined floating point multiplier Architecture

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8055.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 6018-6021

Keyword(s):

Fixed Point ◽

Digital Signal ◽

Floating Point ◽

Chip Area ◽

Point Representation ◽

Speed Up ◽

Hardware Component ◽

DSP Systems ◽

Analytical Approaches ◽

Range Of Values

The multiplier is a hardware component that usually occupies a significant chip area, which must be reduced in the many functions where multipliers form an essential building block, including digital signal processing (DSP) systems and analytical approaches. The benefit of floating-point representation over a fixed-point (and integer) representation is that a wider range of values can be represented. Since floating-point numbers are stored in sign-magnitude form, the multiplier also has to handle unsigned integer numbers and normalization. Combining the modified Booth algorithm with a carry-save adder is one way to speed up the multiplier. The modified Booth algorithm reduces the number of partial products to be generated and is regarded as the fastest multiplication algorithm.
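Reading the "save adder" in this abstract as a carry-save adder (an interpretation, since the abstract does not define it), the speed-up comes from deferring carry propagation: three operands are compressed into a sum word and a carry word using only bitwise logic. A minimal Python sketch for non-negative integers, with hypothetical function names:

```python
def carry_save_add(a, b, c):
    """Compress three operands into (sum_word, carry_word).

    Invariant: a + b + c == sum_word + carry_word. No carry ripples
    between bit positions; every bit is computed independently.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def add_many(values):
    """Add a list of non-negative integers using carry-save stages.

    Only the final two-operand addition needs carry propagation.
    """
    pending = list(values)
    while len(pending) > 2:
        s, c = carry_save_add(pending.pop(), pending.pop(), pending.pop())
        pending += [s, c]
    return sum(pending)
```

In a multiplier, the operands fed to such stages would be the partial products, whose count Booth recoding has already reduced.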
