Approximate Array Multipliers

Padmanabhan Balasubramanian; Raunaq Nayar; Douglas L. Maskell

doi:10.3390/electronics10050630

Electronics ◽

10.3390/electronics10050630 ◽

2021 ◽

Vol 10 (5) ◽

pp. 630

Author(s):

Padmanabhan Balasubramanian ◽

Raunaq Nayar ◽

Douglas L. Maskell

Keyword(s):

Critical Path ◽

Complementary Metal Oxide Semiconductor ◽

Oxide Semiconductor ◽

Total Power ◽

Path Delay ◽

Input And Output ◽

Critical Path Delay ◽

Standard Design ◽

Array Multiplier ◽

Array Multipliers

This article describes the design of approximate array multipliers by making vertical or horizontal cuts in an accurate array multiplier followed by different input and output assignments within the multiplier. We consider a digital image denoising application and show how different combinations of input and output assignments in an approximate array multiplier affect the quality of the denoised images. We consider the accurate array multiplier and several approximate array multipliers for synthesis. The multipliers were described in Verilog hardware description language and synthesized by Synopsys Design Compiler using a 32/28-nm complementary metal-oxide-semiconductor technology. The results show that compared to the accurate array multiplier, one of the proposed approximate array multipliers viz. PAAM01-V7 achieves a 28% reduction in critical path delay, 75.8% reduction in power, and 64.6% reduction in area while enabling the production of a denoised image that is comparable in quality to the image denoised using the accurate array multiplier. The standard design metrics such as critical path delay, total power dissipation, and area of the accurate and approximate multipliers are given, the error parameters of the approximate array multipliers are provided, and the original image, the noisy image, and the denoised images are also depicted for comparison.
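
As a rough illustration of what a vertical cut does, the Python sketch below models an unsigned array multiplier as a grid of partial-product bits and skips every cell that falls in the `cut` least significant columns. The function names, the constant `fill` used for the removed output bits, and the default 8-bit width are assumptions made for illustration; this is not the PAAM01-V7 configuration, nor the specific input and output assignments evaluated in the article.

```python
def exact_multiply(a: int, b: int, n: int = 8) -> int:
    """Sum all n*n partial-product bits, as an accurate array multiplier does."""
    total = 0
    for i in range(n):          # bit index of b (row of the array)
        for j in range(n):      # bit index of a (cell within the row)
            pp = ((a >> j) & 1) & ((b >> i) & 1)
            total += pp << (i + j)
    return total


def vertical_cut_multiply(a: int, b: int, n: int = 8, cut: int = 4,
                          fill: int = 0) -> int:
    """Hypothetical vertical cut: ignore partial products in columns < cut.

    The `cut` low-order product bits are never computed; they are assigned
    the constant `fill` (an assumed output assignment, 0 <= fill < 2**cut).
    """
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j < cut:     # cell lies in the removed part of the array
                continue
            pp = ((a >> j) & 1) & ((b >> i) & 1)
            total += pp << (i + j)
    return total | fill         # low `cut` bits of total are always zero


def mean_absolute_error(n: int = 8, cut: int = 4, fill: int = 0) -> float:
    """Average |exact - approximate| over all input pairs (exhaustive)."""
    err = 0
    for a in range(1 << n):
        for b in range(1 << n):
            err += abs(a * b - vertical_cut_multiply(a, b, n, cut, fill))
    return err / (1 << (2 * n))
```

With `fill = 0` the approximate product never exceeds the exact one, and the shortfall is bounded by the total weight of the removed cells. A nonzero `fill` trades that one-sided error for a smaller average error, which is the kind of trade-off an output assignment controls.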

Early Output Quasi-Delay-Insensitive Array Multipliers

Electronics ◽

10.3390/electronics8040444 ◽

2019 ◽

Vol 8 (4) ◽

pp. 444 ◽

Cited By ~ 1

Author(s):

Balasubramanian ◽

Maskell ◽

Naayagi ◽

Mastorakis

Keyword(s):

Cycle Time ◽

Arithmetic Operation ◽

Data Communication ◽

Digital Signal ◽

Complementary Metal Oxide Semiconductor ◽

CMOS Technology ◽

Data Representation ◽

Oxide Semiconductor ◽

Array Multiplier ◽

Array Multipliers

Multiplication is a widely used arithmetic operation in microprocessing and digital signal processing applications, and multiplication is realized using a multiplier. This article presents the quasi-delay-insensitive (QDI) early output versions of recently reported indicating asynchronous array multipliers. Delay-insensitive dual-rail encoding is used for data representation and processing, and 4-phase return-to-zero (RTZ) and return-to-one (RTO) handshake protocols are used for data communication. Many QDI array multipliers were realized using a 32/28 nm complementary metal oxide semiconductor (CMOS) technology. Compared to the optimum indicating array multiplier, the proposed optimum early output array multiplier achieves a 6.2% reduction in cycle time and a 7.4% reduction in power-cycle time product (PCTP) with respect to RTZ handshaking, and a 7.6% reduction in cycle time and an 8.8% reduction in PCTP with respect to RTO handshaking without an increase in the area. The simulation results also convey that the RTO handshaking is preferable to the RTZ handshaking for the optimum implementation of QDI array multipliers.
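
The dual-rail data representation and the 4-phase return-to-zero cycle mentioned above can be sketched in Python as follows. This is a behavioral toy under stated assumptions: each bit is a wire pair `(d1, d0)` with binary 1 as `(1, 0)`, binary 0 as `(0, 1)`, and the all-zeros pair as the spacer. The names `encode_word`, `decode_word`, and `rtz_trace` are hypothetical, the RTO protocol is not modeled, and nothing here reflects the gate-level multipliers of the article.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]          # one dual-rail bit as (d1, d0)
SPACER: Rail = (0, 0)           # return-to-zero spacer


def encode_bit(bit: int) -> Rail:
    """Binary 1 -> (1, 0), binary 0 -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def decode_bit(rail: Rail) -> Optional[int]:
    """Return the bit, or None while the wire pair holds the spacer."""
    if rail == SPACER:
        return None
    if rail == (1, 1):
        raise ValueError("(1, 1) is not a legal RTZ codeword")
    return rail[0]


def encode_word(value: int, width: int) -> List[Rail]:
    """Encode an unsigned integer, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def decode_word(word: List[Rail]) -> Optional[int]:
    """Return the integer once every bit is valid data, else None."""
    bits = [decode_bit(r) for r in word]
    if any(b is None for b in bits):
        return None
    return sum(b << i for i, b in enumerate(bits))


def rtz_trace(values: List[int], width: int) -> List[Tuple[List[Rail], int]]:
    """List (bus, ack) pairs for the four phases of each transfer."""
    trace = []
    spacer_word = [SPACER] * width
    for v in values:
        data = encode_word(v, width)
        trace.append((data, 0))         # 1: sender applies data
        trace.append((data, 1))         # 2: receiver acknowledges data
        trace.append((spacer_word, 1))  # 3: sender applies the spacer
        trace.append((spacer_word, 0))  # 4: receiver acknowledges the spacer
    return trace
```

Because a word decodes to a value only when every wire pair carries valid data, the receiver can detect the arrival of data without a clock, which is the property the dual-rail code is used for.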

Hardware Optimized and Error Reduced Approximate Adder

Electronics ◽

10.3390/electronics8111212 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1212 ◽

Cited By ~ 6

Author(s):

Padmanabhan Balasubramanian ◽

Douglas L. Maskell

Keyword(s):

Integrated Circuit ◽

Critical Path ◽

Oxide Semiconductor ◽

Average Error ◽

CMOS Process ◽

Path Delay ◽

Clock Period ◽

Design Metrics ◽

Critical Path Delay ◽

Reduction In Area

This paper presents a new hardware optimized and error reduced approximate adder (HOERAA), which is suitable for field programmable gate array (FPGA)- and application specific integrated circuit (ASIC)-based implementations. In this work, we consider an FPGA-based implementation using Xilinx Vivado 2018.3, targeting an Artix-7 FPGA. The ASIC-based realizations are based on a 32/28 nm complementary metal oxide semiconductor (CMOS) process. Based on FPGA implementations, we note the following: (i) For 32-bit addition involving an 8-bit least significant inaccurate sub-adder, HOERAA requires 22% fewer look-up tables (LUTs) and 18.6% fewer registers while reducing the minimum clock period by 7.1% and reducing the power-delay product (PDP) by 14.7%, compared to the native accurate FPGA adder, and (ii) for 64-bit addition involving an 8-bit least significant inaccurate sub-adder, HOERAA requires 11% fewer LUTs and 9.3% fewer registers while reducing the minimum clock period by 8.3% and reducing the PDP by 9.3%, compared to the native accurate FPGA adder. Based on ASIC-style implementations, HOERAA is found to achieve the following reductions in design metrics compared to an optimum accurate carry-lookahead adder: (i) A 15.7% reduction in critical path delay, a 21.4% reduction in area, and a 35% reduction in PDP for 32-bit addition involving an 8-bit least significant inaccurate sub-adder, and (ii) a 15.3% reduction in critical path delay, a 10.7% reduction in area, and a 20% reduction in PDP for 64-bit addition involving an 8-bit least significant inaccurate sub-adder. Moreover, comparisons with other approximate adders show that HOERAA has a significantly reduced average error, mean average error, and root mean square error, while reporting near optimum design metrics.
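
To make the idea of an adder with an inaccurate least significant sub-adder concrete, the Python sketch below uses a lower-part OR adder (LOA) style scheme as a stand-in: the low `k` bits are formed with a bitwise OR, and the AND of the two top low-part bits is used as the carry into the exact upper part. This is explicitly not HOERAA, whose logic is not given in the abstract. The function names and the random-sampling error estimate are assumptions for illustration.

```python
import math
import random
from typing import Dict


def loa_add(a: int, b: int, k: int = 8) -> int:
    """Approximate a + b with an OR-based low part of k bits (LOA style)."""
    if k == 0:
        return a + b
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                        # inaccurate sub-adder
    carry = ((a >> (k - 1)) & (b >> (k - 1))) & 1   # estimated carry-in
    high = (a >> k) + (b >> k) + carry              # accurate sub-adder
    return (high << k) | low


def error_metrics(n: int = 32, k: int = 8, samples: int = 10000,
                  seed: int = 0) -> Dict[str, float]:
    """Estimate mean error, mean absolute error, and RMSE by sampling."""
    rng = random.Random(seed)
    errors = []
    for _ in range(samples):
        a = rng.getrandbits(n)
        b = rng.getrandbits(n)
        errors.append(loa_add(a, b, k) - (a + b))
    return {
        "mean_error": sum(errors) / samples,
        "mean_absolute_error": sum(abs(e) for e in errors) / samples,
        "rmse": math.sqrt(sum(e * e for e in errors) / samples),
    }
```

For this stand-in scheme the error magnitude never exceeds 2^(k-1), so widening the operands from 32 to 64 bits leaves the absolute error unchanged while the hardware saved by the inaccurate part becomes a smaller fraction of the whole adder.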

Layout-Aware Critical Path Delay Test Under Maximum Power Supply Noise Effects

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/tcad.2011.2163159 ◽

2011 ◽

Vol 30 (12) ◽

pp. 1923-1934 ◽

Cited By ~ 18

Author(s):

Junxia Ma ◽

Mohammad Tehranipoor

Keyword(s):

Power Supply ◽

Critical Path ◽

Maximum Power ◽

Path Delay ◽

Power Supply Noise ◽

Delay Test ◽

Noise Effects ◽

Critical Path Delay ◽

Supply Noise ◽

Path Delay Test

Exploring Linear Structures of Critical Path Delay Faults to Reduce Test Efforts

2006 IEEE/ACM International Conference on Computer Aided Design ◽

10.1109/iccad.2006.320072 ◽

2006 ◽

Author(s):

Shun-yen Lu ◽

Pei-ying Hsieh ◽

Jing-jia Liou

Keyword(s):

Critical Path ◽

Delay Faults ◽

Path Delay ◽

Path Delay Faults ◽

Linear Structures ◽

Critical Path Delay

A design methodology for approximate multipliers in convolutional neural networks: A case of MNIST

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v10.i1.pp1-10 ◽

2021 ◽

Vol 10 (1) ◽

pp. 1

Author(s):

Kenta Shirane ◽

Takahiro Yamamoto ◽

Hiroyuki Tomiyama

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Design Methodology ◽

Critical Path ◽

High Accuracy ◽

Path Delay ◽

Trade Off ◽

Critical Path Delay

In this paper, we present a case study on approximate multipliers for an MNIST Convolutional Neural Network (CNN). We apply approximate multipliers with different bit-widths to the convolution layer in the MNIST CNN, evaluate the accuracy of MNIST classification, and analyze the trade-off between the approximate multiplier's area, its critical path delay, and the accuracy. Based on the results of the evaluation and analysis, we propose a design methodology for approximate multipliers. The approximate multipliers consist of a subset of the partial products, which are carefully selected according to the CNN input. With this methodology, we further reduce the area and the delay of the multipliers while keeping high accuracy of the MNIST classification.
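
One way to picture a multiplier built from a selected subset of partial products is the Python sketch below. `masked_multiply` sums only the partial-product rows listed in `keep_rows`, and `select_rows` is a hypothetical selection heuristic that ranks rows by how often the corresponding operand bit is set in sample inputs, weighted by the bit's significance. The abstract does not state the authors' selection rule, so both names and the heuristic are assumptions.

```python
from typing import Iterable, List


def masked_multiply(a: int, b: int, keep_rows: Iterable[int]) -> int:
    """Approximate a * b using only the partial-product rows in keep_rows.

    Row i is the partial product a * b[i] * 2**i; rows not listed are dropped.
    """
    total = 0
    for i in keep_rows:
        if (b >> i) & 1:
            total += a << i
    return total


def select_rows(samples: Iterable[int], n: int, m: int) -> List[int]:
    """Hypothetical heuristic: keep the m rows with the largest expected weight.

    The weight of row i is (number of samples with bit i set) * 2**i.
    """
    data = list(samples)
    weights = []
    for i in range(n):
        ones = sum((s >> i) & 1 for s in data)
        weights.append((ones * (1 << i), i))
    weights.sort(reverse=True)
    return sorted(i for _, i in weights[:m])
```

For example, profiling the second operand over representative inputs and then calling `masked_multiply(a, b, select_rows(profile, 8, 4))` would model an 8-bit multiplier that keeps four of its eight partial-product rows.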

Design of delay efficient Booth multiplier using pipelining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.16.11423 ◽

2018 ◽

Vol 7 (2.16) ◽

pp. 94

Author(s):

Abhishek Choubey ◽

SPV Subbarao ◽

Shruti B. Choubey

Keyword(s):

Critical Path ◽

Arithmetic Operation ◽

VLSI Design ◽

Digital Signal ◽

Path Delay ◽

Large Area ◽

Booth Multiplier ◽

Critical Path Delay ◽

Long Latency ◽

Comparison Results

Multiplication is one of the most essential arithmetic operations used in numerous applications in digital signal processing and communications. These applications need transformations, convolutions, and dot products that involve an enormous number of multiplications of an operand with a constant. Typical examples include wavelet transforms and digital filters such as FIR or IIR filters. However, multiplier structures have a relatively large area-delay product, long latency, and significantly high power consumption compared to other arithmetic structures. Therefore, low-power multiplier design has always been a significant part of DSP structures for VLSI design. The Booth multiplier is promising as the most efficient among multipliers, as it reduces complexity considerably more than the others. In this paper, we propose a Booth multiplier using seamless pipelining. Theoretical comparison results show that the proposed Booth multiplier has less critical path delay than the traditional Booth multiplier. ASIC simulation results show that the proposed radix-16 Booth multiplier has 13% less critical path delay for word width n=16 and 17% less critical path delay for word width n=32 compared to the best existing radix-16 Booth multiplier.
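
The way Booth recoding reduces the number of partial products can be sketched in Python with the radix-4 (modified Booth) variant, which is simpler than the radix-16 scheme of the paper and is used here only as an illustration. Each pair of multiplier bits, together with the bit below it, is recoded into one digit in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products. The function names are assumptions, and the pipelining proposed in the paper is not modeled.

```python
from typing import List


def booth_radix4_digits(b: int, n: int) -> List[int]:
    """Recode an n-bit two's-complement multiplier into n/2 radix-4 digits.

    Digits are returned least significant first, each in {-2, -1, 0, 1, 2}.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for radix-4 recoding")
    b &= (1 << n) - 1           # view b as an n-bit pattern
    digits = []
    prev = 0                    # implicit bit b[-1] = 0
    for i in range(0, n, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Multiply signed a by the n-bit signed b using radix-4 Booth digits."""
    product = 0
    for k, d in enumerate(booth_radix4_digits(b, n)):
        # Each digit selects 0, +/-a, or +/-2a, shifted by 2k bit positions.
        product += (d * a) << (2 * k)
    return product
```

A radix-16 recoding follows the same pattern with groups of four bits and digits in {-8, ..., 8}, which quarters the number of partial products at the cost of generating odd multiples of the multiplicand.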

Modified RS encoder architecture with reduced critical path delay for high speed data communication

2017 International Conference on Intelligent Sustainable Systems (ICISS) ◽

10.1109/iss1.2017.8389244 ◽

2017 ◽

Author(s):

A. Deepa ◽

C. N. Marimuthu

Keyword(s):

High Speed ◽

Critical Path ◽

Data Communication ◽

Path Delay ◽

Critical Path Delay ◽

High Speed Data

Delay Modeling and Critical-Path Delay Calculation for MTCMOS Circuits

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences ◽

10.1093/ietfec/e89-a.12.3482 ◽

2006 ◽

Vol E89-A (12) ◽

pp. 3482-3490

Author(s):

N. OHKUBO ◽

K. USAMI

Keyword(s):

Critical Path ◽

Path Delay ◽

Critical Path Delay

On the Use of ZBDDs for Implicit and Compact Critical Path Delay Fault Test Generation

Journal of Electronic Testing ◽

10.1007/s10836-007-5020-8 ◽

2008 ◽

Vol 24 (1-3) ◽

pp. 203-222 ◽

Cited By ~ 3

Author(s):

Kyriakos Christou ◽

Maria K. Michael ◽

Spyros Tragoudas

Keyword(s):

Test Generation ◽

Critical Path ◽

Path Delay ◽

Delay Fault ◽

Critical Path Delay ◽

Path Delay Fault

Lifting-Based Fractional Wavelet Filter: Energy-Efficient DWT Architecture for Low-Cost Wearable Sensors

Advances in Multimedia ◽

10.1155/2020/8823689 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Mohd Tausif ◽

Ekram Khan ◽

Mohd Hasan ◽

Martin Reisslein

Keyword(s):

Energy Efficient ◽

State Of The Art ◽

Critical Path ◽

Low Cost ◽

Wearable Sensors ◽

Discrete Wavelet ◽

Wavelet Filter ◽

Path Delay ◽

Memory Requirement ◽

Critical Path Delay

This paper proposes and evaluates the LFrWF, a novel lifting-based architecture to compute the discrete wavelet transform (DWT) of images using the fractional wavelet filter (FrWF). In order to reduce the memory requirement of the proposed architecture, only one image line is read into a buffer at a time. Aside from an LFrWF version with multipliers, i.e., the LFrWF_m, we develop a multiplier-less LFrWF version, i.e., the LFrWF_ml, which reduces the critical path delay (CPD) to the delay T_a of an adder. The proposed LFrWF_m and LFrWF_ml architectures are compared in terms of the required adders, multipliers, memory, and critical path delay with state-of-the-art DWT architectures. Moreover, the proposed LFrWF_m and LFrWF_ml architectures, along with the state-of-the-art FrWF architectures (with multipliers (FrWF_m) and without multipliers (FrWF_ml)), are compared through implementation on the same FPGA board. The LFrWF_m requires 22% fewer look-up tables (LUTs), 34% fewer flip-flops (FFs), and 50% fewer compute cycles (CCs) and consumes 65% less energy than the FrWF_m. Also, the proposed LFrWF_ml architecture requires 50% fewer CCs and consumes 43% less energy than the FrWF_ml. Thus, the proposed LFrWF_m and LFrWF_ml architectures appear suitable for computing the DWT of images on wearable sensors.
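
For readers unfamiliar with lifting, the Python sketch below shows one level of a lifting-based DWT on a single image line, using the integer LeGall 5/3 filter with shifts and additions only. The choice of the 5/3 filter, the symmetric boundary handling, and the function names are assumptions for illustration; this is a generic lifting example, not the LFrWF architecture or its fractional wavelet filter.

```python
from typing import List, Tuple


def lift_53_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the integer 5/3 lifting DWT on an even-length line.

    Returns (s, d): the low-pass (approximation) and high-pass (detail) halves.
    """
    n = len(x)
    if n < 2 or n % 2 != 0:
        raise ValueError("line length must be even and at least 2")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: detail = odd sample minus the average of its even neighbors.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]   # mirror at the edge
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update step: smooth the even samples with the neighboring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]                 # mirror at the edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s: List[int], d: List[int]) -> List[int]:
    """Undo lift_53_forward exactly by reversing the two lifting steps."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x
```

Each lifting step modifies one half of the samples in place using only the other half, which is why the transform needs little buffering and can be inverted exactly even with integer rounding.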
