critical path delay
Recently Published Documents


TOTAL DOCUMENTS

45
(FIVE YEARS 17)

H-INDEX

6
(FIVE YEARS 2)

Author(s):  
Gundugonti Kishore Kumar ◽  
Balaji Narayanam

In this paper, a modified finite impulse response (FIR) filter design has been proposed for the denoising bio-electrical signals like Electrooculography(EOG). The proposed filter architecture uses modified multiplier block, which is implemented using modified Radix-[Formula: see text] arithmetic-based representation for minimizing the multiple constant multiplication and conventional ripple carry adders are replaced with [Formula: see text] compressors. This proposed architecture is implemented by using Radix-[Formula: see text]-based multiplier and [Formula: see text] compressor architectures for achieving better improvement in the critical path delay. The Radix-[Formula: see text]-based arithmetic bit recording is used in order to reduce the design complexity of the multiplication. The proposed architecture significantly reduced the delay when compared to existing and conventional architectures.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 630
Author(s):  
Padmanabhan Balasubramanian ◽  
Raunaq Nayar ◽  
Douglas L. Maskell

This article describes the design of approximate array multipliers by making vertical or horizontal cuts in an accurate array multiplier followed by different input and output assignments within the multiplier. We consider a digital image denoising application and show how different combinations of input and output assignments in an approximate array multiplier affect the quality of the denoised images. We consider the accurate array multiplier and several approximate array multipliers for synthesis. The multipliers were described in Verilog hardware description language and synthesized by Synopsys Design Compiler using a 32/28-nm complementary metal-oxide-semiconductor technology. The results show that compared to the accurate array multiplier, one of the proposed approximate array multipliers viz. PAAM01-V7 achieves a 28% reduction in critical path delay, 75.8% reduction in power, and 64.6% reduction in area while enabling the production of a denoised image that is comparable in quality to the image denoised using the accurate array multiplier. The standard design metrics such as critical path delay, total power dissipation, and area of the accurate and approximate multipliers are given, the error parameters of the approximate array multipliers are provided, and the original image, the noisy image, and the denoised images are also depicted for comparison.


Author(s):  
Kenta Shirane ◽  
Takahiro Yamamoto ◽  
Hiroyuki Tomiyama

In this paper, we present a case study on approximate multipliers for MNIST Convolutional Neural Network (CNN). We apply approximate multipliers with different bit-width to the convolution layer in MNIST CNN, evaluate the accuracy of MNIST classification, and analyze the trade-off between approximate multiplier’s area, critical path delay and the accuracy. Based on the results of the evaluation and analysis, we propose a design methodology for approximate multipliers. The approximate multipliers consist of some partial products, which are carefully selected according to the CNN input. With this methodology, we further reduce the area and the delay of the multipliers with keeping high accuracy of the MNIST classification.


Circuit World ◽  
2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Anil Kumar Uppugunduru ◽  
Syed Ershad Ahmed

Purpose Multipliers that form the basic building blocks in most of the error-resilient media processing applications are computationally intensive and power-hungry modules. Therefore, improving the multiplier’s performance in terms of area, critical path delay and power has become an important research area. This paper aims to propose two improved multiplier designs based on a new approximate compressor circuit to reduce the hardware complexity at the partial product reduction stage. The proposed approximate 4:2 compressor design significantly reduces the overall hardware cost of the multiplier. The error introduced by the approximate compressor is reduced using a new technique of assigning inputs to the compressors in the partial product reduction structure. Design/methodology/approach The multiplier designs implemented using the proposed approximate 4:2 compressor are targeted for error-resilient applications. For fair comparisons, various multiplier designs, including the proposed one, are implemented in MATLAB. The quality analysis is carried out using standard images, and metrics such as structural similarity index are computed to quantify the result of proposed designs with the existing architectures. Next, Verilog gate-level designs are synthesized to compute area, delay and power to prove the efficacy of the proposed designs. Findings Exhaustive error and hardware analysis have been carried out for the existing and proposed multiplier architectures. Error analysis carried out using MATLAB proves that the proposed designs achieve better quality metrics than existing designs. Hardware results show that area, the power consumed and critical path delay are reduced up to 39.8%, 51.7% and 15.9%, respectively, compared to the existing designs. Toward the end, the proposed designs impact is quantified and compared with existing designs on real-time image sharpening and image multiplication applications. Originality/value The area, delay and power metrics of the multiplier can be improved using an approximate compressor in an error-resilient application. Accordingly, in this work, a new compressor is proposed that reduces the hardware complexity in the multiplier architecture. However, the proposed approximate compressor, while reducing the computational complexity, tends to introduce error in the multiplier. The error introduced by the approximate compressor is reduced using a new technique of assigning inputs to the compressors in the partial product reduction structure. With the help of the approximate compressor and a technique of input realignment, hardware efficient and highly accurate multiplier designs are achieved.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Mohd Tausif ◽  
Ekram Khan ◽  
Mohd Hasan ◽  
Martin Reisslein

This paper proposes and evaluates the LFrWF, a novel lifting-based architecture to compute the discrete wavelet transform (DWT) of images using the fractional wavelet filter (FrWF). In order to reduce the memory requirement of the proposed architecture, only one image line is read into a buffer at a time. Aside from an LFrWF version with multipliers, i.e., the LFr WF m , we develop a multiplier-less LFrWF version, i.e., the LFr WF ml , which reduces the critical path delay (CPD) to the delay T a of an adder. The proposed LFr WF m and LFr WF ml architectures are compared in terms of the required adders, multipliers, memory, and critical path delay with state-of-the-art DWT architectures. Moreover, the proposed LFr WF m and LFr WF ml architectures, along with the state-of-the-art FrWF architectures (with multipliers (Fr WF m ) and without multipliers (Fr WF ml )) are compared through implementation on the same FPGA board. The LFr WF m requires 22% less look-up tables (LUT), 34% less flip-flops (FF), and 50% less compute cycles (CC) and consumes 65% less energy than the Fr WF m . Also, the proposed LFr WF ml architecture requires 50% less CC and consumes 43% less energy than the Fr WF ml . Thus, the proposed LFr WF m and LFr WF ml architectures appear suitable for computing the DWT of images on wearable sensors.


FinFet transistors are used in major semiconductor organizations which play a significant role in the development of the silicon industries. Due to few embedded memories and other circuit issues the transistors have specific faults in manufacturing, designing of the circuit etc. This paper presents an advanced test algorithm to diagnose those faults. The circuit with different gates is designed to identify the places having faults. In addition, algorithms such as non-incremental algorithms is used to find critical path, path delay and PDF of Critical path delay and Genetic Algorithm for optimisation of Critical path delay for sensitive test vector and no of iterations. The transfer characteristics curve is plotted along with the delay curve which helps in finding out the simulation parameters such as noise margin, propagation delay. The results in the methodology calculate the probability density function of the critical path by estimating mean, standard deviation and variance. The advantages of the integration of the two algorithms in this paper help in analyzing the specific faults in the circuits and the error correction of the broken link in the path analysis and has enhanced performance. Furthermore, more complicated circuits are analyzed for fault detection with different approach. In this paper the research work on testing, diagnosis, estimation of Critical path and PDF of Critical path delay faults for FinFET based Combinational Circuits for 20nm and 32 nm Technologies are presented for the first time using latest Non Incremental Genetic algorithm.


Author(s):  
P. Sudhanya ◽  
S. P. Joy Vasantha Rani

This paper introduces hybrid iterative algorithms that combine Particle Swarm Optimization (PSO) and Simulated Annealing (SA) algorithms for Field Programmable Gate Array (FPGA) placement by considering adaptive inertia weight and local minima avoidance. The algorithms target to optimize the wire-length of the nets, run time and critical path delay in the placement of logic blocks. Using the adaptive inertia weight parameter and local minima avoidance, the hybrid PSO-SA algorithm is modified to Time-varying PSO-SA (TPSO-SA) and Modified PSO-SA (MPSO-SA) algorithm, respectively. These different hybrid PSO-SA algorithms are checked for efficiency by comparing with the Versatile Place and Route (VPR) algorithm of the Verilog to Routing (VTR) tool using Microelectronics Centre of North Carolina (MCNC) benchmark circuits. The hybrid PSO-SA algorithms give 5–37% better results for wire-length cost and 20–58% reduction in runtime compared to the VPR placement algorithm on different benchmark circuits. Critical path delay is also taken into consideration.


2020 ◽  
Vol 12 ◽  
Author(s):  
Subhashis Maitra

Background: For higher order multiplications, a huge number of adders or compressors are used to perform the addition of the partial products. Objective: Hence the area and the propagation delay will increase. Researchers are trying to reduce the numbers of additions of partial products. Method: In this paper, different modified compressor have been proposed and based on that compressors, 16x16-bit binary multiplier has been discussed. Results: The proposed design provide better area, power consumption, critical path delay and less number of transistor counts when compared to other design using the conventional compressors. Here the proposed method has been used in Wallace tree multiplier or Dadda tree multiplier. The compressor used here has been implemented using Microwind DSCH 3.8 lite. Conclusion: The modified compressor makes the multiplier faster and reduces the number of addition of partial products..


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 820
Author(s):  
Sanghyun Park ◽  
Jaeyung Jun ◽  
Changhyun Kim ◽  
Gyeong Il Min ◽  
Hun Jae Lee ◽  
...  

The fetched instructions would have data dependency with in-flight ones in the pipeline execution of a processor, so the dependency prevents the processor from executing the incoming instructions for guaranteeing the program’s correctness. The register and memory dependencies are detected in the decode and memory stages, respectively. In a small embedded processor that supports as many ISAsas possible to reduce code size, the instruction decoding to identify register usage with the dependence check generally results in long delay and sometimes a critical path in its implementation. For reducing the delay, this paper proposes two methods—One method assumes the widely used source register operand bit-fields without fully decoding the instructions. However, this assumption would cause additional stalls due to the incorrect prediction; thus, it would degrade the performance. To solve this problem, as the other method, we adopt a table-based way to store the dependence history and later use this information for more precisely predicting the dependency. We applied our methods to the commercial EISC embedded processor with the Samsung 65nm process; thus, we reduced the critical path delay and increased its maximum operating frequency by 12.5% and achieved an average 11.4% speed-up in the execution time of the EEMBC applications. We also improved the static, dynamic power consumption, and EDP by 7.2%, 8.5%, and 13.6%, respectively, despite the implementation area overhead of 2.5%.


2020 ◽  
Vol 29 (12) ◽  
pp. 2050186
Author(s):  
Subodh Kumar Singhal ◽  
B. K. Mohanty ◽  
Sujit Kumar Patel ◽  
Gaurav Saxena

Parallel prefix adder (PPA) is the core component of diminished-1 modulo ([Formula: see text]) adder structure. In this paper, group-carry selection logic based PPA design is proposed and it is free from redundant logic operations which otherwise present in the existing PPA design based on group sum selection logic. Further, the logic expression of pre-processing unit of PPA is also presented in a simplified form to save some logic resources. The proposed PPA design for bit-width 32-bit involves 26.1% less area, consumes 28.4% less power and marginally higher critical-path delay than the existing PPA design. An efficient diminished-1 modulo ([Formula: see text]) adder structure is presented using proposed PPA design and modified carry computation algorithm of existing design. The proposed diminished-1 modulo ([Formula: see text]) adder structure for bit-width 32-bit offers a saving of 25.5% in area-delay-product (ADP) and 24.1% in energy-delay-product (EDP) than the best of the existing modulo adder structure.


Sign in / Sign up

Export Citation Format

Share Document