A High-Performance Signed-Unsigned Multiplier Using Vedic Mathematics

A high-speed N × N-bit multiplier architecture that supports both signed and unsigned multiplication is proposed in this paper. The architecture incorporates modified two's complement circuits and an N × N-bit unsigned multiplier circuit. The unsigned multiplier is built by decomposing it into smaller-precision independent multipliers using Vedic mathematics. These individual multipliers generate the partial products in parallel for high-speed operation, and the partial products are combined using high-speed adders and a parallel adder to produce the product output. The proposed architecture has a regularly shaped partial product tree, which makes it easy to implement. Finally, the multiplier is implemented in UMC 65 nm technology for N = 8, 16 and 32 bits. Synthesis results show that the proposed architecture improves speed and reduces the power-delay product (PDP) compared with architectures in the literature.
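The decomposition described above can be illustrated with a short behavioural sketch. It assumes N is a power of two, splits each operand into halves, and forms the four half-width products (the "vertical" and "crosswise" terms of the Urdhva Tiryagbhyam sutra) recursively. Signed operands are handled by a simple sign-magnitude wrapper around the unsigned core. The function names and the 2 × 2 leaf are hypothetical, and the wrapper is an assumption standing in for the paper's modified two's complement circuits, not a description of them.

```python
def vedic_unsigned(a: int, b: int, n: int) -> int:
    """Unsigned n x n multiply by recursive four-way (Vedic-style) decomposition.

    Assumes n is a power of two and 0 <= a, b < 2**n.
    """
    if n <= 2:
        return a * b  # hypothetical 2x2 leaf multiplier
    k = n // 2
    mask = (1 << k) - 1
    a_hi, a_lo = a >> k, a & mask
    b_hi, b_lo = b >> k, b & mask
    # Four independent half-width sub-multipliers; in hardware they run in parallel.
    hh = vedic_unsigned(a_hi, b_hi, k)  # vertical term (high halves)
    ll = vedic_unsigned(a_lo, b_lo, k)  # vertical term (low halves)
    hl = vedic_unsigned(a_hi, b_lo, k)  # crosswise term
    lh = vedic_unsigned(a_lo, b_hi, k)  # crosswise term
    # Combine the partial products with shifts and additions.
    return (hh << (2 * k)) + ((hl + lh) << k) + ll


def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def signed_unsigned_multiply(a: int, b: int, n: int, signed: bool) -> int:
    """Multiply n-bit patterns a and b as signed or unsigned operands."""
    if not signed:
        return vedic_unsigned(a, b, n)
    sa, sb = to_signed(a, n), to_signed(b, n)
    # Magnitudes (at most 2**(n-1)) still fit in n bits, so the unsigned core is reused.
    mag = vedic_unsigned(abs(sa), abs(sb), n)
    return -mag if (sa < 0) != (sb < 0) else mag
```

For example, with n = 8 the pattern 0xFF times 0x02 gives -2 in signed mode and 510 in unsigned mode, using the same unsigned core.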

Performance Analysis of Various Multipliers Using 8T-full Adder with 180nm Technology

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 2020, Vol 13 (6), pp. 864-870
DOI: 10.2174/2352096513666200107091932

Author(s): Sai Venkatramana Prasada G.S, G. Seshikala, S. Niranjana

Keyword(s): Low Power, Power Dissipation, High Speed, High Performance, Full Adder, Fundamental Operation, Wallace Tree, Power Delay Product, The Comparative Study, Wallace Tree Multiplier

Background: This paper presents a comparative study of the power dissipation, delay and power-delay product (PDP) of different full adder and multiplier designs. Methods: The full adder is the fundamental building block of processors, DSP architectures and VLSI systems. Ten different full adder structures were analyzed for performance using a Mentor Graphics tool in 180 nm technology. Results: From this analysis, a high-performance full adder was selected for higher-level designs. The 8T full adder exhibits high speed, low power, low delay and a low PDP, so it was used to construct four multiplier designs: the array multiplier, Baugh-Wooley multiplier, Braun multiplier and Wallace tree multiplier. These multipliers were designed with the 8T full adder and simulated using the Mentor Graphics tool with a constant W/L aspect ratio. Conclusion: The analysis shows that the Wallace tree multiplier is the fastest but dissipates comparatively high power, whereas the Baugh-Wooley multiplier dissipates less power and has a low PDP but a longer delay.
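For reference, the signed partial-product scheme behind a Baugh-Wooley array can be sketched at bit level. This model uses the common modified Baugh-Wooley form: complement the partial products that involve exactly one sign bit, then add 1 at columns n and 2n-1 and keep a 2n-bit result. It is a behavioural illustration with a hypothetical function name, not the paper's 8T transistor-level design.

```python
def baugh_wooley(a: int, b: int, n: int) -> int:
    """Signed n x n multiply of two's complement patterns, modified Baugh-Wooley style."""
    width = 2 * n
    total = 0
    for i in range(n):
        for j in range(n):
            pp = ((a >> i) & 1) & ((b >> j) & 1)  # AND-gate partial product
            if (i == n - 1) != (j == n - 1):
                pp ^= 1  # invert partial products with exactly one sign bit
            total += pp << (i + j)
    total += (1 << n) + (1 << (width - 1))  # Baugh-Wooley correction constants
    total &= (1 << width) - 1               # the array produces a 2n-bit result
    # Read the 2n-bit result back as a two's complement value.
    return total - (1 << width) if total >> (width - 1) else total
```

Every partial-product bit is an ordinary AND (or NAND) output, so the array uses only positive-weight adders; this is why the scheme maps onto the same full-adder cells as an unsigned array.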

High-Speed Hybrid-Logic Full Adder Using High-Performance 10-T XOR–XNOR Cell

International Journal of Advanced Research in Science, Communication and Technology, 2021, pp. 263-269
DOI: 10.48175/ijarsct-1845

Author(s): Tejaswini M. L, Aishwarya H, Akhila M, B. G. Manasa

Keyword(s): High Speed, High Performance, Full Adder, CMOS Technology, Power Performance, High Speed Design, Power Delay Product, Full Swing, Output Swing, High Output

The main aim of our work is to meet low-power and high-speed design goals. The proposed hybrid adder is designed to provide a high output swing at minimum power. The delay, power and driving capability of a hybrid full adder (FA) depend largely on its XOR-XNOR circuit, which also consumes most of the FA's power. In this paper, a 10T XOR-XNOR circuit is proposed that provides good driving capability and full-swing outputs simultaneously, without any external inverter. Its performance is measured by simulation in the Cadence Virtuoso environment using 90 nm CMOS technology. The circuit outperforms its counterparts, achieving a lower power-delay product (PDP) than available XOR-XNOR modules. Four full adder designs are proposed using the 10T XOR-XNOR, sum and carry modules, and they improve PDP compared with other architectures. To evaluate the proposed full adders, we embedded them in 4-bit and 8-bit cascaded full adders. Among all the FAs, two of the proposed designs provide the best performance for larger bit widths.
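As a behavioural sketch of how a hybrid full adder is commonly organized around a shared XOR-XNOR stage, the model below derives the sum by selecting between the XOR and XNOR outputs and the carry from a 2:1 multiplexer, then cascades the cell into an n-bit ripple adder like the 4-bit and 8-bit test structures. The function names and the exact sum/carry module logic are assumptions; the 10T transistor netlists are not modelled.

```python
def xor_xnor(a: int, b: int) -> tuple[int, int]:
    """Complementary XOR/XNOR pair, as produced together by an XOR-XNOR cell."""
    x = a ^ b
    return x, 1 - x


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    x, xn = xor_xnor(a, b)
    # Sum module: pass XOR when cin = 0, XNOR when cin = 1 (equals a ^ b ^ cin).
    s = (x & (1 - cin)) | (xn & cin)
    # Carry module as a 2:1 mux: propagate cin if a != b, otherwise generate a (== b).
    cout = cin if x else a
    return s, cout


def cascaded_adder(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit ripple (cascaded) adder built from the full adder cell above."""
    total, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

Because the carry path is a single multiplexer per bit, the XOR-XNOR stage sits off the ripple path once its outputs settle, which is one reason its delay and drive strength dominate the cascaded adder's behaviour.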

Low Power, High Speed and Area Efficient Binary Count Multiplier

Journal of Circuits System and Computers, 2016, Vol 25 (04), pp. 1650027
DOI: 10.1142/s0218126616500274
Cited By ~ 1

Author(s): Kore Sagar Dattatraya, Belgudri Ritesh Appasaheb, Ramdas Bhanudas Khaladkar, V. S. Kanchana Bhaaskaran

Keyword(s): Digital Signal Processor, Word Length, High Speed, Digital Signal, General Purpose, Computation Method, Partial Product, Wallace Tree, Power Delay Product, Binary Count

The multiplier is the core building block of any processor, such as a digital signal processor (DSP) or a general-purpose microprocessor. As the word length increases, so does the number of adders or compressors required for partial product addition, and this addition determines the latency, area and speed of wide word-length multipliers. This paper proposes a binary count multiplier (BCM), which reduces the number of adders and compressors by using uniquely structured binary counters and by altering the logical flow of partial product addition with binary adders. The binary counters for different bit counts are derived by modifying the basic 4:2 compressor circuit. A [Formula: see text] bit multiplier has been developed to validate the proposed computation method. Compared with the conventional Wallace tree multiplier in the literature, this structure offers lower power, a smaller device count and less delay: the BCM implementation achieves a 29.17% reduction in device count, a 66% reduction in delay and a 70% reduction in power dissipation. It also achieves a 90% reduction in the power-delay product (PDP) relative to the Wallace tree structure. The multiplier circuits were implemented and validated using the Cadence[Formula: see text] EDA tool, with 45 nm technology files used for the designs and exhaustive SPICE simulations.
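The column-counting idea behind a binary count multiplier can be illustrated behaviourally: count the 1s in each partial-product column and route the higher-order bits of each count to the columns above. The sketch below is a hypothetical, sequential model of that arithmetic only; it does not reproduce the paper's 4:2-compressor-derived counters or its binary-adder arrangement.

```python
def partial_product_columns(a: int, b: int, n: int) -> list[list[int]]:
    """AND-gate partial products of an unsigned n x n multiply, grouped by column weight."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def binary_count_multiply(a: int, b: int, n: int) -> int:
    width = 2 * n
    cols = partial_product_columns(a, b, n)
    result = 0
    for c in range(width):                 # process columns from LSB to MSB
        count = sum(cols[c])               # binary counter: number of 1s in column c
        result |= (count & 1) << c         # LSB of the count is product bit c
        count >>= 1
        k = 1
        while count:
            if c + k < width:              # bits beyond 2n-1 are always 0 here
                cols[c + k].append(count & 1)  # a weight-2^k count bit moves k columns up
            count >>= 1
            k += 1
    return result
```

A column of height h needs only about log2(h + 1) output bits, which is the intuition for replacing chains of 3:2 or 4:2 stages with counters.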

Efficient Hardware Implementations of Binary-to-BCD Conversion Schemes for Decimal Multiplication

Journal of Circuits System and Computers, 2014, Vol 24 (02), pp. 1550019
DOI: 10.1142/s021812661550019x

Author(s): Osama Al-Khaleel, Zakaria Al-Qudah, Mohammad Al-Khaleel, Raed Bani-Hani, Christos Papachristou, ...

Keyword(s): High Performance, State Of The Art, The State, Partial Product, Hardware Implementations, Array Multipliers, Decimal Multiplication, Multiplier Circuit

This paper proposes two high-performance binary-to-binary-coded-decimal (BCD) conversion algorithms for use in BCD multiplication. The algorithms split the 7-bit binary partial product of two BCD digits into two groups, compute each group's contribution to the equivalent BCD partial product, and add these contributions to obtain the final BCD partial product. The proposed architectures are designed and implemented for both ASIC and FPGA targets and compared with others. BCD array multipliers were also implemented using both our conversion circuits and existing conversion circuits. ASIC and FPGA synthesis results show that the proposed designs are faster and occupy less area than state-of-the-art conversion circuits. Furthermore, comparing BCD multipliers of various sizes shows that the area savings in the conversion circuit grow into a sizable area improvement in the multiplier circuit.
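The two-group conversion can be sketched for a single digit-by-digit product (at most 9 × 9 = 81, so 7 bits). The sketch assumes a split into the upper 3 bits (weight 16) and the lower 4 bits, looks up each group's BCD contribution in a small table, and adds the two contributions with a one-digit decimal correction. The split point, table names and helpers are hypothetical; the paper's two algorithms may group the bits differently.

```python
def to_bcd2(v: int) -> int:
    """Two-digit BCD encoding of 0 <= v <= 99 (tens digit in the high nibble)."""
    return ((v // 10) << 4) | (v % 10)


# Hypothetical grouping: bits 6..4 and bits 3..0 of the 7-bit binary product.
# Each group's BCD contribution is a constant table (small logic or ROM in hardware).
UPPER_BCD = [to_bcd2(16 * u) for u in range(6)]  # 0, 16, ..., 80 (products <= 81)
LOWER_BCD = [to_bcd2(l) for l in range(16)]      # 0 .. 15


def bcd_add2(x: int, y: int) -> int:
    """Add two 2-digit BCD values whose sum stays <= 99."""
    lo = (x & 0xF) + (y & 0xF)
    carry = 0
    if lo > 9:      # decimal correction (equivalent to the usual +6 adjustment)
        lo -= 10
        carry = 1
    hi = (x >> 4) + (y >> 4) + carry
    return (hi << 4) | lo


def binary_to_bcd_pp(p: int) -> int:
    """Convert the binary product of two BCD digits (0..81) to BCD."""
    return bcd_add2(UPPER_BCD[p >> 4], LOWER_BCD[p & 0xF])
```

For instance, 7 × 9 = 63 = 0b0111111 splits into an upper contribution of 48 and a lower contribution of 15, and the BCD addition yields 0x63.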

Ultra high speed full adder for biomedical applications

International Journal of Reconfigurable and Embedded Systems (IJRES), 2021, Vol 10 (1), pp. 25
DOI: 10.11591/ijres.v10.i1.pp25-31

Author(s): Basavoju Harish, M. S. S. Rukmini

Keyword(s): Power Consumption, High Speed, High Performance, Supply Voltage, Medical Engineering, Digital Signal, Full Adder, Digital Devices, Ripple Carry Adder, Power Delay Product

In biomedical engineering, high-performance CPUs for digital signal processing play a significant role, and frequency-efficient circuits are a paramount requirement for portable digital devices that employ various digital processors. This work describes a novel high-speed one-bit 10T full adder with complemented output. The circuit is built from XOR gates, each constructed from two CMOS transistors in a 2T multiplexer style. At 180 nm with a 1.8 V supply, the designed circuit consumes 183.6 µW with a delay of 1.809 ps, whereas at 90 nm with a 1.2 V supply it consumes 25.74 µW with a delay of 8.245 ps. The observed power-delay product (PDP) is 0.33 fJ at 180 nm (1.8 V supply) and 0.212 fJ at 90 nm (1.2 V supply). The work was extended by implementing a 32-bit ripple carry adder (RCA), whose delay was found to be 93.7 ps at 180 nm and 198 ps at 90 nm. Results were obtained at both 180 nm and 90 nm using a CAD tool, and they show that the present work offers significant improvements in speed and PDP compared with existing designs.
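The reported PDP figures follow directly from PDP = power × delay. A quick check, assuming the unit is femtojoules (1 µW × 1 ps = 10⁻¹⁸ J = 10⁻³ fJ), reproduces both values; the helper name is hypothetical.

```python
def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in femtojoules from power in microwatts and delay in picoseconds."""
    return power_uw * delay_ps * 1e-3  # 1 uW * 1 ps = 1e-18 J = 1e-3 fJ


pdp_180nm = pdp_fj(183.6, 1.809)  # 180 nm, 1.8 V supply
pdp_90nm = pdp_fj(25.74, 8.245)   # 90 nm, 1.2 V supply
print(f"180 nm: {pdp_180nm:.3f} fJ, 90 nm: {pdp_90nm:.3f} fJ")
```

The 90 nm design draws roughly seven times less power but is about 4.6 times slower, so its PDP ends up only about 36% lower than at 180 nm.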

Assertion Driven Modified Booth Encoding and Post Computation Model for Speed MAC Applications

2021
DOI: 10.3233/apc210289

Author(s): S. Sivasaravanababu, T.R. Dineshkumar, G. Saravana Kumar

Keyword(s): Computational Complexity, High Speed, High Performance, Complexity Reduction, Partial Product, Path Delay, Overall Design, Look Ahead, Wireless Application, Product Accumulation

The multiply-accumulate (MAC) unit is the core computational block in many DSP and wireless applications, but it tends to have complicated architectures. Moreover, the MAC block largely determines the energy consumption and performance of the overall design, because it lies on the critical (maximum-delay) propagation path. Developing a high-performance, energy-optimized MAC core is therefore essential for an optimized DSP core. In this work, a high-speed, low-power signed Booth-radix MAC unit is proposed with a highly configurable assertion-driven modified Booth encoding (AD-MBE). The proposed Booth core is based on an optimized radix-4 Booth core with hierarchical partial product accumulation, path delay optimization and reduced computational complexity. All Booth-generated partial products are summed in a post-summation adder network consisting of a carry select adder (CSA) followed by a carry look-ahead adder (CLA), which reduces energy and computational complexity. The operating frequency is increased by accumulating the encoding bits of each input operand in an assertion unit before generating the final result, instead of going through the entire partial product accumulation. The FPGA implementation of the proposed signed asserted radix-4 Booth MAC shows a significant reduction in complexity with improved system performance compared with the conventional Booth unit and the conventional array multiplier.
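Radix-4 modified Booth encoding, which the proposed MAC builds on, recodes the multiplier into digits in {-2, -1, 0, +1, +2}, halving the number of partial products. The sketch below shows plain radix-4 recoding, partial-product summation and accumulation. It is an illustrative model that assumes an even operand width n, uses hypothetical function names, and does not model the paper's assertion unit or its CSA/CLA post-summation network.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits of the n-bit two's complement multiplier b (LSB group first)."""
    assert n % 2 == 0, "sketch assumes an even operand width"
    digits = []
    prev = 0  # implicit bit b_{-1} = 0
    for i in range(0, n, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # overlapping 3-bit group -> digit in {-2..2}
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    a_s = to_signed(a, n)
    product = 0
    for i, d in enumerate(booth_radix4_digits(b, n)):
        # Each partial product is 0, +/-A or +/-2A, weighted by 4**i.
        product += (d * a_s) << (2 * i)
    return product


def mac(pairs: list[tuple[int, int]], n: int) -> int:
    """Accumulate a sum of signed products, as a MAC unit does over successive cycles."""
    acc = 0
    for a, b in pairs:
        acc += booth_multiply(a, b, n)
    return acc
```

For 8-bit operands only four partial products are generated per multiplication instead of eight, which shortens the accumulation tree that dominates the MAC critical path.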

VLSI Implementation of High Speed Energy-Efficient Truncated Multiplier

Journal of Circuits System and Computers, 2018, Vol 27 (05), pp. 1850077
DOI: 10.1142/s0218126618500779
Cited By ~ 2

Author(s): K. N. Vijeyakumar, S. Elango, S. Kalaiselvi

Keyword(s): Power Dissipation, Energy Efficient, High Speed, Total Error, Absolute Error, Structural Level, Chip Area, Vedic Mathematics, Power Delay Product, Compiler Performance

In this brief, we present the design and evaluation of a high-speed, energy-efficient truncated multiplier for unsigned multiplication that keeps the average absolute error due to truncation and rounding minimal. The proposed algorithm eliminates a few least significant partial product (PP) bits and adds correction bias at appropriate PP bit positions to minimize the total error. The reviewed literature shows that there is scope for reducing multiplication delay using the sutras of ancient Vedic mathematics. This work uses the simple “crosswise and vertical” sutra of Vedic mathematics to generate the PP bits. The proposed methodology groups the input into [Formula: see text]/2 bits, eliminates the least significant subgroup multiplication ([Formula: see text]) and deletes a few least significant bits in the other subgroup multiplications to reduce area and power dissipation. In addition, correction biases are added at appropriate bit positions to reduce the overall absolute error caused by eliminating PP bits and rounding the final product. The proposed truncated design is evaluated through structural-level VHDL modeling and simulation using Synopsys Design Compiler. Performance analysis gives a Chip-Area Ratio (CAR%) of 33.81% and a power-delay product (PDP) of 14.84[Formula: see text]pJ for the proposed truncated design for an [Formula: see text] multiplication.
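The effect of dropping the least significant sub-product and adding a correction bias can be explored with a small model. It assumes an even width n split into two n/2-bit halves, drops only the low × low sub-product, and uses a hypothetical constant bias equal to the expected value of that sub-product for uniformly distributed inputs. The paper's additional deletion of LSBs in the other sub-products and its actual bias positions are not reproduced.

```python
def truncated_multiply(a: int, b: int, n: int, use_bias: bool = True) -> int:
    """Approximate unsigned n x n product with the low x low sub-product eliminated."""
    k = n // 2
    mask = (1 << k) - 1
    a_hi, a_lo = a >> k, a & mask
    b_hi, b_lo = b >> k, b & mask
    # Keep the vertical high term and the two crosswise terms; drop a_lo * b_lo.
    approx = ((a_hi * b_hi) << (2 * k)) + ((a_hi * b_lo + a_lo * b_hi) << k)
    if use_bias:
        # Hypothetical constant bias ~ E[a_lo * b_lo] = (mask / 2)**2 for uniform inputs.
        approx += (mask * mask) // 4
    return approx


def error_stats(n: int, use_bias: bool) -> tuple[float, float]:
    """Mean signed error and mean absolute error over all n-bit input pairs."""
    errs = [truncated_multiply(a, b, n, use_bias) - a * b
            for a in range(1 << n) for b in range(1 << n)]
    return sum(errs) / len(errs), sum(abs(e) for e in errs) / len(errs)
```

For n = 8 the bias is 56, and the mean signed error over all input pairs moves from -56.25 without bias to -0.25 with it; the bias is a constant, so in hardware it costs only a few fixed bits in the adder tree.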

Design and Implementation of 8x8 Multiplier using 4-2 Compressor and 5-2 Compressor

International Journal of Reconfigurable and Embedded Systems (IJRES), 2016, Vol 5 (3), pp. 131
DOI: 10.11591/ijres.v5.i3.pp131-135
Cited By ~ 1

Author(s): K. Hari Kishore, K. Akhil, G. Viswanath, N. Pavan Kumar

Keyword(s): Low Power, High Speed, Electronic Device, Logic Gates, Partial Product, Transmission Gate, Design And Implementation, Wallace Tree, Minimum Delay, Power Delay Product

In this paper, an 8x8 multiplier is realized using 4-2 and 5-2 compressors. Low-power, high-speed 4-2 and 5-2 compressors are widely used in arithmetic circuits. Both compressor circuits are built internally from XOR and XNOR logic gates. The 4-2 compressor uses a new partial-product reduction format that successively reduces the maximum output, and the resulting multiplier needs fewer MOSFETs than Wallace tree multipliers. The 4-2 compressor is a high-speed design built from XOR and XNOR gates and transmission-gate-based circuitry. The average delay and switching energy, i.e., the power-delay product (PDP), of the 5-2 compressor implemented with 4-2 compressors is compared with an implementation without compressors, and the design is shown to have the minimum delay and PDP. Simulations are performed using Xilinx ISE 10.1.
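A 4-2 compressor's behaviour can be pinned down with a reference model. The sketch below uses the textbook construction from two cascaded full adders, which satisfies x1 + x2 + x3 + x4 + cin = sum + 2(carry + cout) with cout independent of cin; it is a functional reference with hypothetical names, not the XOR-XNOR and transmission-gate circuit described in the paper.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)
    return s, cout


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Returns (sum, carry, cout); cout feeds the cin of the next column's compressor."""
    s1, cout = full_adder(x1, x2, x3)  # cout depends only on x1..x3, never on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout
```

Because cout never depends on cin, a row of 4-2 compressors has no carry ripple across columns, which is what lets each stage reduce four partial-product rows to two in constant time.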

High-Performance Low-Power 5:2 Compressor With 30 CNTFETs Using 32 nm Technology

International Journal of Sensors Wireless Communications and Control, 2019, Vol 9 (4), pp. 462-467
DOI: 10.2174/2210327909666190206144601

Author(s): Jitendra Kumar Saini, Avireni Srinivasulu, Renu Kumawat

Keyword(s): Power Consumption, High Speed, High Performance, CMOS Technology, Vital Role, Arithmetic Circuit, Voltage Supply, Big Data Applications, High Performance Systems, Power Delay Product

Background: The advent of high-performance computing (HPC) and big data applications has made it imperative to develop hardware that can match their computing demands. In such high-performance systems, high-speed multipliers are among the most sought-after components. The compressor is an important part of a multiplier: it plays a vital role in multiplier performance and contributes to the efficiency of arithmetic circuits. The 5:2 compressor proposed here improves the overall performance and efficiency of arithmetic circuits in terms of power consumption, delay and power-delay product (PDP). It was implemented in both CMOS and carbon nanotube field-effect transistor (CNTFET) technologies, and it yielded better results with CNTFETs than with MOSFETs. Methods/Results: The proposed 5:2 compressor was designed in CMOS, simulated at 45 nm with a 1.0 V supply, and compared with existing 5:2 compressor designs to validate the improvements. The design was then implemented in 32 nm CNTFET technology and simulated with a 0.6 V supply. The proposed compressor is compared with existing CMOS designs, and its CMOS and CNTFET implementations are compared in terms of power, delay and PDP. Conclusion: The proposed 5:2 compressor gives better results than existing 5:2 compressor designs implemented in CMOS, improving power, delay and PDP by approximately 30%, 15% and 40%, respectively. The CNTFET implementation further improves power consumption and PDP by 30%. Hence, the proposed circuit implemented with CNTFETs offers substantial improvements over existing circuits.
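Similarly, a conventional 5:2 compressor can be modelled as three cascaded full adders that take two carry-ins from the previous column and produce two carry-outs for the next. This sketch is a functional reference for checking any 5:2 implementation, CMOS or CNTFET, against the counting identity; the names are hypothetical and it is not the proposed 30-CNTFET circuit.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_5_2(x1: int, x2: int, x3: int, x4: int, x5: int,
                   cin1: int, cin2: int) -> tuple[int, int, int, int]:
    """Returns (sum, carry, cout1, cout2) for one column of a 5:2 reduction stage."""
    s1, cout1 = full_adder(x1, x2, x3)    # cout1 independent of the carry-ins
    s2, cout2 = full_adder(s1, x4, x5)    # cout2 independent of the carry-ins
    s, carry = full_adder(s2, cin1, cin2)
    # Invariant: x1 + ... + x5 + cin1 + cin2 == s + 2 * (carry + cout1 + cout2)
    return s, carry, cout1, cout2
```

As with the 4-2 case, keeping both carry-outs independent of the carry-ins prevents a ripple across the row, so the compressor's delay is set by its internal XOR chain.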

Improved Domino Logic Circuits and its Application in Wide Fan-In OR Gates

Micro and Nanosystems, 2020, Vol 12 (1), pp. 58-67
DOI: 10.2174/1876402911666190716161631

Author(s): Deepika Bansal, Bal Chand Nagar, Brahamdeo Prasad Singh, Ajay Kumar

Keyword(s): Power Consumption, High Speed, High Performance, Voltage Drop, Main Concern, Clock Frequency, Domino Logic, Evaluation Phase, Power Delay Product, Domino Circuits

Background: The main concerns in efficient VLSI circuit design are low power consumption, high speed and noise tolerance. Objective: In this paper, two efficient, high-performance topologies are proposed for cascaded domino logic using carbon nanotube MOSFETs (CN-MOSFETs). The first topology removes the intermediate charge-sharing problem without any keeper circuit, whereas the second holds the true logic level of the evaluation phase without any voltage drop for the next precharge phase. Both topologies are suitable for cascading high-performance domino circuits. Methods: The proposed domino circuits are tested and verified with the Synopsys HSPICE simulator using 32 nm CN-MOSFET technology provided by Stanford University. Conclusion: For an 8-input OR gate at a clock frequency of 500 MHz, the power-delay products of the proposed DL-I and DL-II improve by 32.59% and 40.98%, respectively, compared with the standard logic. The simulation results confirm that the proposed circuits improve on the performance of pseudo domino logic with respect to leakage power consumption, delay and unity noise gain.
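The precharge/evaluate behaviour that these domino topologies build on can be captured in a small behavioural model of a wide fan-in OR gate, such as the 8-input OR used in the evaluation. The class below is a hypothetical sketch: it treats the dynamic node as ideal (no leakage, charge sharing or keeper is modelled) and shows why domino inputs must be monotonic during evaluation, since a discharged node cannot recover until the next precharge.

```python
class DominoWideOr:
    """Behavioural model of an N-input domino OR gate with an ideal dynamic node."""

    def __init__(self, fan_in: int = 8):
        self.fan_in = fan_in
        self.dynamic_node = 1

    def precharge(self) -> int:
        # Clock low: the PMOS precharges the dynamic node high; the footer NMOS is off.
        self.dynamic_node = 1
        return self.output()

    def evaluate(self, inputs: list[int]) -> int:
        assert len(inputs) == self.fan_in
        # Clock high: parallel NMOS pull-down network; any high input discharges the node.
        # Nothing recharges it here, so a later falling input cannot undo a discharge.
        if any(inputs):
            self.dynamic_node = 0
        return self.output()

    def output(self) -> int:
        return 1 - self.dynamic_node  # static output inverter
```

Because every input adds a parallel pull-down transistor, the evaluation delay of a wide OR grows only slowly with fan-in, while the large dynamic node is exposed to the charge-sharing and leakage effects that the DL-I and DL-II topologies target.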
