LUT Based Generalized Parallel Counters for State-of-art FPGAs

Generalized Parallel Counters (GPCs) are frequently used in constructing high speed compressor trees. Previous work has focused on achieving efficient mapping of GPCs on FPGAs by using a combination of general Look-Up Table (LUT) fabric and specialized fast carry chains. The resulting structures are purely combinational and cannot be efficiently pipelined to achieve the potential FPGA performance. In this paper, we take an alternate approach and try to eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs on FPGAs using only general LUT fabric. The resultant GPCs are then easily re-timed by placing registers at the fan-out nodes of each LUT. We have used our heuristic on various GPCs reported in prior work. Our heuristic successfully eliminates the carry chain from the GPC structure with the same LUT count in most of the cases. Experimental results using Xilinx Kintex-7 FPGAs show a considerable reduction in critical path and dynamic power dissipation with the same area utilization in most of the cases.
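
As a reading aid, the function a GPC computes can be sketched in a few lines of Python. This is only a behavioural model under our own assumptions: the names `gpc` and `output_width` are hypothetical, the column list is ordered least significant first, and nothing here reflects the LUT mapping heuristic described in the abstract.

```python
def gpc(columns):
    """Evaluate a generalized parallel counter behaviourally.

    columns[i] is the list of input bits of weight 2**i (index 0 is the
    least significant column). A GPC outputs the weighted sum of all its
    input bits as a binary number; here it is returned as an int.
    """
    return sum(sum(bits) << weight for weight, bits in enumerate(columns))


def output_width(shape):
    """Output bits needed when column i receives shape[i] input bits."""
    max_value = sum(count << weight for weight, count in enumerate(shape))
    return max_value.bit_length()


# A (6;3) counter: six bits of weight 1 in, a 3-bit count out.
print(output_width([6]), gpc([[1, 0, 1, 1, 0, 1]]))

# A (1,5;3) GPC: five bits of weight 1 and one bit of weight 2.
print(output_width([5, 1]), gpc([[1, 1, 1, 1, 1], [1]]))
```

A plain (m;n) counter is the special case with a single column, which is why both shapes above share the same helper.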


High Efficiency Generalized Parallel Counters for Look-Up Table Based FPGAs

International Journal of Reconfigurable Computing ◽

10.1155/2015/518272 ◽

2015 ◽

Vol 2015 ◽

pp. 1-16 ◽

Cited By ~ 4

Author(s):

Burhan Khurshid ◽

Roohie Naaz Mir

Keyword(s):

Power Dissipation ◽

High Speed ◽

High Efficiency ◽

Critical Path ◽

Fir Filters ◽

Path Delay ◽

Look Up Table ◽

Improved Performance ◽

Ip Cores ◽

Low Efficiency

Generalized parallel counters (GPCs) are used in constructing high speed compressor trees. Prior work has focused on utilizing the fast carry chain and mapping the logic onto Look-Up Tables (LUTs). This mapping is not optimal in the sense that the LUT fabric is not fully utilized, which results in low efficiency GPCs. In this work, we present a heuristic that efficiently maps the GPC logic onto the LUT fabric. We have used our heuristic on various GPCs and have achieved an improvement in efficiency ranging from 33% to 100% in most of the cases. Experimental results using Xilinx 5th-, 6th-, and 7th-generation FPGAs and Stratix IV and V devices from Altera show a considerable reduction in resource utilization and dynamic power dissipation, for almost the same critical path delay. We have also implemented GPC-based FIR filters on 7th-generation Xilinx FPGAs using our proposed heuristic and compared their performance against conventional implementations. Implementations based on our heuristic show improved performance. Comparisons are also made against filters based on integrated DSP blocks and inherent IP cores from Xilinx. The results show that the proposed heuristic provides performance that is comparable to the structures based on these specialized resources.


Low Power VLSI Design Techniques: A Review

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/11881 ◽

2021 ◽

Vol 23 (11) ◽

pp. 172-183

Author(s):

Ketan J. Raut ◽

Abhijit V. Chitre ◽

Minal S. Deshmukh ◽

Kiran Magar ◽

...

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

Vlsi Design ◽

Cmos Technology ◽

Vlsi Circuits ◽

Optimization Techniques ◽

Battery Life ◽

Dynamic Power ◽

Cmos Vlsi

Because CMOS technology consumes less power, it is a key technology for VLSI circuit design. With technologies reaching the scale of 10 nm, static and dynamic power dissipation in CMOS VLSI circuits are major issues. Dynamic power dissipation increases with the demand for high speed, and static power dissipation is nowadays even higher than dynamic power dissipation because of very high gate leakage current and subthreshold leakage. Low power consumption is as important as speed in many applications, since it reduces package cost and extends battery life. This paper surveys contemporary optimization techniques that aim at low power dissipation in VLSI circuits.
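
For orientation, the two power components named in this abstract are commonly written in the first-order textbook form below. The symbols are our own notation and are not taken from the surveyed paper: \alpha is the switching activity factor, C_L the switched load capacitance, V_{DD} the supply voltage, f the clock frequency, and I_{leak} the total leakage current (gate plus subthreshold).

```latex
\begin{align}
  P_{\text{dynamic}} &= \alpha \, C_L \, V_{DD}^{2} \, f \\
  P_{\text{static}}  &= I_{\text{leak}} \, V_{DD}
\end{align}
```

In this model, raising f for speed raises dynamic power linearly, while the leakage term is independent of switching activity, which is consistent with the trend the abstract describes.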


DESIGN OF A HIGH-SPEED HIGH-ACCURACY 2048-POINT FFT USING SINGLE-PRECISION FLOATING-POINT ADAPTIVE CORDIC ON FPGA

Vietnam Journal of Science and Technology ◽

10.15625/2525-2518/56/6/12269 ◽

2018 ◽

Vol 56 (6) ◽

pp. 751

Author(s):

Duc Hung Le

Keyword(s):

Fourier Transform ◽

Fast Fourier Transform ◽

High Speed ◽

High Accuracy ◽

Experimental Results ◽

Floating Point ◽

Path Delay ◽

Single Precision ◽

Look Up Table ◽

Speed Performance

In this paper, a hardware design of a Fast Fourier Transform (FFT) core using single-precision floating-point adaptive CORDIC is implemented on an Altera Stratix IV FPGA. In the FFT implementation, CORDIC is used to reduce the speed penalty of complex multiplication, and an adaptive algorithm is proposed to decrease the number of iterations of conventional CORDIC. The adaptive CORDIC and 2048-point Radix-2 Multi-path Delay Commutator FFT designs are built and verified with three kinds of look-up table, holding 16, 8 and 4 constant angles. The experimental results show equivalent resource usage across the variants, with a trade-off between speed and accuracy. In comparison, the adaptive CORDIC core based on a look-up table of 16 constant angles, and the 2048-point Radix-2 Multi-path Delay Commutator FFT built on that core, respond well to resource optimization, high-speed performance and high-accuracy computation.
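
To make the role of the constant-angle look-up table concrete, here is a minimal sketch of conventional rotation-mode CORDIC in Python. It is not the adaptive algorithm proposed in the paper, it uses Python floats rather than hardware single-precision arithmetic, and the names `ANGLES` and `cordic_sin_cos` are our own.

```python
import math

# Look-up table of 16 constant angles: atan(2**-i) for i = 0..15.
ANGLES = [math.atan(2.0 ** -i) for i in range(16)]


def cordic_sin_cos(theta, iterations=16):
    """Return (sin, cos) of theta by conventional rotation-mode CORDIC.

    Valid for |theta| <= pi/2; each iteration uses one table entry.
    """
    if not 1 <= iterations <= len(ANGLES):
        raise ValueError("iterations must be between 1 and 16")
    # Each micro-rotation stretches the vector; pre-scale by the inverse gain.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, c, math.sin(0.5), math.cos(0.5))
```

Passing a smaller `iterations` value uses fewer table entries and gives a coarser result, which illustrates the speed versus accuracy trade-off the abstract mentions.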


Power optimization of binary division based on FPGA

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i3.pp1354-1366 ◽

2021 ◽

Vol 24 (3) ◽

pp. 1354

Author(s):

Fadi T. Nasser ◽

Ivan A. Hashim

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

Large Scale ◽

Power Optimization ◽

Optimization Technique ◽

Optimization Techniques ◽

Digital Systems ◽

Dynamic Power ◽

System Designs

In modern very large scale integrated (VLSI) digital systems, power consumption has become a critical concern of VLSI designers. As size shrinks and density increases in chips, it will be a challenge to design high performance and low-power digital systems. Therefore, VLSI designers are trying to reduce power dissipation in these systems by using power optimization techniques. Different mathematical operations can be found in the architectures of most digital systems. The focus of this paper is division. In comparison to other basic computational operations, division requires more iterations, takes a long time, covers a large area, and consumes more power from the digital system. As a result, the system's design requires a high-speed, low-power divider in order to improve its overall performance. This paper focuses on dynamic power dissipation. In order to determine which design consumes the lowest dynamic power, different system designs of digit-recurrence division algorithms, such as restoring division and non-restoring division, are suggested. An innovative power-optimization technique, the very high speed integrated circuit hardware description language (VHDL) technique, is applied to the suggested system designs. The VHDL technique achieved higher optimization in dynamic power than traditional approaches, at 93.66% for non-restoring division with internal-loop iteration.
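
The two digit-recurrence algorithms named in this abstract can be sketched as follows. This is a software model of the textbook algorithms for unsigned operands, assuming a fixed bit width; the function names are our own, and the sketch says nothing about the paper's VHDL designs or their power figures.

```python
def restoring_divide(dividend, divisor, width):
    """Restoring division of unsigned width-bit integers."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder, quotient = 0, 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, then try subtracting the divisor.
        remainder = 2 * remainder + ((dividend >> i) & 1)
        remainder -= divisor
        if remainder < 0:
            remainder += divisor           # restore: quotient bit is 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1
    return quotient, remainder


def non_restoring_divide(dividend, divisor, width):
    """Non-restoring division: add or subtract each step, correct once at the end."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder, quotient = 0, 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor               # single final correction
    return quotient, remainder


print(restoring_divide(200, 7, 8), non_restoring_divide(200, 7, 8))
```

The non-restoring variant performs exactly one addition or subtraction per quotient bit, whereas the restoring variant may perform two, which is the usual reason the former is considered for lower switching activity.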


Low Power High Speed MISTY1 Cryptography Approaches

Journal of Circuits System and Computers ◽

10.1142/s0218126618502006 ◽

2018 ◽

Vol 27 (13) ◽

pp. 1850200 ◽

Cited By ~ 1

Author(s):

Abdoul Rjoub ◽

Ehab M. Ghabashneh

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

Logic Gate ◽

Logic Gates ◽

Portable Devices ◽

Silicon Area ◽

Dynamic Power ◽

Conventional Design ◽

Reduction Approach

The demand for high performance, low power/secured handheld equipment has increased the need for high speed/low energy and efficient encryption/decryption algorithms. Recently, efficient techniques were suggested to increase the standard of security as well as the speed of portable and handheld devices. These techniques also extend battery life by reducing the total silicon capacitance and minimizing the switching activity. This paper presents two approaches to reduce the number of logic gates at S7 and S9 of MISTY1 in order to reduce the total delay time, power dissipation and silicon area. The Logic Gate Reduction Approach (LGRA) reduces the number of logic gates by applying Boolean algebra rules and simplifications, while the Duplicated Gate Reduction Approach (DGRA) removes the redundant XOR and AND logic gates that form the S7 and S9 cipher blocks. Compared to the conventional design, the LGRA approach improves throughput by 21.1%, reduces silicon area by 26.8%, and reduces dynamic power dissipation by 21.7% on average. The DGRA approach improves throughput by 3.8%, reduces silicon area by 31.7%, and reduces dynamic power dissipation by 27% on average. As a result, the proposed approaches could suit the next generation of handheld and portable devices.


Temperature Distribution in a VLSI Chip due to Dynamic Power Density

10.1115/imece1999-1183 ◽

1999 ◽

Author(s):

Shahriar Jahanian ◽

Z. J. Delalic

Keyword(s):

Power Density ◽

Power Distribution ◽

Power Dissipation ◽

High Speed ◽

Short Channel ◽

Dynamic Power ◽

Vlsi Chip ◽

Classic Theory ◽

Scale Down ◽

Static Power

High speed computation is driving VLSI custom chips into smaller micron sizes and scaled-down power supplies. To accomplish very high speed, industry is developing shut down methods and short channel devices. Below 0.5 micron technology, speed is achieved, but hot spots, power density, and die failures increase. Accumulated knowledge of failures has not yet established a classic theory. In this paper, the STEPS method is used to determine the power dissipation in a CMOS circuit. The experiment demonstrates dynamic power dissipation and assumes that static power dissipation is negligible in the CMOS devices. Each node is examined individually as signals are propagated through the chip. At each node the power distribution in the form of heat is determined.


Power Considerations in Banked CAMs: A Leakage Reduction Approach

VLSI Design ◽

10.1155/2008/674259 ◽

2008 ◽

Vol 2008 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Pedro Echeverría ◽

José L. Ayala ◽

Marisa López-Vallejo

Keyword(s):

Power Consumption ◽

Low Power ◽

Power Dissipation ◽

Leakage Power ◽

Experimental Results ◽

Search Process ◽

Dynamic Power ◽

Leakage Reduction ◽

And Storage ◽

Reduction Approach

The content-based access of CAMs makes them of great interest in lookup-based operations. However, the large number of parallel comparisons required incurs a high cost in power dissipation. In this work, we present a novel banked precomputation-based architecture for low-power and storage-demanding applications where the reduction of both dynamic and leakage power consumption is addressed. Experimental results show that the proposed banked architecture reduces dynamic power consumption during the search process by up to 89%, while leakage power consumption is reduced by up to 91%.


Optimization of high-speed CMOS logic circuits with analytical models for signal delay, chip area, and dynamic power dissipation

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ◽

10.1109/43.46799 ◽

1990 ◽

Vol 9 (3) ◽

pp. 236-247 ◽

Cited By ~ 54

Author(s):

B. Hoppe ◽

G. Neuendorf ◽

D. Schmitt-Landsiedel ◽

W. Specks

Keyword(s):

Power Dissipation ◽

High Speed ◽

Logic Circuits ◽

Analytical Models ◽

Signal Delay ◽

Chip Area ◽

Dynamic Power


Low Power 32-Bit Floating Point Adder/Subtractor Design using 50nm CMOS VLSI Technology

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8788.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 662-674

Keyword(s):

Power Consumption ◽

Low Power ◽

Critical Path ◽

Experimental Results ◽

Floating Point ◽

Dynamic Power ◽

Custom Design ◽

Vlsi Technology ◽

Cmos Vlsi ◽

Dsp Applications

In many DSP applications, multipliers and adders are two key components that are highly complex and consume considerable power. Of these, the adder circuitry is quite complex compared to the multiplier and consumes more power. Hence, optimizing the power consumption of adder circuits has been a challenging and pressing task in recent years. To address this problem, the work presented in this paper describes a technique for designing a floating-point adder and subtractor using low-power pipelining, which reduces power consumption by a significant amount. Moreover, the paper deals with the design of a low-power transistorized architecture for a 32-bit floating-point adder/subtractor, with and without pipelining, in 50nm CMOS VLSI technology. The experimental results demonstrate that the dynamic power consumption of the floating-point adder/subtractor architectures is reduced significantly by pipelining compared to the non-pipelined design. A significant improvement in the critical path is also achieved for the pipelined approach compared to the non-pipelined one. The proposed design is a full custom design prepared and analyzed using the Cadence 6.15 tool.


Performance Analysis of Various Multipliers Using 8T-full Adder with 180nm Technology

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096513666200107091932 ◽

2020 ◽

Vol 13 (6) ◽

pp. 864-870

Author(s):

Sai Venkatramana Prasada G.S ◽

G. Seshikala ◽

S. Niranjana

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

High Performance ◽

Full Adder ◽

Fundamental Operation ◽

Wallace Tree ◽

Power Delay Product ◽

The Comparative Study ◽

Wallace Tree Multiplier

Background: This paper presents a comparative study of power dissipation, delay and power delay product (PDP) of different full adder and multiplier designs. Methods: The full adder is a fundamental building block of processors, DSP architectures and VLSI systems. Here ten different full adder structures were analyzed for their best performance using a Mentor Graphics tool with 180nm technology. Results: From the analysis, a high performance full adder is selected for further higher level designs. The 8T full adder exhibits high speed, low power delay and low power delay product, and hence it is used to construct four different multiplier designs: the Array multiplier, Baugh-Wooley multiplier, Braun multiplier and Wallace Tree multiplier. These multiplier structures were designed using the 8T full adder and simulated using the Mentor Graphics tool at a constant W/L aspect ratio. Conclusion: From the analysis, it is concluded that the Wallace Tree multiplier is the high speed multiplier but dissipates comparatively high power. The Baugh-Wooley multiplier dissipates less power but exhibits a longer delay and a low PDP.
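
The relationship between the full adder and a Wallace-style multiplier can be illustrated with a short bit-level model. This is a simplified sketch under our own assumptions: it compresses columns with full adders only (a real Wallace tree also uses half adders), it models logic function rather than the 8T transistor circuit, and the names `full_adder` and `wallace_multiply` are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def wallace_multiply(x, y, width):
    """Multiply two unsigned width-bit integers by column compression."""
    n = 2 * width
    # Partial-product bits (AND terms) grouped by column weight.
    columns = [[] for _ in range(n)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append((x >> i) & (y >> j) & 1)

    # Apply 3:2 compression until every column holds at most two bits.
    while any(len(col) > 2 for col in columns):
        reduced = [[] for _ in range(n)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                reduced[w].append(s)
                if w + 1 < n:
                    # A carry out of the top column is always 0 because
                    # the product fits in 2 * width bits.
                    reduced[w + 1].append(c)
                k += 3
            reduced[w].extend(col[k:])     # leftover bits pass through
        columns = reduced

    # The remaining two rows would feed a final carry-propagate adder.
    return sum(sum(col) << w for w, col in enumerate(columns))


print(wallace_multiply(13, 11, 4))
```

Because all columns are compressed in parallel at each stage, the number of stages grows slowly with operand width, which is the structural reason such multipliers are fast while using many adder cells.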
