LUT Based Generalized Parallel Counters for State-of-art FPGAs

2017 ◽  
Vol 21 (1) ◽  
pp. 3
Author(s):  
Burhan Khurshid

Generalized Parallel Counters (GPCs) are frequently used in constructing high-speed compressor trees. Previous work has focused on mapping GPCs efficiently onto FPGAs by combining the general Look-Up Table (LUT) fabric with specialized fast carry chains. The resulting structures are purely combinational and cannot be pipelined efficiently enough to reach the performance the FPGA could otherwise deliver. In this paper, we take an alternate approach and eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs onto FPGAs using only the general LUT fabric. The resulting GPCs are then easily retimed by placing registers at the fan-out nodes of each LUT. We have applied our heuristic to various GPCs reported in prior work. In most cases it eliminates the carry chain from the GPC structure at the same LUT count. Experimental results on Xilinx Kintex-7 FPGAs show a considerable reduction in critical path delay and dynamic power dissipation, with the same area utilization in most cases.
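For readers unfamiliar with the term, a GPC sums input bits of several binary weights and emits the total as a binary number. The following is a minimal behavioral sketch of that function only, not the mapping heuristic from the paper. The function name `gpc` and its argument layout are assumptions made for illustration.

```python
def gpc(columns, out_bits):
    """Behavioral model of a generalized parallel counter.

    columns[i] is the list of input bits that carry weight 2**i.
    Returns the weighted sum as a list of output bits, LSB first.
    """
    total = sum(sum(bits) << weight for weight, bits in enumerate(columns))
    # Largest value the inputs can produce; it must fit in out_bits.
    max_total = sum(len(bits) << weight for weight, bits in enumerate(columns))
    if max_total >= (1 << out_bits):
        raise ValueError("out_bits is too small for this GPC shape")
    return [(total >> i) & 1 for i in range(out_bits)]


# A (6:3) counter: six bits of weight 1, three output bits.
six_ones = gpc([[1, 1, 1, 1, 1, 1]], 3)

# A (1,5:3) GPC: five bits of weight 1 and one bit of weight 2.
mixed = gpc([[1, 0, 1, 1, 0], [1]], 3)
```

In hardware each output bit is a Boolean function of the input bits, which is what makes a LUT-only mapping possible in principle.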

2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Burhan Khurshid ◽  
Roohie Naaz Mir

Generalized parallel counters (GPCs) are used in constructing high-speed compressor trees. Prior work has focused on utilizing the fast carry chain and mapping the logic onto Look-Up Tables (LUTs). This mapping is not optimal in the sense that the LUT fabric is not fully utilized, which results in low-efficiency GPCs. In this work, we present a heuristic that efficiently maps the GPC logic onto the LUT fabric. We have applied our heuristic to various GPCs and have achieved an improvement in efficiency ranging from 33% to 100% in most cases. Experimental results using Xilinx 5th-, 6th-, and 7th-generation FPGAs and Stratix IV and V devices from Altera show a considerable reduction in resource utilization and dynamic power dissipation for almost the same critical path delay. We have also implemented GPC-based FIR filters on 7th-generation Xilinx FPGAs using our proposed heuristic and compared their performance against conventional implementations. Implementations based on our heuristic show improved performance. Comparisons are also made against filters based on integrated DSP blocks and inherent IP cores from Xilinx. The results show that the proposed heuristic provides performance comparable to that of structures based on these specialized resources.
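To make "mapping GPC logic onto the LUT fabric" concrete, the sketch below derives one 64-entry truth table per output bit of a (6:3) counter, so that each output could in principle occupy one 6-input LUT. It is an illustration under stated assumptions, not the authors' heuristic. The function name is hypothetical, and the convention that bit `i` of a mask holds the output for input pattern `i` is an assumption.

```python
def lut6_masks_for_6to3_counter():
    """Return three 64-bit truth-table masks, one per output bit.

    Assumed convention: bit `pattern` of a mask is the LUT output when
    the six LUT inputs, read as a binary number, equal `pattern`.
    """
    masks = [0, 0, 0]
    for pattern in range(64):
        count = bin(pattern).count("1")  # number of asserted inputs
        for bit in range(3):
            if (count >> bit) & 1:
                masks[bit] |= 1 << pattern
    return masks


masks = lut6_masks_for_6to3_counter()
```

The least significant output is the parity of the six inputs, so its mask is the truth table of a 6-input XOR.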


2021 ◽  
Vol 23 (11) ◽  
pp. 172-183
Author(s):  
Ketan J. Raut ◽  
Abhijit V. Chitre ◽  
Minal S. Deshmukh ◽  
Kiran Magar ◽  
...  

Since CMOS technology consumes less power, it is a key technology for VLSI circuit design. With technologies reaching the scale of 10 nm, static and dynamic power dissipation in CMOS VLSI circuits are major issues. Dynamic power dissipation increases with the demand for high speed, and static power dissipation is nowadays even higher than dynamic power dissipation because of very high gate leakage current and subthreshold leakage. Low power consumption is as important as speed in many applications, since it reduces package cost and extends battery life. This paper surveys contemporary optimization techniques that aim at low power dissipation in VLSI circuits.
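The two components named above are usually written with the standard first-order CMOS power model. The abstract does not define any of these symbols, so they are assumptions here: $\alpha$ is the switching activity factor, $C_L$ the switched load capacitance, $V_{DD}$ the supply voltage, $f$ the clock frequency, and $I_{\text{leak}}$ the total leakage current (gate plus subthreshold).

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f,
\qquad
P_{\text{static}} = I_{\text{leak}} \, V_{DD}
```

In this model, raising $f$ for speed increases $P_{\text{dyn}}$ linearly, while $P_{\text{static}}$ depends only on leakage and supply voltage.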


2018 ◽  
Vol 56 (6) ◽  
pp. 751
Author(s):  
Duc Hung Le

In this paper, a hardware design of a Fast Fourier Transform (FFT) core using single-precision floating-point Adaptive CORDIC is implemented on an Altera Stratix IV FPGA. In the FFT implementation, CORDIC is used to reduce the speed penalty of complex multiplication, and an adaptive algorithm is proposed to reduce the number of iterations of conventional CORDIC. The Adaptive CORDIC and 2048-point Radix-2 Multi-path Delay Commutator FFT designs are built and verified with three kinds of Look-up Table, holding 16, 8 and 4 constant angles. The experimental results show equivalent resource usage across the three, with a trade-off between speed and accuracy. In comparison, the adaptive CORDIC core based on a Look-up Table of 16 constant angles, and the 2048-point Radix-2 Multi-path Delay Commutator FFT built on it, respond well to the requirements of resource optimization, high-speed performance and high computational accuracy.
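As background, conventional rotation-mode CORDIC computes a cosine and sine from a table of constant angles using only shift-and-add style updates. The sketch below shows that conventional algorithm in floating point. It is not the adaptive variant proposed in the paper, whose details the abstract does not give. The function name and the default of 16 iterations (matching the 16-angle table size mentioned above) are assumptions.

```python
import math


def cordic_rotate(theta, iterations=16):
    """Conventional rotation-mode CORDIC: returns (cos(theta), sin(theta)).

    Valid for |theta| up to roughly 1.74 rad, the sum of the table angles.
    """
    # Look-up table of constant angles atan(2**-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the inverse CORDIC gain so the result has unit length.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i, angle in enumerate(angles):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angle
    return x, y


cos_half, sin_half = cordic_rotate(0.5)
```

A smaller angle table means fewer iterations and a larger residual angle, which is the speed versus accuracy trade-off the abstract describes.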


Author(s):  
Fadi T. Nasser ◽  
Ivan A. Hashim

In modern very large scale integrated (VLSI) digital systems, power consumption has become a critical concern for designers. As chip feature sizes shrink and density increases, designing high-performance, low-power digital systems becomes a challenge. VLSI designers therefore try to reduce power dissipation in these systems by using power optimization techniques. Different mathematical operations can be found in the architectures of most digital systems. The focus of this paper is division. Compared with other basic computational operations, division requires more iterations, takes longer, occupies a larger area, and consumes more power. As a result, a system needs a high-speed, low-power divider in order to improve its overall performance. This paper focuses on dynamic power dissipation. To determine which design consumes the least dynamic power, different system designs of digit-recurrence division algorithms, such as restoring division and non-restoring division, are suggested. An innovative power-optimization technique, the VHSIC hardware description language (VHDL) technique, is applied to the suggested system designs. The VHDL technique achieved a higher optimization in dynamic power than traditional approaches, reaching 93.66% for non-restoring division with internal-loop iteration.
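The two digit-recurrence algorithms named above can be sketched in software as follows. This is a bit-level behavioral model for unsigned operands, written for illustration. It does not reproduce the paper's hardware designs or its VHDL technique, and the function names and the `width` parameter are assumptions.

```python
def restoring_divide(dividend, divisor, width):
    """Restoring division: try a subtraction, undo it if the result is negative."""
    remainder, quotient = 0, 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:  # keep the subtraction and set the quotient bit
            remainder = trial
            quotient |= 1 << i
        # otherwise the partial remainder is "restored" by discarding trial
    return quotient, remainder


def non_restoring_divide(dividend, divisor, width):
    """Non-restoring division: add or subtract based on the remainder's sign."""
    remainder, quotient = 0, 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = (remainder << 1) + bit - divisor
        else:
            remainder = (remainder << 1) + bit + divisor
        if remainder >= 0:
            quotient |= 1 << i
    if remainder < 0:  # single correction step at the end
        remainder += divisor
    return quotient, remainder


q_r, r_r = restoring_divide(13, 3, 4)
q_n, r_n = non_restoring_divide(13, 3, 4)
```

Non-restoring division performs exactly one add or subtract per iteration, whereas restoring division may need a subtract followed by a restore, which is one reason the two differ in switching activity.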


2018 ◽  
Vol 27 (13) ◽  
pp. 1850200 ◽  
Author(s):  
Abdoul Rjoub ◽  
Ehab M. Ghabashneh

The demand for high-performance, low-power, secure handheld equipment has increased the need for high-speed, low-energy and efficient encryption/decryption algorithms. Recently, efficient techniques have been suggested to raise the level of security as well as the speed of portable and handheld devices. These techniques also extend battery life by reducing the total silicon capacitance and minimizing switching activity. This paper presents two approaches that reduce the number of logic gates in the S7 and S9 blocks of MISTY1 in order to reduce total delay, power dissipation and silicon area. The Logic Gate Reduction Approach (LGRA) reduces the number of logic gates by applying Boolean algebra rules and simplifications, while the Duplicated Gate Reduction Approach (DGRA) removes the redundant XOR and AND logic gates that form the S7 and S9 blocks of the cipher. With LGRA, throughput improves by 21.1% compared to the conventional design, silicon area is reduced by 26.8%, and dynamic power dissipation is reduced by 21.7% on average. With DGRA, throughput improves by 3.8% compared to the conventional design, silicon area is reduced by 31.7%, and dynamic power dissipation is reduced by 27% on average. As a result, the proposed approaches could suit the next generation of handheld and portable devices.
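The idea of removing duplicated XOR and AND gates can be illustrated with a generic netlist pass that merges gates computing the same function of the same inputs. This is only a sketch of the general technique. The abstract does not describe how DGRA is implemented, the netlist below is hypothetical, and it is not taken from the MISTY1 S7 or S9 equations.

```python
def remove_duplicate_gates(netlist):
    """Merge two-input gates that share an operation and inputs.

    netlist: list of (output, op, in_a, in_b) tuples in topological order,
    where op is commutative (for example "AND" or "XOR").
    Returns (kept_gates, alias) where alias maps removed outputs to survivors.
    """
    canonical = {}  # (op, sorted inputs) -> name of the surviving output
    alias = {}      # removed output -> surviving output
    kept = []
    for out, op, a, b in netlist:
        # Rewire inputs that pointed at an already removed gate.
        a, b = alias.get(a, a), alias.get(b, b)
        key = (op, *sorted((a, b)))
        if key in canonical:
            alias[out] = canonical[key]
        else:
            canonical[key] = out
            kept.append((out, op, a, b))
    return kept, alias


# Hypothetical netlist: t2 duplicates t1, which in turn makes y1 duplicate y0.
example = [
    ("t1", "AND", "x0", "x1"),
    ("t2", "AND", "x1", "x0"),
    ("y0", "XOR", "t1", "x2"),
    ("y1", "XOR", "t2", "x2"),
]
kept_gates, aliases = remove_duplicate_gates(example)
```

Fewer gates means less switched capacitance and less area, which is consistent with the reductions the abstract reports.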


1999 ◽  
Author(s):  
Shahriar Jahanian ◽  
Z. J. Delalic

High-speed computation is driving custom VLSI chips toward smaller feature sizes and scaled-down power supplies. To achieve very high speed, industry is developing shut-down methods and short-channel devices. Below 0.5 micron technology, the speed is achieved, but hot spots, power density, and die failures increase. Accumulated knowledge of failures has not yet produced an established theory. In this paper, the STEPS method is used to determine the power dissipation in a CMOS circuit. The experiment demonstrates dynamic power dissipation and assumes that static power dissipation is negligible in the CMOS devices. Each node is examined individually as signals are propagated through the chip. At each node, the power distribution in the form of heat is determined.


VLSI Design ◽  
2008 ◽  
Vol 2008 ◽  
pp. 1-7 ◽  
Author(s):  
Pedro Echeverría ◽  
José L. Ayala ◽  
Marisa López-Vallejo

The content-based access of CAMs makes them of great interest in lookup-based operations. However, the large number of parallel comparisons required incurs a high cost in power dissipation. In this work, we present a novel banked precomputation-based architecture for low-power and storage-demanding applications, which addresses the reduction of both dynamic and leakage power consumption. Experimental results show that the proposed banked architecture reduces dynamic power consumption by up to 89% during the search process, while leakage power consumption is reduced by up to 91%.
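The general idea behind a banked, precomputation-based search is to compute a small parameter from each stored word, group words by that parameter, and compare the search key only against the matching group. The sketch below is a software analogy under explicit assumptions. The class name is hypothetical, and using the number of 1 bits as the precomputed parameter is an assumption, since the abstract does not say which parameter the authors use.

```python
from collections import defaultdict


def ones_count(word):
    """Precomputed parameter: number of 1 bits in the word (assumed choice)."""
    return bin(word).count("1")


class BankedPrecomputationCAM:
    """Software analogy of a CAM whose words are banked by a parameter."""

    def __init__(self):
        self.banks = defaultdict(list)

    def store(self, word):
        self.banks[ones_count(word)].append(word)

    def search(self, key):
        """Return (found, comparisons), comparing only within one bank."""
        bank = self.banks.get(ones_count(key), [])
        return key in bank, len(bank)


cam = BankedPrecomputationCAM()
for stored_word in (0b1010, 0b0110, 0b1111, 0b0001):
    cam.store(stored_word)
hit = cam.search(0b0110)
miss = cam.search(0b1000)
```

Only the words in one bank take part in a search, so the number of full-word comparisons, and with it the switching activity, drops relative to comparing against every stored word.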


In many DSP applications, multipliers and adders are two key components that are highly complex and consume considerable power. Of these, the design of the adder circuitry is quite complex compared with the multiplier, which consumes more power. Optimizing the power consumption of adder circuits has therefore become a challenging and necessary task in recent years. To address this problem, this paper describes a technique for designing a floating-point adder and subtractor using low-power pipelining, which reduces power consumption by a significant amount. The paper presents a low-power transistor-level architecture for a 32-bit floating-point adder/subtractor, with and without pipelining, in 50 nm CMOS VLSI technology. The experimental results demonstrate that the dynamic power consumption of the floating-point adder/subtractor architecture is reduced significantly by pipelining compared with the non-pipelined design. A significant improvement in the critical path is also achieved for the pipelined approach compared with the non-pipelined approach. The proposed design is a full-custom design prepared and analyzed using the Cadence 6.15 tool.


Author(s):  
Sai Venkatramana Prasada G.S ◽  
G. Seshikala ◽  
S. Niranjana

Background: This paper presents a comparative study of the power dissipation, delay and power-delay product (PDP) of different full adder and multiplier designs. Methods: The full adder is the fundamental building block of processors, DSP architectures and VLSI systems. Here, ten different full adder structures were analyzed for best performance using a Mentor Graphics tool with 180 nm technology. Results: From the analysis, the highest-performing full adder is selected for further higher-level designs. The 8T full adder exhibits high speed, low power delay and a low power-delay product, and hence it is used to construct four different multiplier designs: the Array multiplier, the Baugh-Wooley multiplier, the Braun multiplier and the Wallace Tree multiplier. These multiplier structures were designed using the 8T full adder and simulated using the Mentor Graphics tool at a constant W/L aspect ratio. Conclusion: From the analysis, it is concluded that the Wallace Tree multiplier is the fastest multiplier but dissipates comparatively high power. The Baugh-Wooley multiplier dissipates less power but exhibits a longer delay and a low PDP.
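To show how a multiplier is assembled from full adders, the sketch below models a one-bit full adder at the logic level and uses it to build a simple unsigned array-style multiplier, accumulating one row of partial products at a time. It is a behavioral illustration only. It does not model the 8T transistor circuit, its power, or its delay, and the function names are assumptions.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(x, y, width):
    """Unsigned multiply of two width-bit values using only full adders."""
    acc = [0] * (2 * width)  # running product bits, LSB first
    for j in range(width):
        y_bit = (y >> j) & 1
        carry = 0
        for i in range(width):
            partial = ((x >> i) & 1) & y_bit  # AND gate forms the partial product
            acc[i + j], carry = full_adder(acc[i + j], partial, carry)
        acc[j + width] = carry  # this position has not been written before
    return sum(bit << k for k, bit in enumerate(acc))


product = array_multiply(13, 11, 4)
```

The power-delay product used for comparison in the abstract is the product of average power and propagation delay, so a design can rank well on PDP while being slower, as reported for the Baugh-Wooley multiplier.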

