scholarly journals High Efficiency Generalized Parallel Counters for Look-Up Table Based FPGAs

2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Burhan Khurshid ◽  
Roohie Naaz Mir

Generalized parallel counters (GPCs) are used in constructing high speed compressor trees. Prior work has focused on utilizing the fast carry chain and mapping the logic onto Look-Up Tables (LUTs). This mapping is not optimal in the sense that the LUT fabric is not fully utilized. This results in low efficiency GPCs. In this work, we present a heuristic that efficiently maps the GPC logic onto the LUT fabric. We have used our heuristic on various GPCs and have achieved an improvement in efficiency ranging from 33% to 100% in most of the cases. Experimental results using Xilinx 5th-, 6th-, and 7th-generation FPGAs and Stratix IV and V devices from Altera show a considerable reduction in resources utilization and dynamic power dissipation, for almost the same critical path delay. We have also implemented GPC-based FIR filters on 7th-generation Xilinx FPGAs using our proposed heuristic and compared their performance against conventional implementations. Implementations based on our heuristic show improved performance. Comparisons are also made against filters based on integrated DSP blocks and inherent IP cores from Xilinx. The results show that the proposed heuristic provides performance that is comparable to the structures based on these specialized resources.

2017 ◽  
Vol 21 (1) ◽  
pp. 3
Author(s):  
Burhan Khurshid

Generalized Parallel Counters (GPCs) are frequently used in constructing high speed compressor trees. Previous work has focused on achieving efficient mapping of GPCs on FPGAs by using a combination of general Look-up table (LUT) fabric and specialized fast carry chains. The  resulting structures are purely combinational and cannot be efficiently pipelined to achieve the potential FPGA performance. In this paper, we take an alternate approach and try to eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs on FPGAS using only general LUT fabric. The resultant GPCs are then easily re-timed by placing registers at the fan-out nodes of each LUT. We have used our heuristic on various GPCs reported in prior work. Our heuristic successfully eliminates the carry chain from the GPC structure with the same LUT count in most of the cases. Experimental results using Xilinx Kintex-7 FPGAs show a considerable reduction in critical path and dynamic power dissipation with same area utilization in most of the cases.


2012 ◽  
Vol 9 (3) ◽  
pp. 325-342 ◽  
Author(s):  
Negovan Stamenkovic ◽  
Vladica Stojanovic

In this paper, the design of a Finite Impulse Response (FIR) filter based on the residue number system (RNS) is presented. We chose to implement it in the (RNS), because the RNS offers high speed and low power dissipation. This architecture is based on the single RNS multiplier-accumulator (MAC) unit. The three moduli set {2n+1,2n,2n-1}, which avoids 2n+1 modulus, is used to design FIR filter. A numerical example illustrates the principles of residue encoding, residue arithmetic, and residue decoding for FIR filters.


VLSI technology become one of the most significant and demandable because of the characteristics like device portability, device size, large amount of features, expenditure, consistency, rapidity and many others. Multipliers and Adders place an important role in various digital systems such as computers, process controllers and signal processors in order to achieve high speed and low power. Two input XOR/XNOR gate and 2:1 multiplexer modules are used to design the Hybrid Full adders. The XOR/XNOR gate is the key punter of power included in the Full adder cell. However this circuit increases the delay, area and critical path delay. Hence, the optimum design of the XOR/XNOR is required to reduce the power consumption of the Full adder Cell. So a 6 New Hybrid Full adder circuits are proposed based on the Novel Full-Swing XOR/XNOR gates and a New Gate Diffusion Input (GDI) design of Full adder with high-swing outputs. The speed, power consumption, power delay product and driving capability are the merits of the each proposed circuits. This circuit simulation was carried used cadence virtuoso EDA tool. The simulation results based on the 90nm CMOS process technology model.


2019 ◽  
Vol 28 (09) ◽  
pp. 1950149
Author(s):  
Bahram Rashidi ◽  
Mohammad Abedini

This paper presents efficient lightweight hardware implementations of the complete point multiplication on binary Edwards curves (BECs). The implementations are based on general and special cases of binary Edwards curves. The complete differential addition formulas have the cost of [Formula: see text] and [Formula: see text] for general and special cases of BECs, respectively, where [Formula: see text] and [Formula: see text] denote the costs of a field multiplication, a field squaring and a field multiplication by a constant, respectively. In the general case of BECs, the structure is implemented based on 3 concurrent multipliers. Also in the special case of BECs, two structures by employing 3 and 2 field multipliers are proposed for achieving the highest degree of parallelization and utilization of resources, respectively. The field multipliers are implemented based on the proposed efficient digit–digit polynomial basis multiplier. Two input operands of the multiplier proceed in digit level. This property leads to reduce hardware consumption and critical path delay. Also, in the structure, based on the change of input digit size from low digit size to high digit size the number of clock cycles and input words are different. Therefore, the multiplier can be flexible for different cryptographic considerations such as low-area and high-speed implementations. The point multiplication computation requires field inversion, therefore, we use a low-cost Extended Euclidean Algorithm (EEA) based inversion for implementation of this field operation. Implementation results of the proposed architectures based on Virtex-5 XC5VLX110 FPGA for two fields [Formula: see text] and [Formula: see text] are achieved. The results show improvements in terms of area and efficiency for the proposed structures compared to previous works.


2013 ◽  
Vol 706-708 ◽  
pp. 882-887
Author(s):  
Ji Zhu Liu ◽  
Yang Jun Wang ◽  
Tao Chen ◽  
Ming Qiang Pan ◽  
Li Guo Chen ◽  
...  

Iron loss will be rapidly increased when the permanent magnet iron core synchronous motor runs at a high speed, which makes the motor produce so much heat that causes low efficiency of the motor and even burns out the motor. The iron-core-free permanent magnet synchronous motor remedies this defect and has a high efficiency at high speed. This article makes a comparative analysis on the iron-core-free permanent magnet synchronous motor torque density with different slot engagement classifications. The paper puts forward an optimized model of permanent magnet synchronous motor without the iron core. The technology of the permanent magnet synchronous motor without iron core is studied based on this model which provides a method to design and manufacture the iron-core-free permanent magnet synchronous motor.


VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-7
Author(s):  
Zhen-dong Zhang ◽  
Bin Wu ◽  
Yu-mei Zhou ◽  
Xin Zhang

A high-speed low-complexity hardware interleaver/deinterleaver is presented. It supports all 77 802.11n high-throughput (HT) modulation and coding schemes (MCSs) with short and long guard intervals and the 8 non-HT MCSs defined in 802.11a/g. The paper proposes a design methodology that distributes the three permutations of an interleaver to both write address and read address. The methodology not only reduces the critical path delay but also facilitates the address generation. In addition, the complex mathematical formulas are replaced with optimized hardware structures in which hardware intensive dividers and multipliers are avoided. Using 0.13 um CMOS technology, the cell area of the proposed interleaver/deinterleaver is 0.07 mm2, and the synthesized maximal working frequency is 400 MHz. Comparison results show that it outperforms the three other similar works with respect to hardware complexity and max frequency while maintaining high flexibility.


2015 ◽  
Vol 743 ◽  
pp. 168-171 ◽  
Author(s):  
Xiao Lei Wang ◽  
Tai Yuan Yin ◽  
Jin Tao Chen ◽  
Jian Xun Liang ◽  
Yang Li

DC motor speed control system is a typical closed-loop control system ofelectromechanical control subject. This paper presents a fast and efficient developing method ofcontrol system based on MATLAB, overcoming the shortcomings of the low efficiency and longdesign cycle in the traditional control system, and completing the rapid design of DC motor speedcontrol system, with its whole process based on MATLAB through the combination and applicationof the multiple toolboxes of the MATLAB. It applies the System Identification toolbox ofMATLAB to model the DC motor, the Simulink toolbox to simulate the control system, SimulinkDesign Optimization toolbox to optimize the PID parameters automatically, and the RTWtechnology to generate the codes for the DSP target board. Compared with the traditional designmethod, this method is characterized by high-efficiency, high-speed, and easy adjustment, havingcertain significance to the design of other control systems.


2013 ◽  
Vol 321-324 ◽  
pp. 361-366
Author(s):  
Yan Yu Ding ◽  
De Ming Wang ◽  
Qing Qing Huang ◽  
Hong Zhou Tan

A high performance full adder circuit with full voltage-swing based on a novel 7-transistor xor-xnor cell is proposed in this paper. In our design, we exploit a novel 7-transistor xor-xnor circuit with a signal level restorer in a feedback path to settle the threshold voltage loss problem. Then we present a new high-performance 1-bit full adder based on the designed xor-xnor cell, pass-transistors and transmission gates. The simulation results prove that, compared with other designs in literature, the proposed full adder shows its superiority for less power dissipation, lower critical path delay and smaller power-delay product, and still provides full voltage swing in all nodes of the circuit.


Author(s):  
POOJA GUPTA ◽  
Saroj Kumar Lenka

This paper describes an efficient implementation for a multi-level convolution based 1-D DWT hardware architecture for use in FPGAs. The proposed architecture combines some hardware optimization techniques to develop a novel DWT architecture that has high performance and is suitable for portable and high speed devices. The first step towards the hardware implementation of the DWT algorithm was to choose the type of FIR filter block. Firstly we design the high speed linear phase FIR filter using pipelined and parallel arithmetic methods. This proposed filter employs efficiently distributed D-latches and multipliers. Furthermore this filter is used in the proposed DWT architecture. Thus, the new VLSI architecture based on combining of fast FIR filters for reducing the critical path delay and data interleaving technique for lower chip area. We synthesized the final design using Xilinx 9.1i ISE tool. We illustrate that a DWT design using a pipelined linear phase FIR filter coupled with data-interleaving gives the best combination of the performance metrics when compared to other DWT structures.


Sign in / Sign up

Export Citation Format

Share Document