High Efficiency Generalized Parallel Counters for Look-Up Table Based FPGAs

Generalized parallel counters (GPCs) are used in constructing high-speed compressor trees. Prior work has focused on utilizing the fast carry chain and mapping the logic onto Look-Up Tables (LUTs). This mapping is not optimal in the sense that the LUT fabric is not fully utilized, which results in low-efficiency GPCs. In this work, we present a heuristic that efficiently maps the GPC logic onto the LUT fabric. We have used our heuristic on various GPCs and have achieved an improvement in efficiency ranging from 33% to 100% in most of the cases. Experimental results using Xilinx 5th-, 6th-, and 7th-generation FPGAs and Stratix IV and V devices from Altera show a considerable reduction in resource utilization and dynamic power dissipation, for almost the same critical path delay. We have also implemented GPC-based FIR filters on 7th-generation Xilinx FPGAs using our proposed heuristic and compared their performance against conventional implementations. Implementations based on our heuristic show improved performance. Comparisons are also made against filters based on integrated DSP blocks and inherent IP cores from Xilinx. The results show that the proposed heuristic provides performance that is comparable to the structures based on these specialized resources.
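
As a rough illustration of what mapping a GPC onto LUTs involves, the sketch below enumerates the truth table of each output bit of a GPC, in the form a LUT initialization value would take. It is a naive one-table-per-output mapping, not the heuristic from the paper. The function names and the convention that `column_heights[i]` counts the input bits of weight 2**i are assumptions made for this example.

```python
def gpc_sum(column_heights, bits):
    """Weighted sum of the input bits; column_heights[i] bits carry weight 2**i."""
    total, pos = 0, 0
    for weight_exp, height in enumerate(column_heights):
        total += sum(bits[pos:pos + height]) << weight_exp
        pos += height
    return total


def gpc_lut_tables(column_heights):
    """Return (n_inputs, n_outputs, tables) for a GPC.

    tables[b] is an integer whose bit `index` holds output bit b for the
    input pattern `index` (input j is bit j of `index`).
    """
    n_in = sum(column_heights)
    max_sum = sum(h << i for i, h in enumerate(column_heights))
    n_out = max_sum.bit_length()
    tables = [0] * n_out
    for index in range(1 << n_in):
        bits = [(index >> j) & 1 for j in range(n_in)]
        s = gpc_sum(column_heights, bits)
        for out_bit in range(n_out):
            if (s >> out_bit) & 1:
                tables[out_bit] |= 1 << index
    return n_in, n_out, tables


if __name__ == "__main__":
    # (6:3) counter: six bits of weight 1 in, three-bit count out.
    n_in, n_out, tables = gpc_lut_tables([6])
    for b, t in enumerate(tables):
        print(f"output bit {b}: 0x{t:016X}")
```

For a GPC with at most six inputs, every output bit is a function of at most six variables, so each table fits a single 6-input LUT. Larger GPCs need a decomposition step that this sketch does not attempt.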

LUT Based Generalized Parallel Counters for State-of-art FPGAs

Electronics ETF ◽

10.7251/els1721003k ◽

2017 ◽

Vol 21 (1) ◽

pp. 3

Author(s):

Burhan Khurshid

Keyword(s):

Power Dissipation ◽

High Speed ◽

Critical Path ◽

Experimental Results ◽

Considerable Reduction ◽

Prior Work ◽

Dynamic Power ◽

Look Up Table ◽

State Of Art

Generalized Parallel Counters (GPCs) are frequently used in constructing high-speed compressor trees. Previous work has focused on achieving efficient mapping of GPCs on FPGAs by using a combination of general Look-Up Table (LUT) fabric and specialized fast carry chains. The resulting structures are purely combinational and cannot be efficiently pipelined to achieve the potential FPGA performance. In this paper, we take an alternate approach and try to eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs on FPGAs using only general LUT fabric. The resultant GPCs are then easily re-timed by placing registers at the fan-out nodes of each LUT. We have used our heuristic on various GPCs reported in prior work. Our heuristic successfully eliminates the carry chain from the GPC structure with the same LUT count in most of the cases. Experimental results using Xilinx Kintex-7 FPGAs show a considerable reduction in critical path delay and dynamic power dissipation with the same area utilization in most of the cases.

Constant-coefficient FIR filters based on residue number system arithmetic

Serbian Journal of Electrical Engineering ◽

10.2298/sjee1203325s ◽

2012 ◽

Vol 9 (3) ◽

pp. 325-342 ◽

Cited By ~ 1

Author(s):

Negovan Stamenkovic ◽

Vladica Stojanovic

Keyword(s):

Power Dissipation ◽

High Speed ◽

Finite Impulse Response ◽

Number System ◽

Residue Number System ◽

FIR Filter ◽

FIR Filters ◽

Residue Number ◽

Low Power Dissipation ◽

Residue Arithmetic

In this paper, the design of a Finite Impulse Response (FIR) filter based on the residue number system (RNS) is presented. We chose to implement it in the RNS because the RNS offers high speed and low power dissipation. The architecture is based on a single RNS multiplier-accumulator (MAC) unit. The three moduli set {2n+1,2n,2n-1}, which avoids 2n+1 modulus, is used to design the FIR filter. A numerical example illustrates the principles of residue encoding, residue arithmetic, and residue decoding for FIR filters.
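
The three stages named in the abstract (residue encoding, residue arithmetic, residue decoding) can be sketched in a few lines. The moduli (7, 8, 15) are an assumed pairwise-coprime set chosen only for illustration, and are not taken from the paper. The function names are hypothetical, and the sketch handles only non-negative values whose true results stay below the dynamic range 7 * 8 * 15 = 840.

```python
from math import prod

# Assumed illustrative moduli (pairwise coprime); not the paper's set.
MODULI = (7, 8, 15)


def rns_encode(x, moduli=MODULI):
    """Residue encoding: one small residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_mac(acc, a, b, moduli=MODULI):
    """Multiply-accumulate performed independently in each residue channel."""
    return tuple((r + p * q) % m for r, p, q, m in zip(acc, a, b, moduli))


def rns_decode(residues, moduli=MODULI):
    """Residue decoding with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


def rns_fir(samples, coeffs, moduli=MODULI):
    """FIR filter y[n] = sum_k coeffs[k] * samples[n - k], computed in the RNS."""
    enc_coeffs = [rns_encode(c, moduli) for c in coeffs]
    enc_samples = [rns_encode(s, moduli) for s in samples]
    out = []
    for n in range(len(samples)):
        acc = (0,) * len(moduli)
        for k, c in enumerate(enc_coeffs):
            if n - k >= 0:
                acc = rns_mac(acc, c, enc_samples[n - k], moduli)
        out.append(rns_decode(acc, moduli))
    return out


if __name__ == "__main__":
    print(rns_fir([1, 2, 3, 4], [1, 2, 1]))
```

Because each channel works on narrow residues with no carries between channels, the MAC step is where the speed and power advantages claimed for RNS would come from. The decoding step is the comparatively expensive part.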

Novel Design of Low-Power High-Speed Hybrid Full Adder Design using Gate Diffusion Input (GDI) Technique

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l7992.1091220 ◽

2020 ◽

Vol 9 (12) ◽

pp. 323-328

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

Critical Path ◽

Circuit Simulation ◽

Full Adder ◽

CMOS Process ◽

Path Delay ◽

Process Technology ◽

XNOR Gate

VLSI technology has become one of the most significant and in-demand technologies because of characteristics such as device portability, device size, rich feature sets, cost, consistency, speed and many others. Multipliers and adders play an important role in various digital systems, such as computers, process controllers and signal processors, in achieving high speed and low power. Two-input XOR/XNOR gates and 2:1 multiplexer modules are used to design the hybrid full adders. The XOR/XNOR gate is the key consumer of power in the full adder cell. However, this circuit increases the delay, area and critical path delay. Hence, an optimum design of the XOR/XNOR gate is required to reduce the power consumption of the full adder cell. Six new hybrid full adder circuits are therefore proposed, based on novel full-swing XOR/XNOR gates and a new Gate Diffusion Input (GDI) design of the full adder with high-swing outputs. Speed, power consumption, power-delay product and driving capability are the merits of each proposed circuit. The circuit simulations were carried out using the Cadence Virtuoso EDA tool, and the simulation results are based on a 90 nm CMOS process technology model.
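
The way a full adder can be assembled from an XOR/XNOR stage and 2:1 multiplexers is shown below as a gate-level behavioral model. This is one common hybrid arrangement, assumed here for illustration. It says nothing about the transistor-level GDI circuits proposed in the paper, and the function names are hypothetical.

```python
def xor_xnor(a, b):
    """XOR/XNOR stage: returns (a XOR b, a XNOR b) for single-bit inputs."""
    x = a ^ b
    return x, x ^ 1


def mux2(sel, d0, d1):
    """2:1 multiplexer: d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def hybrid_full_adder(a, b, cin):
    """Full adder built from one XOR/XNOR stage and two 2:1 multiplexers."""
    x, xn = xor_xnor(a, b)
    # Sum = a XOR b XOR cin: pass XOR when cin is 0, XNOR when cin is 1.
    s = mux2(cin, x, xn)
    # Carry: if a and b differ the carry equals cin, otherwise it equals a.
    cout = mux2(x, a, cin)
    return s, cout


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, hybrid_full_adder(a, b, cin))
```

Both outputs depend on the XOR/XNOR stage, which is consistent with the abstract's point that this gate dominates the cell's power and delay.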

Efficient Lightweight Hardware Structures of Point Multiplication on Binary Edwards Curves for Elliptic Curve Cryptosystems

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126619501494 ◽

2019 ◽

Vol 28 (09) ◽

pp. 1950149

Author(s):

Bahram Rashidi ◽

Mohammad Abedini

Keyword(s):

High Speed ◽

Critical Path ◽

Low Cost ◽

Path Delay ◽

Point Multiplication ◽

Low Area ◽

Elliptic Curve Cryptosystems ◽

Edwards Curves ◽

Special Cases ◽

Field Multiplication

This paper presents efficient lightweight hardware implementations of the complete point multiplication on binary Edwards curves (BECs). The implementations are based on general and special cases of binary Edwards curves. The complete differential addition formulas have the cost of [Formula: see text] and [Formula: see text] for general and special cases of BECs, respectively, where [Formula: see text] and [Formula: see text] denote the costs of a field multiplication, a field squaring and a field multiplication by a constant, respectively. In the general case of BECs, the structure is implemented based on 3 concurrent multipliers. In the special case of BECs, two structures employing 3 and 2 field multipliers are proposed for achieving the highest degree of parallelization and utilization of resources, respectively. The field multipliers are implemented based on the proposed efficient digit–digit polynomial basis multiplier. The two input operands of the multiplier are processed at the digit level. This property reduces hardware consumption and critical path delay. Also, in the structure, the number of clock cycles and input words changes as the input digit size is varied from low to high. Therefore, the multiplier can be flexible for different cryptographic considerations such as low-area and high-speed implementations. The point multiplication computation requires field inversion; therefore, we use a low-cost Extended Euclidean Algorithm (EEA) based inversion to implement this field operation. Implementation results of the proposed architectures on a Virtex-5 XC5VLX110 FPGA are reported for two fields, [Formula: see text] and [Formula: see text]. The results show improvements in terms of area and efficiency for the proposed structures compared to previous works.
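
To make the digit-level processing idea concrete, here is a minimal software sketch of a digit-serial polynomial-basis multiplier over GF(2^m), where one operand is consumed a digit at a time, most significant digit first. It is a simpler scheme than the digit–digit multiplier the paper proposes. The function name, the field GF(2^8) and the reduction polynomial 0x11B are assumptions chosen for a small, checkable example.

```python
def gf2m_mul_digit_serial(a, b, m, poly, digit=4):
    """Multiply a and b in GF(2^m), consuming `digit` bits of b per step.

    `poly` is the irreducible polynomial including its x^m term.
    Each loop iteration models one clock cycle of a digit-serial datapath.
    """
    result = 0
    n_digits = -(-m // digit)  # ceil(m / digit)
    for i in reversed(range(n_digits)):
        # Multiply the running result by x^digit, reducing after every shift.
        for _ in range(digit):
            result <<= 1
            if (result >> m) & 1:
                result ^= poly
        # Partial product of a with the current digit of b (carry-less).
        d = (b >> (i * digit)) & ((1 << digit) - 1)
        partial = 0
        for j in range(digit):
            if (d >> j) & 1:
                partial ^= a << j
        # Reduce the partial product back below degree m.
        for bit in range(m + digit - 2, m - 1, -1):
            if (partial >> bit) & 1:
                partial ^= poly << (bit - m)
        result ^= partial
    return result


if __name__ == "__main__":
    # Illustrative field: GF(2^8) with x^8 + x^4 + x^3 + x + 1.
    print(hex(gf2m_mul_digit_serial(0x57, 0x83, 8, 0x11B, digit=4)))
```

A larger `digit` needs fewer iterations but a wider partial-product stage, which is the area versus speed trade-off the abstract refers to.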

Modified RS encoder architecture with reduced critical path delay for high speed data communication

2017 International Conference on Intelligent Sustainable Systems (ICISS) ◽

10.1109/iss1.2017.8389244 ◽

2017 ◽

Author(s):

A. Deepa ◽

C. N. Marimuthu

Keyword(s):

High Speed ◽

Critical Path ◽

Data Communication ◽

Path Delay ◽

Critical Path Delay ◽

High Speed Data

On Study of High Speed Permanent Magnet Synchronous Motor without Iron Core

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.706-708.882 ◽

2013 ◽

Vol 706-708 ◽

pp. 882-887

Author(s):

Ji Zhu Liu ◽

Yang Jun Wang ◽

Tao Chen ◽

Ming Qiang Pan ◽

Li Guo Chen ◽

...

Keyword(s):

Permanent Magnet ◽

High Speed ◽

High Efficiency ◽

Permanent Magnet Synchronous Motor ◽

Synchronous Motor ◽

Iron Core ◽

Torque Density ◽

Optimized Model ◽

Low Efficiency ◽

Design And Manufacture

Iron loss increases rapidly when an iron-core permanent magnet synchronous motor runs at high speed; the motor then produces so much heat that its efficiency drops and it may even burn out. The iron-core-free permanent magnet synchronous motor remedies this defect and has high efficiency at high speed. This article presents a comparative analysis of the torque density of the iron-core-free permanent magnet synchronous motor for different slot engagement classifications. The paper puts forward an optimized model of a permanent magnet synchronous motor without an iron core. The technology of the permanent magnet synchronous motor without an iron core is studied based on this model, which provides a method to design and manufacture the iron-core-free permanent magnet synchronous motor.

Low-Complexity Hardware Interleaver/Deinterleaver for IEEE 802.11a/g/n WLAN

VLSI Design ◽

10.1155/2012/948957 ◽

2012 ◽

Vol 2012 ◽

pp. 1-7

Author(s):

Zhen-dong Zhang ◽

Bin Wu ◽

Yu-mei Zhou ◽

Xin Zhang

Keyword(s):

High Speed ◽

Critical Path ◽

Low Complexity ◽

CMOS Technology ◽

Path Delay ◽

Hardware Complexity ◽

IEEE 802.11a ◽

Comparison Results ◽

Mathematical Formulas ◽

High Flexibility

A high-speed low-complexity hardware interleaver/deinterleaver is presented. It supports all 77 802.11n high-throughput (HT) modulation and coding schemes (MCSs) with short and long guard intervals and the 8 non-HT MCSs defined in 802.11a/g. The paper proposes a design methodology that distributes the three permutations of an interleaver to both write address and read address. The methodology not only reduces the critical path delay but also facilitates the address generation. In addition, the complex mathematical formulas are replaced with optimized hardware structures in which hardware-intensive dividers and multipliers are avoided. Using 0.13 µm CMOS technology, the cell area of the proposed interleaver/deinterleaver is 0.07 mm², and the synthesized maximal working frequency is 400 MHz. Comparison results show that it outperforms the three other similar works with respect to hardware complexity and max frequency while maintaining high flexibility.
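
For orientation, the sketch below is a plain software model of the two interleaver permutations used by the non-HT 802.11a/g modes, written as a write-address computation. It omits the third permutation (the frequency rotation that 802.11n adds for multiple spatial streams) and is not the address-generation hardware the paper proposes. The function and parameter names are assumptions for this example: `n_cbps` is the number of coded bits per OFDM symbol and `n_bpsc` the number of coded bits per subcarrier.

```python
def interleave_index(k, n_cbps, n_bpsc):
    """Map input bit position k to its output position after both permutations."""
    s = max(n_bpsc // 2, 1)
    # First permutation: spread adjacent coded bits across subcarriers.
    i = (n_cbps // 16) * (k % 16) + k // 16
    # Second permutation: alternate bits between more and less
    # significant positions of the constellation.
    j = s * (i // s) + (i + n_cbps - (16 * i) // n_cbps) % s
    return j


def interleave(bits, n_bpsc):
    """Write each input bit to its permuted address."""
    n_cbps = len(bits)
    out = [0] * n_cbps
    for k, bit in enumerate(bits):
        out[interleave_index(k, n_cbps, n_bpsc)] = bit
    return out


def deinterleave(bits, n_bpsc):
    """Inverse operation: read each bit back from its permuted address."""
    n_cbps = len(bits)
    return [bits[interleave_index(k, n_cbps, n_bpsc)] for k in range(n_cbps)]


if __name__ == "__main__":
    data = list(range(48))  # position markers instead of real bits
    mixed = interleave(data, n_bpsc=1)
    print(mixed[:8])
    print(deinterleave(mixed, n_bpsc=1) == data)
```

The integer divisions and modulo operations in `interleave_index` are the kind of arithmetic the abstract says is replaced by simpler hardware structures.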

Rapid Design of DC Motor Speed Control System Based on MATLAB

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.743.168 ◽

2015 ◽

Vol 743 ◽

pp. 168-171 ◽

Cited By ~ 1

Author(s):

Xiao Lei Wang ◽

Tai Yuan Yin ◽

Jin Tao Chen ◽

Jian Xun Liang ◽

Yang Li

Keyword(s):

Control System ◽

High Speed ◽

High Efficiency ◽

Speed Control ◽

DC Motor ◽

Motor Speed ◽

Loop Control ◽

Rapid Design ◽

Speed Control System ◽

Low Efficiency

A DC motor speed control system is a typical closed-loop control system in the electromechanical control field. This paper presents a fast and efficient method for developing control systems based on MATLAB, overcoming the low efficiency and long design cycle of traditional control system design, and completing the rapid design of a DC motor speed control system, with the whole process based on MATLAB through the combined application of multiple MATLAB toolboxes. It applies the System Identification toolbox of MATLAB to model the DC motor, the Simulink toolbox to simulate the control system, the Simulink Design Optimization toolbox to optimize the PID parameters automatically, and the RTW technology to generate the code for the DSP target board. Compared with the traditional design method, this method is characterized by high efficiency, high speed, and easy adjustment, and has certain significance for the design of other control systems.

A High-Performance Full Adder Circuit Based on a Novel 7-T XOR-XNOR Cell

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.321-324.361 ◽

2013 ◽

Vol 321-324 ◽

pp. 361-366

Author(s):

Yan Yu Ding ◽

De Ming Wang ◽

Qing Qing Huang ◽

Hong Zhou Tan

Keyword(s):

Power Dissipation ◽

High Performance ◽

Critical Path ◽

Full Adder ◽

Path Delay ◽

Feedback Path ◽

Voltage Loss ◽

Transmission Gates ◽

Voltage Swing ◽

Power Delay Product

A high-performance full adder circuit with full voltage swing, based on a novel 7-transistor XOR-XNOR cell, is proposed in this paper. In our design, we exploit a novel 7-transistor XOR-XNOR circuit with a signal level restorer in a feedback path to solve the threshold voltage loss problem. Then we present a new high-performance 1-bit full adder based on the designed XOR-XNOR cell, pass transistors and transmission gates. The simulation results show that, compared with other designs in the literature, the proposed full adder offers lower power dissipation, lower critical path delay and a smaller power-delay product, while still providing full voltage swing at all nodes of the circuit.

A Novel VLSI Architecture of High Speed 1D Discrete Wavelet Transform

International Journal of Electronics and Electrical Engineering ◽

10.47893/ijeee.2015.1147 ◽

2015 ◽

pp. 160-166

Author(s):

Pooja Gupta ◽

Saroj Kumar Lenka

Keyword(s):

High Speed ◽

Performance Metrics ◽

Critical Path ◽

VLSI Architecture ◽

FIR Filter ◽

Optimization Techniques ◽

Discrete Wavelet ◽

Path Delay ◽

Linear Phase ◽

Data Interleaving

This paper describes an efficient implementation of a multi-level convolution-based 1-D DWT hardware architecture for use in FPGAs. The proposed architecture combines several hardware optimization techniques to develop a novel DWT architecture that has high performance and is suitable for portable and high-speed devices. The first step towards the hardware implementation of the DWT algorithm was to choose the type of FIR filter block. We first design a high-speed linear-phase FIR filter using pipelined and parallel arithmetic methods. The proposed filter employs efficiently distributed D-latches and multipliers, and it is then used in the proposed DWT architecture. Thus, the new VLSI architecture combines fast FIR filters, which reduce the critical path delay, with a data-interleaving technique, which lowers the chip area. We synthesized the final design using the Xilinx ISE 9.1i tool. We illustrate that a DWT design using a pipelined linear-phase FIR filter coupled with data interleaving gives the best combination of the performance metrics when compared to other DWT structures.
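
A convolution-based multi-level 1-D DWT amounts to running a low-pass and a high-pass FIR filter over the signal, keeping every second output, and repeating the process on the low-pass branch. The sketch below shows that data flow in software only. The two-tap averaging and differencing filters (a Haar-like pair) and all names are assumptions for illustration; the paper's filter coefficients, pipelining and data interleaving are not modeled.

```python
# Assumed illustrative filter taps (Haar-like); not the paper's filters.
LO = (0.5, 0.5)
HI = (0.5, -0.5)


def fir_downsample(x, taps):
    """FIR convolution y[n] = sum_k taps[k] * x[n - k], keeping odd n only."""
    out = []
    for n in range(1, len(x), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]
        out.append(acc)
    return out


def dwt_1d(x, levels, lo=LO, hi=HI):
    """Multi-level analysis: returns (final approximation, detail list per level)."""
    approx = list(x)
    details = []
    for _ in range(levels):
        detail = fir_downsample(approx, hi)
        approx = fir_downsample(approx, lo)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    print(dwt_1d([4, 6, 10, 12], levels=2))
```

Each level processes half as many samples as the one before it, which is what leaves idle filter cycles that a data-interleaving scheme can reuse for the later levels.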
