AFBV: A High-Performance Network Flow Classification Method for Multi-Dimensional Fields and FPGA Implementation

Network flow classification is a key function in high-speed switches and routers and directly determines the performance of network devices. With the growth of the Internet and its applications, flow classification must support multi-dimensional fields and large rule sets while sustaining high throughput. Software-based classification cannot meet performance requirements as high as 100 Gbps. FPGA-based flow classification methods can achieve very high throughput, but range matching remains challenging. To address this, this paper proposes a range-supported bit vector (RSBV) method. First, the characteristics of range matching are analyzed, and the rules are pre-encoded and stored in memory. Second, the fields of an input packet header are used as addresses to read the memory, and the range-matching result is derived through pipelined Boolean operations. On this basis, a bit vector for any type of field (AFBV) is further proposed, which efficiently supports flow classification over multi-dimensional fields, including exact matching, longest prefix matching, range matching, and arbitrary wildcard matching. The proposed methods are implemented on an FPGA platform. Through a two-dimensional pipeline architecture, AFBV can operate at a high clock frequency and achieve a processing speed of more than 100 Gbps. Simulation results show that for a rule set of 512-bit width and 1 k rules, AFBV achieves a throughput of 520 million packets per second (MPPS). Performance is improved by 44% compared with FSBV and by 30% compared with Stride BV, and power consumption is reduced by about 43% compared with a TCAM solution.
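The lookup flow in this abstract (header fields used as memory addresses, per-field results combined by Boolean operations) can be illustrated with a minimal Python sketch. It is not the RSBV pre-encoding, which the abstract does not detail; it assumes a plain per-value table for each field, which is only practical for narrow fields. The rule set, field widths, and function names are hypothetical.

```python
from typing import List, Tuple

Rule = List[Tuple[int, int]]  # one inclusive (low, high) range per field


def build_tables(rules: List[Rule], field_bits: List[int]) -> List[List[int]]:
    """Precompute, per field, one bit vector for every possible field value.

    Bit i of tables[f][v] is set when value v satisfies rule i on field f.
    """
    tables = []
    for f, bits in enumerate(field_bits):
        table = [0] * (1 << bits)
        for i, rule in enumerate(rules):
            low, high = rule[f]
            for v in range(low, high + 1):
                table[v] |= 1 << i
        tables.append(table)
    return tables


def classify(tables: List[List[int]], header: List[int]) -> int:
    """Return the index of the first matching rule, or -1 if none match."""
    match = -1  # all ones: every rule is a candidate before any field is read
    for table, value in zip(tables, header):
        match &= table[value]  # one memory read and one AND per field
    if match == 0:
        return -1
    return (match & -match).bit_length() - 1  # lowest set bit = first rule


# Hypothetical two-field rule set with 4-bit fields.
rules = [
    [(3, 3), (0, 15)],   # exact match on field 0, wildcard on field 1
    [(0, 7), (4, 9)],    # prefix-like range on field 0, range on field 1
    [(0, 15), (0, 15)],  # catch-all rule
]
tables = build_tables(rules, [4, 4])
print(classify(tables, [5, 6]))  # 1
```

Exact, prefix, range, and wildcard conditions all reduce to an inclusive range here, which is why a single table format covers them; a hardware design would replace the per-value tables with a more compact encoding.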


High Speed FPGA Based 128-bit Advance Encryption Standard (AES)

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327911666210201104151 ◽

2021 ◽

Vol 11 ◽

Author(s):

Ibrahem M. T. Hamidi ◽

Farah S. H. Al-aassi

Keyword(s):

High Throughput ◽

Field Programmable Gate Array ◽

High Speed ◽

Design Tool ◽

Advanced Encryption Standard ◽

Clock Frequency ◽

Advance Encryption Standard ◽

Field Programmable ◽

Internal Component ◽

Gate Array

Aim: Achieve a high-throughput 128-bit FPGA-based Advanced Encryption Standard (AES) implementation. Background: A Field Programmable Gate Array (FPGA) provides an efficient platform for designing an AES cryptographic system. It offers bit-level control through a hardware description language (HDL) such as VHDL or Verilog, which yields output speeds in the Gbps range. Objective: Use an FPGA to design a high-throughput 128-bit AES. Method: Pipelining is used to achieve the maximum possible speed. It is applied at two levels, round pipelining and internal component pipelining, in which registers are inserted at selected places to increase the output speed. The proposed design uses combinational logic to implement byte substitution. The S-box is implemented using composite field arithmetic with 7 pipeline stages to reduce the combinational logic depth. The model is implemented in VHDL using the Xilinx ISE 14.4 design tool. Result: The design achieves 18.55 Gbps at a clock frequency of 144.96 MHz with an area of 1568 slices on a Spartan-3 xc3s1000 device. Conclusion: The results show that the proposed design reaches high throughput with acceptable area usage compared with other designs in the literature.
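The reported throughput is consistent with a fully pipelined datapath that emits one 128-bit block per clock cycle. The short check below assumes exactly that (the abstract does not state it explicitly), and the function name is illustrative.

```python
def pipelined_throughput_gbps(block_bits: int, clock_mhz: float) -> float:
    """Throughput of a pipeline assumed to emit one block per clock cycle."""
    return block_bits * clock_mhz * 1e6 / 1e9


print(round(pipelined_throughput_gbps(128, 144.96), 2))  # 18.55
```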


VLSI ARCHITECTURE OF PARALLEL MULTIPLIER– ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

International Journal of Electronics and Electical Engineering ◽

10.47893/ijeee.2012.1009 ◽

2012 ◽

pp. 40-46

Author(s):

Mr.M.V. Sathish ◽

Mrs. Sailaja

Keyword(s):

Signal Processing ◽

High Speed ◽

High Performance ◽

VLSI Architecture ◽

Clock Frequency ◽

Parallel Multiplier ◽

Hybrid Type ◽

Standard Design ◽

Overall Performance ◽

And Performance

This paper proposes a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. The proposed MAC showed superior properties to the standard design in many ways, with performance twice that of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.


IMPLEMENTATION OF A REDUCED COMPLEXITY HIGH PERFORMANCE DATA ACQUISITION CHIP USING 0.18 MICRON TECHNOLOGY

SYNCHROINFO JOURNAL ◽

10.36724/2664-066x-2021-7-3-22-26 ◽

2021 ◽

Vol 7 (3) ◽

pp. 22-26

Author(s):

Hai P. Le ◽

Aladin Azyegh ◽

Jugdutt Singh ◽

...

Keyword(s):

Low Power ◽

Data Acquisition ◽

High Speed ◽

High Performance ◽

Modern Science ◽

Digital Data ◽

Clock Frequency ◽

Flash ADC ◽

Analog Signals ◽

Wide Range

Data acquisition (DAQ) in the general sense is the process of collecting information from the real world. For engineers and scientists, this data is mostly numerical and is usually collected, stored and analysed using computers. However, most input signals cannot be read directly by digital computers, because they are generally analog signals with continuous values, while computers can only recognise digital signals with on/off levels. DAQ systems are therefore necessary, as they translate analog signals into digital data. For this reason, they have become significant in a wide range of applications in modern science and technology [1]. The paper presents the design of a 12-bit high-speed low-power Data Acquisition (DAQ) chip. The designs of the building-block components aim at high accuracy along with high speed and low power dissipation. A modified flash analog-to-digital converter (ADC) was used instead of the traditional flash ADC. The proposed DAQ chip operates at a 1 GHz master clock frequency and achieves a sampling speed of 125 MS/s. It dissipates only 64.9 mW of power, compared with 97.2 mW when a traditional flash ADC was used.


A modified Fresnel-based algorithm for 3D microwave imaging of metal objects

International Journal of Microwave and Wireless Technologies ◽

10.1017/s175907871800123x ◽

2018 ◽

Vol 11 (4) ◽

pp. 313-325

Author(s):

Farshad Zamiri ◽

Abdolreza Nabavi

Keyword(s):

High Speed ◽

Fourier Transforms ◽

Low Cost ◽

Three Dimensional ◽

Computation Time ◽

Microwave Imaging ◽

Reconstruction Algorithms ◽

Clock Frequency ◽

Pipeline Architecture ◽

The Impact

The microwave holography technique reconstructs a target image using recorded amplitudes and phases of the signals reflected from the target with Fast Fourier Transform (FFT)-based algorithms. The reconstruction algorithms have two or more steps of two- and three-dimensional Fourier transforms, which have a high computational load. In this paper, by neglecting the impact of target depth on image reconstruction, an efficient Fresnel-based algorithm is proposed, involving only one-step FFT for both single- and multi-frequency microwave imaging. Numerous tests have been performed to show the effectiveness of the proposed algorithm, including planar and non-planar targets, using the raw data gathered by means of a scanner operating in X-band. Finally, a low-cost and high-speed hardware architecture based on fixed-point arithmetic is introduced which reconstructs the planar targets. This pipeline architecture was tested on field programmable gate arrays operating at 200 MHz clock frequency, which illustrates more than 30 times improvement in computation time compared with a computer.


Design of Low Power CMOS Comparator using 180nm Technology for ADC Application

Circulation in Computer Science ◽

10.22632/ccs-2017-mcsp027 ◽

2017 ◽

Vol MCSP2017 (01) ◽

pp. 11-13

Author(s):

Truptimayee Behera ◽

Ritisnigdha Das

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

High Performance ◽

Input Voltage ◽

Clock Frequency ◽

NMOS Transistor ◽

Body Effect ◽

Low Power Dissipation ◽

Low Power CMOS

In this design of a high-performance CMOS comparator in GPDK 180 nm technology, we optimize its key parameters. We analyse the transient response of the schematic design, calculate the gain through AC analysis, and measure the power dissipation. The circuit is built using PMOS and NMOS transistors with body effect. Plots of phase and gain are also discussed. Finally, a test schematic is built, and a transient analysis for an input voltage of 2 V is performed using Cadence Virtuoso. Simulation results show that the design can work at a high clock frequency of 200 MHz. The design has low power dissipation.


A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

International Journal of Instrumentation Control and Automation ◽

10.47893/ijica.2011.1036 ◽

2011 ◽

pp. 196-202

Author(s):

P.Sasi Bala ◽

S. Raghavendra

Keyword(s):

High Speed ◽

High Performance ◽

VLSI Architecture ◽

Alpha Power ◽

Clock Frequency ◽

Parallel Multiplier ◽

Standard Design ◽

Overall Performance ◽

And Performance ◽

Least Significant Bits

In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in the MAC, was merged into the CSA, the overall performance was elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results as sum and carry bits instead of as the output of the final adder, which makes it possible to optimize the pipeline scheme and improve performance. The proposed architecture was synthesized with 250, 180, 130, and 90 nm standard CMOS libraries. Based on theoretical and experimental estimation, we analyzed results such as the amount of hardware resources, the delay, and the pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed superior properties to the standard design in many ways, with performance twice that of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
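As a behavioural reference for the arithmetic involved, the sketch below recodes a multiplier with radix-2 Booth digits and folds each partial product directly into the accumulator. It is a plain two's-complement software model under assumed names and an assumed 8-bit default width; it is not the 1’s-complement-based hybrid CSA tree of the paper, which would sum the partial products in carry-save form rather than sequentially.

```python
def booth_radix2_digits(multiplier: int, bits: int) -> list:
    """Recode a two's-complement multiplier into digits in {-1, 0, +1}.

    Digit i equals y[i-1] - y[i], with y[-1] taken as 0.
    """
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0
    for i in range(bits):
        cur = (y >> i) & 1
        digits.append(prev - cur)
        prev = cur
    return digits


def booth_mac(acc: int, x: int, y: int, bits: int = 8) -> int:
    """Return acc + x * y for y representable in `bits` two's-complement bits.

    Each Booth digit selects +x, -x, or 0, shifted by its position, and the
    partial product is added straight into the accumulator.
    """
    for i, d in enumerate(booth_radix2_digits(y, bits)):
        acc += d * (x << i)
    return acc


print(booth_mac(10, 7, -3))  # -11
```

Adding partial products into the running accumulator, instead of computing the product first and accumulating afterwards, mirrors the idea of merging the accumulator into the adder tree.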


Memory Optimization for Bit-Vector-Based Packet Classification on FPGA

Electronics ◽

10.3390/electronics8101159 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1159 ◽

Cited By ~ 6

Author(s):

Chenglong Li ◽

Tao Li ◽

Junnan Li ◽

Dagang Li ◽

Hui Yang ◽

...

Keyword(s):

High Throughput ◽

High Performance ◽

Classification Scheme ◽

Packet Classification ◽

Memory Consumption ◽

The Past ◽

Large Memory ◽

Bit Vector ◽

Memory Resources ◽

Very High

High-performance packet classification algorithms have been widely studied during the past decade. Bit-Vector-based algorithms proposed for FPGA can achieve very high throughput by carefully decomposing rules. However, their relatively large memory consumption severely hinders their wider application. Notably, in Bit-Vector-based algorithms, scarce FPGA memory resources are wasted on storing a relatively large number of useless wildcards in the rules. We thus present a memory-optimized packet classification scheme named WeeBV to eliminate the memory occupied by the wildcards. WeeBV consists of a heterogeneous two-dimensional lookup pipeline and an optimized heuristic algorithm that searches for all the wildcard positions that can be removed. It achieves a significant reduction in memory resources without compromising the high throughput of the original Bit-Vector-based algorithms. We implement WeeBV and evaluate its performance by simulation and on an FPGA prototype. Experimental results show that our approach saves 37% and 41% of memory consumption on average for synthetic 5-tuple rules and OpenFlow rules, respectively.
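Why wildcards waste memory can be seen in a small stride-1 bit-vector model. The sketch below is an assumption-laden illustration, not WeeBV's heuristic: it only detects bit positions where every rule is a wildcard, so that both stored vectors are all ones and the position contributes nothing to the AND. Rules are hypothetical ternary strings, and all function names are illustrative.

```python
def build_stride1_tables(rules):
    """For each bit position, one N-bit vector per input bit value (0 or 1)."""
    width = len(rules[0])
    tables = []
    for p in range(width):
        vec0 = vec1 = 0
        for i, rule in enumerate(rules):
            if rule[p] in "0*":   # rule i accepts a 0 at position p
                vec0 |= 1 << i
            if rule[p] in "1*":   # rule i accepts a 1 at position p
                vec1 |= 1 << i
        tables.append((vec0, vec1))
    return tables


def removable_positions(rules):
    """Positions whose two vectors are all ones, so they never filter a rule."""
    all_ones = (1 << len(rules)) - 1
    return [
        p
        for p, (vec0, vec1) in enumerate(build_stride1_tables(rules))
        if vec0 == all_ones and vec1 == all_ones
    ]


def table_bits(rules, skipped=()):
    """Memory in bits: two N-bit vectors for every position that is kept."""
    return 2 * len(rules) * (len(rules[0]) - len(skipped))


rules = ["10**", "0*1*", "11**"]      # hypothetical 4-bit ternary rules
skip = removable_positions(rules)
print(skip, table_bits(rules), table_bits(rules, skip))  # [3] 24 18
```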


A New High-Performance Digital FM Modulator and Demodulator for Software-Defined Radio and Its FPGA Implementation

International Journal of Reconfigurable Computing ◽

10.1155/2011/342532 ◽

2011 ◽

Vol 2011 ◽

pp. 1-10 ◽

Cited By ~ 13

Author(s):

Indranil Hatai ◽

Indrajit Chakrabarti

Keyword(s):

Software Defined Radio ◽

High Speed ◽

High Performance ◽

Frequency Synthesizer ◽

Dynamic Range ◽

FPGA Implementation ◽

Clock Frequency ◽

System Clock ◽

Quarter Wave ◽

The Individual

This paper deals with an FPGA implementation of a high-performance FM modulator and demodulator for a software-defined radio (SDR) system. The individual components of the proposed FM modulator and demodulator have been optimized so that the overall design is high-speed, area-optimized, and low-power. The modulator and demodulator contain an optimized direct digital frequency synthesizer (DDFS) based on the quarter-wave symmetry technique, which generates the carrier frequency with a spurious-free dynamic range (SFDR) of more than 64 dB. The FM modulator uses a pipelined version of the DDFS to support up-conversion in the digital domain. The proposed FM modulator and demodulator have been implemented and tested using an XC2VP30-7ff896 FPGA as the target device. They can operate at maximum frequencies of 334.5 MHz and 131 MHz, using around 1.93 K and 6.4 K equivalent gates, for the FM modulator and FM demodulator respectively. Power was calculated with XPower after applying a 10 kHz triangular wave input and setting the system clock frequency to 100 MHz. The FM modulator consumes 107.67 mW, while the FM demodulator consumes 108.67 mW for the same input at the same data rate.
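Quarter-wave symmetry lets a DDFS store only the first quadrant of a sine wave and derive the other three by mirroring the table index and negating the output. The following is a minimal software model under assumed parameters (10-bit phase, 8-bit signed amplitude, half-step sampling); these values and all names are illustrative and are not taken from the paper.

```python
import math

PHASE_BITS = 10                   # hypothetical phase resolution
QUARTER = 1 << (PHASE_BITS - 2)   # entries in the quarter-wave table
AMPLITUDE = 127                   # hypothetical 8-bit signed output scale

# Store only the first quadrant, sampled at half-step offsets so the
# mirrored quadrants line up without a special case at the peak.
QUARTER_LUT = [
    round(AMPLITUDE * math.sin(2 * math.pi * (i + 0.5) / (1 << PHASE_BITS)))
    for i in range(QUARTER)
]


def ddfs_sample(phase: int) -> int:
    """Map a PHASE_BITS-wide phase word to a sine sample via quarter-wave symmetry."""
    phase &= (1 << PHASE_BITS) - 1
    quadrant = phase >> (PHASE_BITS - 2)  # top two bits select the quadrant
    index = phase & (QUARTER - 1)
    if quadrant & 1:                      # quadrants 1 and 3: read the table backwards
        index = QUARTER - 1 - index
    value = QUARTER_LUT[index]
    return -value if quadrant & 2 else value  # quadrants 2 and 3: negate


def ddfs(tuning_word: int, n: int) -> list:
    """Generate n samples; the phase accumulator advances by tuning_word per clock."""
    phase, out = 0, []
    for _ in range(n):
        out.append(ddfs_sample(phase))
        phase = (phase + tuning_word) & ((1 << PHASE_BITS) - 1)
    return out


print(ddfs(256, 4))  # [0, 127, 0, -127]
```

In this model the output frequency is tuning_word × f_clk / 2^PHASE_BITS, so an FM modulator can vary the tuning word with the message signal; the table holds a quarter of the entries that a full-wave lookup would need.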


Improved Domino Logic Circuits and its Application in Wide Fan-In OR Gates

Micro and Nanosystems ◽

10.2174/1876402911666190716161631 ◽

2020 ◽

Vol 12 (1) ◽

pp. 58-67

Author(s):

Deepika Bansal ◽

Bal Chand Nagar ◽

Brahamdeo Prasad Singh ◽

Ajay Kumar

Keyword(s):

Power Consumption ◽

High Speed ◽

High Performance ◽

Voltage Drop ◽

Main Concern ◽

Clock Frequency ◽

Domino Logic ◽

Evaluation Phase ◽

Power Delay Product ◽

Domino Circuits

Background: The main concerns in efficient VLSI circuit design are low power consumption, high speed, and noise tolerance. Objective: In this paper, two efficient and high-performance topologies are proposed for cascaded domino logic using carbon nanotube MOSFETs (CN-MOSFETs). The first topology is designed to remove the intermediate charge-sharing problem without any keeper circuit, whereas the second holds the true logic level of the evaluation phase without any voltage drop for the next precharge phase. The proposed topologies are suitable for cascading high-performance domino circuits. Methods: The proposed domino circuits are tested and verified using the Synopsys HSPICE simulator with the 32 nm CN-MOSFET technology provided by Stanford University. Conclusion: The power delay product of the proposed DL-I and DL-II improves by 32.59% and 40.98%, respectively, for an 8-input OR gate compared with standard logic at a clock frequency of 500 MHz. The simulation results validate that the proposed circuits improve the performance of pseudo domino logic with respect to leakage power consumption, delay, and unity noise gain.


An Efficient and High-Speed Implementation of QRD-MGS Algorithm for STAP Application Based on Floating Point FPGAs

Journal of Circuits System and Computers ◽

10.1142/s0218126620500450 ◽

2019 ◽

Vol 29 (03) ◽

pp. 2050045

Author(s):

Narjes Hasanikhah ◽

Siavash Amin-Nejad ◽

Ghafar Darvish ◽

M. R. Moniri

Keyword(s):

High Speed ◽

High Performance ◽

Linear Equations ◽

Computation Time ◽

Floating Point ◽

Clock Frequency ◽

Vector Method ◽

The Matrix ◽

Update Rate ◽

Space Time Adaptive Processing

Space-Time Adaptive Processing (STAP) can significantly curb the effect of interference and clutter. Calculating the STAP weights involves solving linear equations, which requires very intensive computation. In this paper, the QR decomposition (QRD) using the modified Gram-Schmidt (MGS) algorithm is parameterized by vector size to create a trade-off between hardware resource utilization and computation time. To achieve an efficient floating-point structure, the proposed architecture of the QRD-MGS algorithm is simulated and implemented in two modes: single-vector and multi-vector. Results show that the multi-vector method can lead to a high-performance design with higher operating frequency, lower power consumption, and less resource utilization than the single-vector method. For example, ModelSim simulations show that the decomposition of a [Formula: see text] matrix with a vector size of 17 takes 7.86[Formula: see text][Formula: see text]s with a maximum clock frequency of 282 MHz, for implementation on the Arria 10 FPGA. In real STAP applications, the matrices are too large to fit on FPGAs and the update rate of the weights is high. The proposed method can therefore fit any matrix on contemporary FPGAs with an acceptable update rate.
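For reference, a software model of QR decomposition by modified Gram-Schmidt is sketched below in pure Python with double-precision floats. It assumes a real-valued input matrix with full column rank, and it does not model the paper's single-vector or multi-vector hardware partitioning; the function name is illustrative.

```python
import math


def qr_mgs(a):
    """QR-decompose a full-column-rank matrix (list of rows) with modified Gram-Schmidt.

    Returns (q, r), where q has orthonormal columns and r is upper triangular.
    """
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # work column-wise
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))
        cols[k] = [x / r[k][k] for x in cols[k]]
        # Modified variant: remove the q_k component from every later
        # column immediately, using the already-updated column.
        for j in range(k + 1, n):
            r[k][j] = sum(qk * v for qk, v in zip(cols[k], cols[j]))
            cols[j] = [v - r[k][j] * qk for qk, v in zip(cols[k], cols[j])]
    q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


q, r = qr_mgs([[1, 1], [1, 0], [0, 1]])
```

The inner loop over later columns consists of independent dot products and vector updates, which is the part a hardware design can process in vector-sized chunks to trade resources for computation time.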
