Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper: an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.
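
To put the figures quoted in this abstract in perspective, the short sketch below derives a few per-rule and per-second quantities from them. It is only back-of-the-envelope arithmetic: it assumes 1 megabyte = 10^6 bytes, "10 k" = 10,000 rules, and one packet classified at a time. None of these assumptions, and none of the function names, come from the paper.

```python
# Back-of-the-envelope figures derived from the numbers in the abstract.
# Assumptions (not from the source): 1 MB = 10**6 bytes, "10 k" = 10_000
# rules, and packets are classified strictly one after another.

MEMORY_BYTES = 0.4 * 10**6        # average memory for the rule set
NUM_RULES = 10_000                # rule-set size
WORST_CASE_SECONDS = 2e-6         # worst-case classification time per packet
PREPROCESS_RULES_PER_SEC = 3e6    # preprocessing speed


def bytes_per_rule() -> float:
    """Average memory footprint per rule."""
    return MEMORY_BYTES / NUM_RULES


def min_packets_per_second() -> float:
    """Lower bound on throughput if every packet hits the worst case."""
    return 1.0 / WORST_CASE_SECONDS


def preprocess_seconds() -> float:
    """Time to preprocess the whole rule set at the quoted speed."""
    return NUM_RULES / PREPROCESS_RULES_PER_SEC


if __name__ == "__main__":
    print(f"bytes per rule:        {bytes_per_rule():.1f}")
    print(f"min packets/second:    {min_packets_per_second():,.0f}")
    print(f"preprocessing time ms: {preprocess_seconds() * 1e3:.2f}")
```

Under these assumptions the quoted numbers work out to roughly 40 bytes per rule, at least about 500,000 packets per second in the worst case, and a little over 3 milliseconds to preprocess the full rule set.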

A Fast, Smart Packet Classification Algorithm Based on Decomposition

Journal of Control Science and Engineering ◽ 10.1155/2020/8843471 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-11

Author(s): Chuanhong Li ◽ Xuewen Zeng ◽ Lei Song ◽ Yan Jiang

Keyword(s): Vital Role ◽ Packet Classification ◽ Classification Algorithm ◽ Set Partitioning ◽ Experimental Results ◽ Classification Algorithms ◽ Memory Consumption ◽ Memory Overhead ◽ Rule Sets ◽ Rule Set

Packet classification algorithms have been a focus of research in recent years because of the vital role they play in services based on packet forwarding. However, as the number of rules in the rule set increases, both the preprocessing time and the memory consumption grow sharply. In this paper, we first model and analyze this issue in depth. Then, a fast, smart packet classification algorithm based on decomposition is proposed. Through boundary-based rule traversal and smart rule set partitioning, both the preprocessing time and the memory consumption are reduced dramatically. Experimental results show that the preprocessing time of our method improves by up to 8.8 times compared with PCIU and by about 31.5 times on average compared with CutSplit for large rule sets. Meanwhile, the memory overhead is reduced by up to 40%, and by 27.5% on average, compared with PCIU.

Multiple decision-tree packet classification algorithm based on rule set partitioning

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2013.02450 ◽ 2013 ◽ Vol 33 (9) ◽ pp. 2450-2454 ◽ Cited By ~ 1

Author(s): Teng MA ◽ Shuqiao CHEN ◽ Xiaohui ZHANG ◽ Le TIAN

Keyword(s): Decision Tree ◽ Packet Classification ◽ Classification Algorithm ◽ Set Partitioning ◽ Rule Set

An efficient application-specific instruction-set processor for packet classification

2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig) ◽ 10.1109/reconfig.2013.6732271 ◽ 2013 ◽ Cited By ~ 2

Author(s): Omar Ahmed ◽ Shawki Areibi

Keyword(s): Packet Classification ◽ Instruction Set ◽ Specific Instruction ◽ Application Specific

PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

International Journal of Reconfigurable Computing ◽ 10.1155/2011/648483 ◽ 2011 ◽ Vol 2011 ◽ pp. 1-21 ◽ Cited By ~ 5

Author(s): O. Ahmed ◽ S. Areibi ◽ K. Chattha ◽ B. Kelly

Keyword(s): State Of The Art ◽ Packet Classification ◽ Classification Algorithm ◽ Software Implementation ◽ Network Services ◽ Hardware Accelerator ◽ Memory Consumption ◽ Hardware Implementations ◽ Speed Up ◽ Incremental Update

Packet classification plays a crucial role in a number of network services such as policy-based routing, firewalls, and traffic billing. However, classification can become a bottleneck in these applications if it is not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm that improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. The results indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. Furthermore, the algorithm was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach yields a PCIU solution that is slower but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, resulted in a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.
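
For readers unfamiliar with the problem these papers address, the sketch below shows packet classification in its simplest form: a priority-ordered linear search over range-based rules, as a firewall might use. This is a naive baseline for illustration only, not PCIU or any algorithm from the listed papers; the `Rule` type, the five-field packet tuple, and the example rules are all assumptions made here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]                     # inclusive (low, high)
Packet = Tuple[int, int, int, int, int]     # src IP, dst IP, src port, dst port, protocol


@dataclass(frozen=True)
class Rule:
    """Hypothetical five-field rule; every field is an inclusive range."""
    src_ip: Range
    dst_ip: Range
    src_port: Range
    dst_port: Range
    protocol: Range
    action: str

    def matches(self, packet: Packet) -> bool:
        fields = (self.src_ip, self.dst_ip, self.src_port,
                  self.dst_port, self.protocol)
        return all(lo <= value <= hi
                   for (lo, hi), value in zip(fields, packet))


def classify(rules: Sequence[Rule], packet: Packet) -> Optional[str]:
    """Return the action of the first (highest-priority) matching rule."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action
    return None


ANY_IP: Range = (0, 0xFFFFFFFF)
ANY_PORT: Range = (0, 0xFFFF)
ANY_PROTO: Range = (0, 0xFF)

# Illustrative rule set: allow TCP (protocol 6) from 10.0.0.0/8 to port 80,
# then deny everything else.
EXAMPLE_RULES = [
    Rule((0x0A000000, 0x0AFFFFFF), ANY_IP, ANY_PORT, (80, 80), (6, 6), "allow"),
    Rule(ANY_IP, ANY_IP, ANY_PORT, ANY_PORT, ANY_PROTO, "deny"),
]

WEB_PACKET: Packet = (0x0A010203, 0xC0A80001, 12345, 80, 6)    # TCP to port 80
UDP_PACKET: Packet = (0x0A010203, 0xC0A80001, 12345, 80, 17)   # UDP to port 80
```

With these example rules, `classify(EXAMPLE_RULES, WEB_PACKET)` matches the first rule and `classify(EXAMPLE_RULES, UDP_PACKET)` falls through to the default deny. The cost of this search grows linearly with the number of rules, which is the bottleneck that preprocessing-based algorithms aim to avoid.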

Corrigendum to “PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability”

International Journal of Reconfigurable Computing ◽ 10.1155/2018/9595171 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-1

Author(s): O. Ahmed ◽ S. Areibi ◽ K. Chattha ◽ B. Kelly

Keyword(s): Packet Classification ◽ Classification Algorithm ◽ Hardware Implementations ◽ Incremental Update

Energy optimization of Application-Specific Instruction-Set Processors by using hardware accelerators in semicustom ICs technology

Microprocessors and Microsystems ◽ 10.1016/j.micpro.2011.06.003 ◽ 2012 ◽ Vol 36 (2) ◽ pp. 127-137 ◽ Cited By ~ 5

Author(s): Uwe Meyer-Baese ◽ Guillermo Botella ◽ Soumak Mookherjee ◽ Encarnación Castillo ◽ Antonio García

Keyword(s): Energy Optimization ◽ Hardware Accelerators ◽ Instruction Set ◽ Specific Instruction ◽ Instruction Set Processors ◽ Application Specific

Corrigendum to “Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm”

International Journal of Reconfigurable Computing ◽ 10.1155/2018/3489169 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-1

Author(s): O. Ahmed ◽ S. Areibi ◽ G. Grewal

Keyword(s): Packet Classification ◽ Classification Algorithm ◽ Hardware Accelerators

Efficient Computing in Image Processing and DSPs with ASIP Based Multiplier

Recent Patents on Engineering ◽ 10.2174/1872212112666180810150357 ◽ 2019 ◽ Vol 13 (2) ◽ pp. 174-180

Author(s): Poonam Sharma ◽ Ashwani Kumar Dubey ◽ Ayush Goyal

Keyword(s): Image Processing ◽ High Speed ◽ Computation Time ◽ Digital Signal ◽ Instruction Set ◽ Computationally Efficient ◽ Specific Instruction ◽ Processor Core ◽ Speed Performance ◽ Application Specific

Background: With the growing demand for image processing and the use of Digital Signal Processors (DSP), the efficiency of multipliers and accumulators has become a bottleneck. We reviewed a few patents on Application Specific Instruction Set Processors (ASIP), in which design considerations are proposed for efficient application-specific computing to enhance throughput. Objective: The study aims to develop and analyze a computationally efficient method to optimize the speed performance of the Multiplier Accumulator (MAC). Methods: The work presented here proposes the design of an Application Specific Instruction Set Processor that uses a Multiplier Accumulator integrated as dedicated hardware. This MAC is optimized for high-speed performance and is the application-specific part of the processor; here it can serve as the DSP block of an image processor, while a 16-bit Reduced Instruction Set Computer (RISC) processor core gives the design the flexibility for general computing. The design was emulated on a Xilinx Field Programmable Gate Array (FPGA) and tested on various real-time computing tasks. Results: Synthesis of the hardware logic with FPGA tools gave the operating frequencies of the legacy methods and the proposed method, and simulation of the logic verified the functionality. Conclusion: With the proposed method, a 16% increase in throughput was observed for 256-step iterations of multiply and accumulate on 8-bit sample data. Such an improvement can help reduce the computation time in many digital signal processing applications where multiplication and addition are done iteratively.
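
As an illustration of the operation this abstract refers to, the sketch below models a multiply-accumulate loop in software. It is not the paper's hardware design: it assumes unsigned 8-bit operands and a 24-bit accumulator, a width chosen here because 256 products of at most 255 × 255 sum to at most 16,646,400, which is below 2^24. The names `mac`, `ACC_BITS`, and `ACC_MASK` are hypothetical.

```python
from typing import Sequence

# Assumed widths (not from the source): unsigned 8-bit operands and a
# 24-bit accumulator, which cannot overflow within 256 steps.
OPERAND_MAX = 0xFF
ACC_BITS = 24
ACC_MASK = (1 << ACC_BITS) - 1


def mac(samples: Sequence[int], coefficients: Sequence[int]) -> int:
    """Software model of a multiply-accumulate loop: acc += x * c per step."""
    if len(samples) != len(coefficients):
        raise ValueError("samples and coefficients must have the same length")
    acc = 0
    for x, c in zip(samples, coefficients):
        if not (0 <= x <= OPERAND_MAX and 0 <= c <= OPERAND_MAX):
            raise ValueError("operands must be unsigned 8-bit values")
        # The mask models a fixed-width hardware register that wraps around.
        acc = (acc + x * c) & ACC_MASK
    return acc


# Largest possible result for 256 steps of 8-bit data: 256 * 255 * 255.
WORST_CASE = mac([OPERAND_MAX] * 256, [OPERAND_MAX] * 256)
```

In a dedicated MAC unit each loop iteration is a single hardware step, which is why iterative workloads such as filtering benefit from moving this loop out of general-purpose instructions.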
