Scalable multi-pipeline architecture for high performance multi-pattern string matching

Author(s):  
Weirong Jiang ◽  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna
2019 ◽  
Vol 28 (14) ◽  
pp. 1950237
Author(s):  
Ling Zheng ◽  
Zhiliang Qiu ◽  
Weina Wang ◽  
Weitao Pan ◽  
Shiyong Sun ◽  
...  

Network flow classification is a key function in high-speed switches and routers, and it directly determines the performance of network devices. As the Internet and its applications grow, flow classification must support multi-dimensional fields and large rule sets while sustaining high throughput. Software-based classification cannot meet performance requirements as high as 100 Gbps. FPGA-based flow classification methods can achieve very high throughput, but range matching remains challenging. To address this, the paper proposes a range supported bit vector (RSBV) method. First, the characteristics of range matching are analyzed, and the rules are pre-encoded and stored in memory. Second, the fields of an input packet header are used as addresses to read the memory, and the range-matching result is derived through pipelined Boolean operations. On this basis, a bit vector for any types of fields (AFBV) is further proposed, which efficiently supports flow classification over multi-dimensional fields, including exact matching, longest prefix matching, range matching, and arbitrary wildcard matching. The proposed methods are implemented on an FPGA platform. Through a two-dimensional pipeline architecture, the AFBV can operate at a high clock frequency and achieve a processing speed of more than 100 Gbps. Simulation results show that for a rule set of 512-bit width and 1 k rules, the AFBV can achieve a throughput of 520 million packets per second (MPPS). Performance is improved by 44% compared with FSBV and 30% compared with Stride BV. Power consumption is reduced by about 43% compared with a TCAM solution.
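The general bit-vector idea in this abstract (header fields used as memory addresses, per-field results combined with Boolean operations) can be illustrated with a minimal software sketch. This is not the authors' RSBV or AFBV encoding, which the abstract does not specify: here ranges are simply pre-expanded into a lookup table per field, and the rule layout, field width, and function names are assumptions made for illustration.

```python
from typing import List, Tuple

# A rule holds one inclusive (low, high) range per field. An exact match is
# low == high; a wildcard is the full range. This layout is an assumption.
Rule = List[Tuple[int, int]]


def build_tables(rules: List[Rule], field_bits: int) -> List[List[int]]:
    """Precompute, per field, a table mapping field value -> rule bit vector."""
    num_fields = len(rules[0])
    tables = []
    for f in range(num_fields):
        table = [0] * (1 << field_bits)
        for rule_idx, rule in enumerate(rules):
            low, high = rule[f]
            for value in range(low, high + 1):
                # Bit i set means rule i accepts this value in field f.
                table[value] |= 1 << rule_idx
        tables.append(table)
    return tables


def classify(tables: List[List[int]], header: List[int]) -> int:
    """Return the lowest-index (highest-priority) matching rule, or -1."""
    match = -1  # all ones in Python's arbitrary-precision integers
    for table, value in zip(tables, header):
        match &= table[value]  # field value is the "address" into the table
    if match == 0:
        return -1
    # Isolate the lowest set bit and convert it to a rule index.
    return (match & -match).bit_length() - 1


rules = [
    [(0, 15), (3, 3)],   # rule 0: any value in field 0, exactly 3 in field 1
    [(4, 9), (0, 15)],   # rule 1: range 4..9 in field 0, any value in field 1
    [(0, 15), (0, 15)],  # rule 2: catch-all
]
tables = build_tables(rules, field_bits=4)
print(classify(tables, [5, 3]))
```

In hardware the per-field lookups and AND stages would be pipelined; the loop in `classify` only mirrors that data flow sequentially.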


2017 ◽  
Vol 2017 ◽  
pp. 1-12
Author(s):  
Shaobing Huang ◽  
Li Yu ◽  
Fangjian Han ◽  
Yiwen Luo

Adaptive beamforming is a powerful technique for anti-interference, where searching for and tracking optimal solutions is a great challenge. In this paper, a partial Particle Swarm Optimization (PSO) algorithm is proposed to track the optimal solution of an adaptive beamformer, owing to its strong global search capability. Also, because of its naturally parallel search capabilities, a novel Field Programmable Gate Array (FPGA) pipeline architecture using a polyphase filter bank structure is designed. To perform computations with large dynamic range and high precision, the proposed implementation uses an efficient user-defined floating-point arithmetic. In addition, a polyphase architecture is proposed to achieve a fully pipelined implementation. For PSO with a large population, the polyphase architecture can significantly save hardware resources while achieving high performance. Finally, simulation results obtained by cosimulation with ModelSim and SIMULINK are presented.
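For readers unfamiliar with PSO, a minimal sketch of a standard global-best PSO loop follows. It is not the "partial" PSO variant or the FPGA architecture from the paper, neither of which the abstract details; the parameter values, bounds, and the sphere test function are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple


def pso(objective: Callable[[List[float]], float], dim: int,
        num_particles: int = 20, iterations: int = 100,
        bounds: Tuple[float, float] = (-5.0, 5.0),
        inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` with a plain global-best PSO (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]             # per-particle best positions
    best_val = [objective(p) for p in pos]     # per-particle best values
    g = min(range(num_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(iterations):
        # Each particle's update is independent of the others within an
        # iteration, which is the parallelism the abstract alludes to.
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x: List[float]) -> float:
    """Simple convex test function with its minimum at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso(sphere, dim=2)
print(best_position, best_value)
```

In a beamforming setting the objective would instead score a candidate weight vector, for example by output interference power; that mapping is outside the scope of this sketch.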


2015 ◽  
Vol 2015 ◽  
pp. 1-20
Author(s):  
Nhat-Phuong Tran ◽  
Myungho Lee ◽  
Dong Hoon Choi

The Aho-Corasick (AC) algorithm is a multi-pattern string matching algorithm commonly used in computer and network security and bioinformatics, among many other fields. To meet the demanding computational requirements of these applications, achieving high performance for the AC algorithm is crucial. In this paper, we present a high-performance parallelization of the AC algorithm on many-core accelerator chips such as the Graphics Processing Unit (GPU) from Nvidia and the Intel Xeon Phi. Our parallelization approach significantly improves the cache locality of the AC algorithm by partitioning a given set of string patterns into multiple smaller sets of patterns in a space-efficient way. Using the multiple pattern sets, intensive pattern matching operations are conducted concurrently over the whole input text. Compared with previous approaches, in which the input data is partitioned among multiple threads instead of the pattern set being partitioned, our approach significantly improves performance. Experimental results show that our approach achieves up to a 2.73-times speedup on the Nvidia K20 GPU and a 2.00-times speedup on the Intel Xeon Phi compared with the previous approach. Our parallel implementation delivers up to 693 Gbps throughput on the K20.
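The pattern-set partitioning idea can be sketched in plain Python: build one small AC automaton per pattern subset and run each over the whole text, then merge the hits. This is a sequential stand-in for the GPU and Xeon Phi threads, and the round-robin split is an assumption; the abstract does not describe the paper's space-efficient partitioning scheme. Patterns are assumed to be non-empty.

```python
from collections import deque
from typing import Dict, List, Tuple

Automaton = Tuple[List[Dict[str, int]], List[int], List[List[str]]]


def build_automaton(patterns: List[str]) -> Automaton:
    """Build Aho-Corasick goto, failure, and output tables."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[str]] = [[]]
    for pat in patterns:                 # phase 1: insert patterns into a trie
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    queue = deque(goto[0].values())      # depth-1 states fail to the root
    while queue:                         # phase 2: breadth-first failure links
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit suffix matches
    return goto, fail, out


def search(automaton: Automaton, text: str) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence in `text`."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


def partitioned_search(patterns: List[str], text: str,
                       num_parts: int) -> List[Tuple[int, str]]:
    """Split the pattern set round-robin and scan the whole text per subset."""
    parts = [patterns[i::num_parts] for i in range(num_parts)]
    hits: List[Tuple[int, str]] = []
    for part in parts:  # iterations are independent, so they could run in parallel
        if part:
            hits.extend(search(build_automaton(part), text))
    return sorted(hits)


print(partitioned_search(["he", "she", "his", "hers"], "ushers", 2))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Each subset's automaton is smaller than the single combined one, which is the source of the cache-locality benefit the abstract describes; the trade-off is that the text is scanned once per subset.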

