Two-Phase PFAC Algorithm for Multiple Patterns Matching on CUDA GPUs

Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 270
Author(s):  
Wei-Shen Lai ◽  
Chao-Chin Wu ◽  
Lien-Fu Lai ◽  
Min-Chi Sie

The rapid advancement of high-speed networks has significantly increased the number of network packets per second, so network intrusion detection systems (NIDSs) must inspect packet content faster to protect computer systems from attacks. On average, the pattern-matching process in a NIDS consumes approximately 70% of the overall processing time. The conventional Aho–Corasick (AC) algorithm, which uses a finite state machine to identify attack patterns in NIDSs, is too slow to meet the requirements of high-speed networks. In view of this, several studies have exploited the features of a graphics processing unit (GPU) to accelerate the core searching process of the AC algorithm. For instance, the parallel failureless Aho–Corasick (PFAC) algorithm improves pattern matching effectively by removing the backward branches (failure transitions) from the finite state machine created by the AC algorithm. Because an individual thread is allocated to each byte of the input stream and identifies only patterns starting at that thread's position, boundary detection is avoided entirely. However, our analysis found that this algorithm suffers from a serious load imbalance problem. Therefore, this paper proposes a two-phase PFAC algorithm to address the problem. A predefined threshold divides execution into two phases, and the failureless finite state machine is decoupled into two parts accordingly. In the first phase, every thread identifies patterns by running the small part of the decoupled failureless finite state machine that is stored in fast shared memory. In the second phase, all threads in the same block that require further searching are regrouped into a few warps to reduce branch divergence. According to experimental results, the proposed algorithm achieves a performance improvement of 50% compared to the PFAC algorithm.
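
The PFAC matching rule and the two-phase split can be illustrated with a sequential Python simulation. This is not CUDA code. Each iteration over `start` stands for one GPU thread, and states shallower than `threshold` stand for the part of the failureless FSM that would be held in shared memory. Two modeling choices are assumptions, not details from the paper: interpreting the threshold as a trie depth, and representing warp regrouping as compaction of surviving threads into a list. All function names are hypothetical.

```python
def build_failureless_trie(patterns):
    """Build the goto function of an Aho-Corasick trie WITHOUT failure links.

    Returns (goto, depth, output):
      goto[s]   -- dict mapping an input symbol to the next state
      depth[s]  -- length of the prefix spelled by state s
      output[s] -- pattern that ends at state s (final states only)
    """
    goto, depth, output = [{}], [0], {}
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                depth.append(depth[state] + 1)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = pat
    return goto, depth, output


def walk(text, goto, output, start, state, pos, matches, depth=None, max_depth=None):
    """Follow goto transitions from (state, pos) for the thread owning `start`.

    The thread dies at a missing transition or at the end of the text. When
    max_depth is given, it also stops right after entering a state whose depth
    is >= max_depth, meaning it must continue in the next phase.
    Returns (state, pos, needs_more).
    """
    while pos < len(text) and text[pos] in goto[state]:
        state = goto[state][text[pos]]
        pos += 1
        if state in output:
            matches.append((start, output[state]))
        if max_depth is not None and depth[state] >= max_depth:
            return state, pos, True
    return state, pos, False


def pfac_match(text, goto, output):
    """PFAC: one (simulated) thread per input position, no failure links."""
    matches = []
    for start in range(len(text)):  # each iteration stands for one GPU thread
        walk(text, goto, output, start, 0, start, matches)
    return sorted(matches)


def two_phase_match(text, goto, depth, output, threshold):
    """Two-phase variant: phase 1 uses only states shallower than `threshold`;
    threads that still need to search are compacted into a list for phase 2."""
    matches, survivors = [], []
    for start in range(len(text)):  # phase 1: every thread
        state, pos, more = walk(text, goto, output, start, 0, start, matches,
                                depth, threshold)
        if more:
            survivors.append((start, state, pos))
    for start, state, pos in survivors:  # phase 2: compacted survivors only
        walk(text, goto, output, start, state, pos, matches)
    return sorted(matches)
```

For example, with the patterns `["he", "she", "his", "hers"]` and the input `"ushers"`, both functions return `[(1, 'she'), (2, 'he'), (2, 'hers')]`. In the two-phase version, only the threads starting at positions 1 and 2 survive phase 1 when `threshold=2`. Those surviving threads are the ones that would be regrouped into fewer warps.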

Author(s):  
B. Srilatha ◽  
Krishna Kishore

One way to detect and thwart a network attack is to compare each incoming packet with predefined patterns, also called an attack pattern database, and raise an alert upon detecting a match. This article presents a novel pattern-matching engine that exploits a memory-based, programmable state machine to achieve deterministic processing rates that are independent of packet and pattern characteristics. Our engine is a self-addressable memory-based finite state machine (samFsm), whose current state coding exhibits all its possible next states. Moreover, it is fully reconfigurable, so new attack patterns can be updated easily. A methodology was developed to program the memory and logic. Specifically, we merge “non-equivalent” states by introducing “super characters” on their inputs to further enhance memory efficiency without adding labels. samFsm is the fastest self-addressable memory-based FSM and one of the most storage-efficient machines, reducing the memory requirement by a factor of 60. Experimental results are presented to demonstrate the validity of samFsm.
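
A common way to make the processing rate independent of packet and pattern content is to keep a complete DFA transition table in memory, so that every input byte costs exactly one memory read. The Python sketch below shows that general memory-based approach. It stores an Aho–Corasick DFA as a flat list addressed as `memory[state * 256 + byte]`. It does not reproduce samFsm's self-addressing state encoding or its super-character merging, and the names `build_dfa_memory` and `scan` are illustrative.

```python
from collections import deque

ALPHABET = 256  # one memory column per possible input byte


def build_dfa_memory(patterns):
    """Compile byte-string patterns into an Aho-Corasick DFA stored as a flat
    next-state memory: memory[state * ALPHABET + byte] -> next state."""
    goto, out = [{}], [set()]
    for pat in patterns:  # build the trie
        s = 0
        for b in pat:
            if b not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s].add(pat)

    n = len(goto)
    memory = [0] * (n * ALPHABET)
    fail = [0] * n
    queue = deque()
    for b, child in goto[0].items():  # root row: missing edges stay at state 0
        memory[b] = child
        queue.append(child)
    while queue:  # BFS: fill each row completely using failure links
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches ending at the failure state
        for b in range(ALPHABET):
            child = goto[s].get(b)
            if child is not None:
                fail[child] = memory[fail[s] * ALPHABET + b]
                memory[s * ALPHABET + b] = child
                queue.append(child)
            else:
                memory[s * ALPHABET + b] = memory[fail[s] * ALPHABET + b]
    return memory, out


def scan(memory, out, data):
    """Exactly one memory lookup per input byte, whatever the content."""
    state, hits = 0, []
    for i, b in enumerate(data):
        state = memory[state * ALPHABET + b]
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))  # (start offset, pattern)
    return sorted(hits)
```

For example, `scan(*build_dfa_memory([b"he", b"she", b"his", b"hers"]), b"ushers")` returns `[(1, b'she'), (2, b'he'), (2, b'hers')]`. The cost of this simple layout is memory. Every state occupies 256 entries, which is exactly the kind of overhead that state-merging schemes such as the one described above aim to reduce.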


2015 ◽  
Vol 24 (07) ◽  
pp. 1550101 ◽  
Author(s):  
Raouf Senhadji-Navaro ◽  
Ignacio Garcia-Vargas

This work focuses on the problem of designing efficient reconfigurable multiplexer banks for RAM-based implementations of reconfigurable state machines. We propose a new architecture, called the combination-based reconfigurable multiplexer bank (CRMUX), that uses simpler multiplexers than the state-of-the-art architecture, the variation-based reconfigurable multiplexer bank (VRMUX). The performance of both architectures, in terms of speed, area, and reconfiguration cost, is compared. Experimental results on the MCNC finite state machine (FSM) benchmarks show that CRMUX is faster and more area-efficient than VRMUX. The reconfiguration cost of both multiplexer banks is studied using a behavioral model of a reconfigurable state machine. The results show that the reconfiguration cost of CRMUX is lower than that of VRMUX in most cases.
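
In a RAM-based FSM, the next state is read from a memory addressed by the current state and the inputs. One plausible role for a multiplexer bank in front of the address lines, assumed here, is to let each state forward only the few inputs it actually tests, which shrinks the RAM. The Python behavioral sketch below illustrates that general idea with assumed parameters: `k` selected inputs per state, with the multiplexer select lines stored per state. It does not model the specific CRMUX or VRMUX architectures, and `RamFsm` is a hypothetical name.

```python
class RamFsm:
    """Behavioral model of a RAM-based FSM with an input multiplexer bank.

    Hypothetical sketch: of n primary inputs, each state tests only k. A bank
    of k multiplexers, whose select lines are stored per state, picks those k
    inputs. The RAM address is state * 2**k + selected bits, so the RAM holds
    num_states * 2**k words instead of num_states * 2**n.
    """

    def __init__(self, num_states, k):
        self.k = k
        self.state = 0
        self.select = [[0] * k for _ in range(num_states)]  # mux select lines
        self.ram = [0] * (num_states << k)  # next-state memory

    def configure(self, state, selected_inputs, next_states):
        """Reconfigure one state: rewrite its mux selects and its 2**k RAM words."""
        if len(selected_inputs) != self.k or len(next_states) != 1 << self.k:
            raise ValueError("need k input indices and 2**k next states")
        self.select[state] = list(selected_inputs)
        for code, nxt in enumerate(next_states):
            self.ram[(state << self.k) | code] = nxt

    def step(self, inputs):
        """One clock cycle: multiplex the relevant inputs, then read the RAM."""
        code = 0
        for j, idx in enumerate(self.select[self.state]):
            code |= inputs[idx] << j
        self.state = self.ram[(self.state << self.k) | code]
        return self.state
```

For example, a two-state machine with eight inputs, in which state 0 tests only input 3 and state 1 tests only input 6, needs `RamFsm(2, 1)` with just four RAM words. Reconfiguring the machine amounts to calling `configure` again, which rewrites both the select lines and the RAM contents. That combined update is what a reconfiguration-cost comparison has to account for.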


2003 ◽  
Vol 21 (4) ◽  
pp. 501-512 ◽  
Author(s):  
M. Desai ◽  
R. Gupta ◽  
A. Karandikar ◽  
K. Saxena ◽  
V. Samant

VLSI Design ◽  
1999 ◽  
Vol 9 (2) ◽  
pp. 105-117 ◽  
Author(s):  
M. S. Krishnamoorthy ◽  
James R. Loy ◽  
John F. McDonald

Noise margins in high-speed digital systems continue to erode. Full differential signal routing provides a mechanism for deferring these effects. This paper proposes a three-stage routing process for solving the adjacent placement routing problem of differential signal pairs and proves that it is optimal. The process views differential pairs as logical nets, routes the logical nets, and then bifurcates the result to achieve a physical realization. Finite state machine theory provides the critical theoretical underpinning and the formal proof of correctness necessary for linear-time bifurcation. Regular expressions map the theoretical solution to an appropriate implementation strategy that employs feature vectors for net recognition.


2021 ◽  
Vol 92 ◽  
pp. 107094
Author(s):  
Junaid Shabbir Abbasi ◽  
Faisal Bashir ◽  
Kashif Naseer Qureshi ◽  
Muhammad Najam ul Islam ◽  
Gwanggil Jeon

2020 ◽  
Vol 9 (2) ◽  
pp. 1179-1183

In the present era of high-speed computation with multicore and other parallel processors, some organizations still rely on old software systems developed years ago, which have since undergone continuous development by different developers. Even though these systems persist with old and little-used technology, they still satisfy the operational demands of the organizations and have kept them going in a competitive industry. Having grown into legacy systems over time, they embed the organization's major business functionality, the product of years of effort. Hence a methodology is required to rebuild legacy systems so that they are suitable for execution on present-day computing systems. The paper discusses research that identifies points of latent parallelism in a sequentially executing legacy C program, which is first restructured and its design information abstracted. A technique using a finite state machine is proposed to identify tasks, events, processes, and jobs in the program, which helps to locate functionally independent computational units. Further, slicing is performed to extract the lines of code defined by the slicing criteria. Assembled together, these lines form a functionality that can be executed in parallel with other extracted functional modules or computational units on any parallel computing platform.
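
Backward slicing, the step that extracts the lines relevant to a slicing criterion, can be sketched in Python for the simplest case. The sketch covers straight-line code only. Each statement is reduced to the variable it defines and the variables it uses. The paper's FSM-based task identification, its slicing criteria, and any handling of branches, loops, or pointers are not modeled, and `backward_slice` is a hypothetical helper.

```python
def backward_slice(program, criterion_line, criterion_vars):
    """Backward static slice of a straight-line program (no control flow).

    program: list of (defined_var, used_vars), one entry per statement; the
    list index is the line number. Returns the sorted line numbers that can
    affect the values of criterion_vars right after criterion_line executes.
    """
    relevant = set(criterion_vars)
    in_slice = []
    for line in range(criterion_line, -1, -1):
        defined, used = program[line]
        if defined in relevant:
            relevant.discard(defined)  # this definition satisfies the demand
            relevant.update(used)      # ...and creates demands for its inputs
            in_slice.append(line)
    return sorted(in_slice)


# Models the C fragment:
#   0: a = read();  1: b = read();  2: sum = a + b;
#   3: prod = a * b;  4: avg = sum / 2;
program = [("a", []), ("b", []), ("sum", ["a", "b"]),
           ("prod", ["a", "b"]), ("avg", ["sum"])]
```

Here the slice for `avg` is lines 0, 1, 2, and 4, and the slice for `prod` is lines 0, 1, and 3. Apart from the shared input statements, the two slices are disjoint. Once `a` and `b` are available, they could run as separate computational units.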


2014 ◽  
Vol 571-572 ◽  
pp. 881-884
Author(s):  
Yi Ding ◽  
Xi Duan ◽  
Jun Liu

This paper discusses a high-speed, high-resolution semi-digital delay-locked loop (DLL) circuit. The circuit is composed of three blocks: a delay line, a phase detector, and a digital finite-state machine (FSM). The delay line is tuned in two steps: coarse tuning by tap selection and fine tuning by interpolation, which enables a resolution as fine as 2 picoseconds. With this two-step approach and delay-line configuration, the circuit achieves 3 GHz operation with picosecond-level resolution.

