Software Toolchain for Large-Scale RE-NFA Construction on FPGA

2009 ◽  
Vol 2009 ◽  
pp. 1-10 ◽  
Author(s):  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state, m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.
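
To make the per-cycle behaviour of such an engine concrete, below is a minimal software sketch (not the authors' VHDL toolchain) of a one-hot RE-NFA update step: each state acts like a flip-flop that becomes active when one of its fan-in predecessors was active and the incoming byte falls in the state's character class. The State/RENFA names and the example pattern are illustrative.

# Minimal software model of a one-hot RE-NFA update step (illustrative only;
# the actual toolchain emits VHDL REMEs, not Python).
from dataclasses import dataclass, field

@dataclass
class State:
    char_class: set                                   # bytes accepted by this state
    predecessors: list = field(default_factory=list)  # fan-in state indices
    accepting: bool = False

@dataclass
class RENFA:
    states: list                                      # index 0 is a virtual start state

    def step(self, active, byte):
        """One 'clock cycle': a state fires iff some predecessor was active
        and the incoming byte is in the state's character class."""
        nxt = [False] * len(self.states)
        for j, st in enumerate(self.states):
            if byte in st.char_class and any(active[i] for i in st.predecessors):
                nxt[j] = True
        nxt[0] = True                                  # unanchored matching: start stays active
        return nxt

    def match(self, data):
        active = [i == 0 for i in range(len(self.states))]
        for b in data:
            active = self.step(active, b)
            if any(active[j] and st.accepting for j, st in enumerate(self.states)):
                return True
        return False

# Example: the pattern "ab+c" as three states chained behind the start state.
nfa = RENFA([
    State(char_class=set()),                                          # 0: start
    State(char_class={ord('a')}, predecessors=[0]),                   # 1: 'a'
    State(char_class={ord('b')}, predecessors=[1, 2]),                # 2: 'b'+ (self-loop)
    State(char_class={ord('c')}, predecessors=[2], accepting=True),   # 3: 'c'
])
print(nfa.match(b"xxabbbcx"))   # True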

2012 ◽  
Vol 263-266 ◽  
pp. 3108-3113
Author(s):  
Wei He ◽  
Yun Fei Guo ◽  
Hong Chao Hu

Fast data transmission puts forward high requirements on network content matching (NCM). Due to its high time complexity, a Nondeterministic Finite Automaton (NFA) is unable to meet the demands of regular expression matching (REM), which is the core of NCM; transforming the NFA into a Deterministic Finite Automaton (DFA) can enhance throughput, but leads to state explosion, which increases memory demand. To balance memory and throughput, we analyze state explosion in the transformation from NFA to DFA and present a new method, DC-DFA, for large-scale REM. DC-DFA is based on a hybrid automata structure composed of NFA and DFA. It introduces GradeOne classification to cut memory usage and deep classification to improve throughput. The results show that, under serious state explosion, DC-DFA can reduce the number of DFA states by 75% and improve memory utilization efficiently while maintaining high system throughput.
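
For context, the sketch below (illustrative Python, not the paper's DC-DFA data structures) shows the classic NFA-to-DFA subset construction, the step in which state explosion arises: each DFA state is a set of NFA states, so up to 2^n DFA states can be generated from an n-state NFA.

from collections import deque

def subset_construction(nfa_trans, start, alphabet):
    """Classic subset construction. nfa_trans maps (nfa_state, symbol) to a
    set of successor NFA states; each reachable *set* becomes one DFA state."""
    start_set = frozenset([start])
    dfa_states = {start_set}
    dfa_trans = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            target = set()
            for s in current:
                target |= nfa_trans.get((s, sym), set())
            if not target:
                continue
            target = frozenset(target)
            dfa_trans[(current, sym)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    return dfa_states, dfa_trans

# Example: the pattern ".*a.." over {a, b}; the 4-state NFA blows up to
# 2^3 = 8 reachable DFA states, the kind of growth DC-DFA tries to contain.
trans = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'a'): {2},    (1, 'b'): {2},
    (2, 'a'): {3},    (2, 'b'): {3},
}
states, _ = subset_construction(trans, 0, 'ab')
print(len(states))   # 8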


1985 ◽  
Vol 8 (3-4) ◽  
pp. 379-396
Author(s):  
Jerzy Wojciechowski

In this paper the notion of regular expression for finite automata on transfinite sequences (TF-automata) is introduced. The characterization theorem for TF-automata is proved. From this theorem we conclude the decidability of the emptiness problem for TF-automata and the characterization theorem for finite automata on transfinite sequences of bounded length.


2011 ◽  
Vol 22 (07) ◽  
pp. 1593-1606 ◽  
Author(s):  
SABINE BRODA ◽  
ANTÓNIO MACHIAVELO ◽  
NELMA MOREIRA ◽  
ROGÉRIO REIS

The partial derivative automaton (A_pd) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton (A_pos). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound of the average number of mergings of states in A_pos and describe its asymptotic behaviour. This depends on the alphabet size, k, and for growing k's its limit approaches half the number of states in A_pos. The lower bound corresponds to considering the A_pd automaton for the marked version of the regular expression, i.e., one where all its letters are made different. Experimental results suggest that the average number of states of this automaton, and of the A_pd automaton for the unmarked regular expression, are very close to each other.
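
As an illustration of the construction behind A_pd, the sketch below computes Antimirov's partial derivatives over a toy regex AST and collects the reachable derivatives as automaton states; the class names, the simplifying cat constructor (which identifies ε·e with e), and the example expression are illustrative, not the paper's formalization.

from dataclasses import dataclass

@dataclass(frozen=True)
class Eps: pass                  # the empty word
@dataclass(frozen=True)
class Sym:
    a: str
@dataclass(frozen=True)
class Cat:
    l: object
    r: object
@dataclass(frozen=True)
class Alt:
    l: object
    r: object
@dataclass(frozen=True)
class Star:
    e: object

def cat(l, r):                   # smart constructor: identify eps.e with e
    if isinstance(l, Eps): return r
    if isinstance(r, Eps): return l
    return Cat(l, r)

def nullable(e):
    if isinstance(e, (Eps, Star)): return True
    if isinstance(e, Sym): return False
    if isinstance(e, Cat): return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt): return nullable(e.l) or nullable(e.r)

def pd(e, a):
    """Antimirov's partial derivatives of e with respect to the letter a."""
    if isinstance(e, Eps): return set()
    if isinstance(e, Sym): return {Eps()} if e.a == a else set()
    if isinstance(e, Alt): return pd(e.l, a) | pd(e.r, a)
    if isinstance(e, Star): return {cat(d, e) for d in pd(e.e, a)}
    if isinstance(e, Cat):
        left = {cat(d, e.r) for d in pd(e.l, a)}
        return left | pd(e.r, a) if nullable(e.l) else left

def pd_states(e, alphabet):
    """Iterate partial derivatives to collect the states of the A_pd automaton."""
    states, todo = {e}, [e]
    while todo:
        cur = todo.pop()
        for a in alphabet:
            for d in pd(cur, a):
                if d not in states:
                    states.add(d)
                    todo.append(d)
    return states

# (a|b)*a : A_pd has 2 states, while the Glushkov automaton A_pos has 4.
r = Cat(Star(Alt(Sym('a'), Sym('b'))), Sym('a'))
print(len(pd_states(r, 'ab')))   # 2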


2021 ◽  
Vol 11 (2) ◽  
pp. 283-302
Author(s):  
Paul Meurer

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from the edges with the lowest corpus counts. The implementation of the lexicon as a suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.
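
A minimal sketch of the idea behind fast lexicon lookup on a suffix array: once the suffixes are sorted, all entries starting with a given literal prefix (e.g. the anchored part of a regular expression) form a contiguous range that two binary searches can locate, instead of a linear scan of the lexicon. The toy lexicon and function names below are illustrative, not the paper's implementation.

def build_suffix_array(text):
    """Naive O(n^2 log n) construction; production systems use faster algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def prefix_range(text, sa, prefix):
    """Half-open range [lo, hi) of suffix-array entries whose suffix starts
    with prefix, located by two binary searches over the sorted suffixes."""
    def lower_bound(target):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + len(prefix)] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo
    # "\uffff" acts as a sentinel larger than any character in the lexicon.
    return lower_bound(prefix), lower_bound(prefix + "\uffff")

# Toy lexicon stored as one string with word separators.
lexicon = "\0".join(["regular", "regex", "register", "expression", "automaton"]) + "\0"
sa = build_suffix_array(lexicon)
lo, hi = prefix_range(lexicon, sa, "reg")
print([lexicon[i:lexicon.index("\0", i)] for i in sa[lo:hi]])   # ['regex', 'register', 'regular']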


2021 ◽  
Vol 31 ◽  
Author(s):  
ANDRZEJ FILINSKI

We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.
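
By way of illustration, here is a small sketch (in Python rather than the paper's machine-verified functional setting) of a continuation-based backtracking matcher of the kind the derivation passes through; the tuple-based regex representation and the names are illustrative.

def match(regex, s, i, k):
    """Try to match regex against some prefix of s[i:]; the continuation k
    receives the position where the rest of the overall pattern must continue."""
    kind = regex[0]
    if kind == 'eps':                       # empty word
        return k(i)
    if kind == 'sym':                       # single symbol
        return i < len(s) and s[i] == regex[1] and k(i + 1)
    if kind == 'alt':                       # union: try left, backtrack to right
        return match(regex[1], s, i, k) or match(regex[2], s, i, k)
    if kind == 'cat':                       # concatenation: chain continuations
        return match(regex[1], s, i, lambda j: match(regex[2], s, j, k))
    if kind == 'star':                      # Kleene star: stop, or take one non-empty step
        return k(i) or match(regex[1], s, i,
                             lambda j: j > i and match(regex, s, j, k))
    raise ValueError(kind)

def full_match(regex, s):
    return match(regex, s, 0, lambda i: i == len(s))

# (a|b)*c
r = ('cat', ('star', ('alt', ('sym', 'a'), ('sym', 'b'))), ('sym', 'c'))
print(full_match(r, "ababc"), full_match(r, "abca"))   # True False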

