Software Toolchain for Large-Scale RE-NFA Construction on FPGA

2009 ◽  
Vol 2009 ◽  
pp. 1-10 ◽  
Author(s):  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state, m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.
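
To make the per-cycle behaviour of such an engine concrete, below is a minimal software sketch (not the authors' VHDL toolchain) of a one-hot RE-NFA update step: each state acts like a flip-flop that becomes active when one of its fan-in predecessors was active and the incoming byte falls in the state's character class. The State/RENFA names and the example pattern are illustrative.

# Minimal software model of a one-hot RE-NFA update step (illustrative only;
# the actual toolchain emits VHDL REMEs, not Python).
from dataclasses import dataclass, field

@dataclass
class State:
    char_class: set                                   # bytes accepted by this state
    predecessors: list = field(default_factory=list)  # fan-in state indices
    accepting: bool = False

@dataclass
class RENFA:
    states: list                                      # index 0 is a virtual start state

    def step(self, active, byte):
        """One 'clock cycle': a state fires iff some predecessor was active
        and the incoming byte is in the state's character class."""
        nxt = [False] * len(self.states)
        for j, st in enumerate(self.states):
            if byte in st.char_class and any(active[i] for i in st.predecessors):
                nxt[j] = True
        nxt[0] = True                                  # unanchored matching: start stays active
        return nxt

    def match(self, data):
        active = [i == 0 for i in range(len(self.states))]
        for b in data:
            active = self.step(active, b)
            if any(active[j] and st.accepting for j, st in enumerate(self.states)):
                return True
        return False

# Example: the pattern "ab+c" as three states chained behind the start state.
nfa = RENFA([
    State(char_class=set()),                                          # 0: start
    State(char_class={ord('a')}, predecessors=[0]),                   # 1: 'a'
    State(char_class={ord('b')}, predecessors=[1, 2]),                # 2: 'b'+ (self-loop)
    State(char_class={ord('c')}, predecessors=[2], accepting=True),   # 3: 'c'
])
print(nfa.match(b"xxabbbcx"))   # True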

2012 ◽  
Vol 263-266 ◽  
pp. 3108-3113
Author(s):  
Wei He ◽  
Yun Fei Guo ◽  
Hong Chao Hu

Fast data transmission puts forward high requirements on network content matching (NCM). Due to its high time complexity, a Nondeterministic Finite Automaton (NFA) is unable to meet the demands of regular expression matching (REM), which is the core of NCM; transforming the NFA into a Deterministic Finite Automaton (DFA) can enhance throughput, but leads to state explosion, which increases memory demand. To balance memory and throughput, we analyze state explosion in the transformation from NFA to DFA and present a new method, DC-DFA, for large-scale REM. DC-DFA is based on a hybrid automata structure composed of NFA and DFA. It introduces GradeOne classification to cut memory usage and deep classification to improve throughput. The results show that, under serious state explosion, DC-DFA can reduce the number of DFA states by 75% and improve memory utilization efficiently while maintaining high system throughput.
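
For context, the sketch below (illustrative Python, not the paper's DC-DFA data structures) shows the classic NFA-to-DFA subset construction, the step in which state explosion arises: each DFA state is a set of NFA states, so up to 2^n DFA states can be generated from an n-state NFA.

from collections import deque

def subset_construction(nfa_trans, start, alphabet):
    """Classic subset construction. nfa_trans maps (nfa_state, symbol) to a
    set of successor NFA states; each reachable *set* becomes one DFA state."""
    start_set = frozenset([start])
    dfa_states = {start_set}
    dfa_trans = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            target = set()
            for s in current:
                target |= nfa_trans.get((s, sym), set())
            if not target:
                continue
            target = frozenset(target)
            dfa_trans[(current, sym)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    return dfa_states, dfa_trans

# Example: the pattern ".*a.." over {a, b}; the 4-state NFA blows up to
# 2^3 = 8 reachable DFA states, the kind of growth DC-DFA tries to contain.
trans = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'a'): {2},    (1, 'b'): {2},
    (2, 'a'): {3},    (2, 'b'): {3},
}
states, _ = subset_construction(trans, 0, 'ab')
print(len(states))   # 8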


1985 ◽  
Vol 8 (3-4) ◽  
pp. 379-396
Author(s):  
Jerzy Wojciechowski

In this paper the notion of regular expression for finite automata on transfinite sequences (TF-automata) is introduced. The characterization theorem for TF-automata is proved. From this theorem we conclude the decidability of the emptiness problem for TF-automata and the characterization theorem for finite automata on transfinite sequences of bounded length.


2011 ◽  
Vol 22 (07) ◽  
pp. 1593-1606 ◽  
Author(s):  
SABINE BRODA ◽  
ANTÓNIO MACHIAVELO ◽  
NELMA MOREIRA ◽  
ROGÉRIO REIS

The partial derivative automaton (A_pd) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton (A_pos). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound of the average number of mergings of states in A_pos and describe its asymptotic behaviour. This depends on the alphabet size, k, and for growing k's its limit approaches half the number of states in A_pos. The lower bound corresponds to considering the A_pd automaton for the marked version of the regular expression, i.e., one where all its letters are made different. Experimental results suggest that the average number of states of this automaton, and of the A_pd automaton for the unmarked regular expression, are very close to each other.
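
As an illustration of the construction behind A_pd, the sketch below computes Antimirov's partial derivatives over a toy regex AST and collects the reachable derivatives as automaton states; the class names, the simplifying cat constructor (which identifies ε·e with e), and the example expression are illustrative, not the paper's formalization.

from dataclasses import dataclass

@dataclass(frozen=True)
class Eps: pass                  # the empty word
@dataclass(frozen=True)
class Sym:
    a: str
@dataclass(frozen=True)
class Cat:
    l: object
    r: object
@dataclass(frozen=True)
class Alt:
    l: object
    r: object
@dataclass(frozen=True)
class Star:
    e: object

def cat(l, r):                   # smart constructor: identify eps.e with e
    if isinstance(l, Eps): return r
    if isinstance(r, Eps): return l
    return Cat(l, r)

def nullable(e):
    if isinstance(e, (Eps, Star)): return True
    if isinstance(e, Sym): return False
    if isinstance(e, Cat): return nullable(e.l) and nullable(e.r)
    if isinstance(e, Alt): return nullable(e.l) or nullable(e.r)

def pd(e, a):
    """Antimirov's partial derivatives of e with respect to the letter a."""
    if isinstance(e, Eps): return set()
    if isinstance(e, Sym): return {Eps()} if e.a == a else set()
    if isinstance(e, Alt): return pd(e.l, a) | pd(e.r, a)
    if isinstance(e, Star): return {cat(d, e) for d in pd(e.e, a)}
    if isinstance(e, Cat):
        left = {cat(d, e.r) for d in pd(e.l, a)}
        return left | pd(e.r, a) if nullable(e.l) else left

def pd_states(e, alphabet):
    """Iterate partial derivatives to collect the states of the A_pd automaton."""
    states, todo = {e}, [e]
    while todo:
        cur = todo.pop()
        for a in alphabet:
            for d in pd(cur, a):
                if d not in states:
                    states.add(d)
                    todo.append(d)
    return states

# (a|b)*a : A_pd has 2 states, while the Glushkov automaton A_pos has 4.
r = Cat(Star(Alt(Sym('a'), Sym('b'))), Sym('a'))
print(len(pd_states(r, 'ab')))   # 2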


2021 ◽  
Vol 11 (2) ◽  
pp. 283-302
Author(s):  
Paul Meurer

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from the edges with the lowest corpus counts. The implementation of the lexicon as a suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.
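
A minimal sketch of the idea behind fast lexicon lookup on a suffix array: once the suffixes are sorted, all entries starting with a given literal prefix (e.g. the anchored part of a regular expression) form a contiguous range that two binary searches can locate, instead of a linear scan of the lexicon. The toy lexicon and function names below are illustrative, not the paper's implementation.

def build_suffix_array(text):
    """Naive O(n^2 log n) construction; production systems use faster algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def prefix_range(text, sa, prefix):
    """Half-open range [lo, hi) of suffix-array entries whose suffix starts
    with prefix, located by two binary searches over the sorted suffixes."""
    def lower_bound(target):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + len(prefix)] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo
    # "\uffff" acts as a sentinel larger than any character in the lexicon.
    return lower_bound(prefix), lower_bound(prefix + "\uffff")

# Toy lexicon stored as one string with word separators.
lexicon = "\0".join(["regular", "regex", "register", "expression", "automaton"]) + "\0"
sa = build_suffix_array(lexicon)
lo, hi = prefix_range(lexicon, sa, "reg")
print([lexicon[i:lexicon.index("\0", i)] for i in sa[lo:hi]])   # ['regex', 'register', 'regular']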


2021 ◽  
Vol 31 ◽  
Author(s):  
ANDRZEJ FILINSKI

We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.
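
By way of illustration, here is a small sketch (in Python rather than the paper's machine-verified functional setting) of a continuation-based backtracking matcher of the kind the derivation passes through; the tuple-based regex representation and the names are illustrative.

def match(regex, s, i, k):
    """Try to match regex against some prefix of s[i:]; the continuation k
    receives the position where the rest of the overall pattern must continue."""
    kind = regex[0]
    if kind == 'eps':                       # empty word
        return k(i)
    if kind == 'sym':                       # single symbol
        return i < len(s) and s[i] == regex[1] and k(i + 1)
    if kind == 'alt':                       # union: try left, backtrack to right
        return match(regex[1], s, i, k) or match(regex[2], s, i, k)
    if kind == 'cat':                       # concatenation: chain continuations
        return match(regex[1], s, i, lambda j: match(regex[2], s, j, k))
    if kind == 'star':                      # Kleene star: stop, or take one non-empty step
        return k(i) or match(regex[1], s, i,
                             lambda j: j > i and match(regex, s, j, k))
    raise ValueError(kind)

def full_match(regex, s):
    return match(regex, s, 0, lambda i: i == len(s))

# (a|b)*c
r = ('cat', ('star', ('alt', ('sym', 'a'), ('sym', 'b'))), ('sym', 'c'))
print(full_match(r, "ababc"), full_match(r, "abca"))   # True False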

