scholarly journals Proof-directed program transformation: A functional account of efficient regular expression matching

2021 ◽  
Vol 31 ◽  
Author(s):  
ANDRZEJ FILINSKI

Abstract We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.

2009 ◽  
Vol 2009 ◽  
pp. 1-10 ◽  
Author(s):  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna

We present a software toolchain for constructing large-scaleregular expression matching(REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, ann-statem-bytes-per-cycle RE-NFA can be constructed inO(n×m)time andO(n×m)memory by our software. A large number of RE-NFAs are placed onto a two-dimensionalstaged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.


2021 ◽  
Vol 11 (2) ◽  
pp. 283-302
Author(s):  
Paul Meurer

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as they are implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from edges with lowest corpus counts. The implementation of the lexicon as suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.


Algorithmica ◽  
2022 ◽  
Author(s):  
José Arturo Gil ◽  
Simone Santini

AbstractIn this paper we study regular expression matching in cases in which the identity of the symbols received is subject to uncertainty. We develop a model of symbol emission and uses a modification of the shortest path algorithm to find optimal matches on the Cartesian Graph of an expression provided that the input is a finite list. In the case of infinite streams, we show that the problem is in general undecidable but, if each symbols is received with probability 0 infinitely often, then with probability 1 the problem is decidable.


2021 ◽  
Author(s):  
Nan Jiang ◽  
Ping Lin ◽  
Yulong He ◽  
Zhuozhi Tan ◽  
Jin Hu

10.29007/gpsh ◽  
2018 ◽  
Author(s):  
Abdulbasit Ahmed ◽  
Alexei Lisitsa ◽  
Andrei Nemytykh

It has been known for a while that program transformation techniques, in particular, program specialization, can be used to prove the properties of programs automatically. For example, if a program actually implements (in a given context of use) a constant function, sufficiently powerful and semantics preserving program transformation may reduce the program to a syntactically trivial ``constant'' program, pruning unreachable branches and proving thereby the property. Viability of such an approach to verification has been demonstrated in previous works where it was applied to the verification of parameterized cache coherence protocols and Petri Nets models.In this paper we further extend the method and present a case study on its appication to the verification of a cryptographic protocol. The protocol is modeled by functional programs at different levels of abstraction and verification via program specialization is done by using Turchin's supercompilation method.


Sign in / Sign up

Export Citation Format

Share Document