Proof-directed program transformation: A functional account of efficient regular expression matching

Abstract: We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.
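As an illustration of the intermediate step named in this abstract, the following is a minimal sketch of a continuation-based backtracking matcher. It is not the paper's derivation: the class names (`Empty`, `Eps`, `Chr`, `Seq`, `Alt`, `Star`) and the functions `match` and `accepts` are hypothetical, chosen here for illustration. The continuation `k` receives the input position reached so far and decides whether the rest of the input is acceptable.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Empty:
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps:
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr:
    c: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Empty, Eps, Chr, Seq, Alt, Star]
Cont = Callable[[int], bool]


def match(r: Regex, s: str, i: int, k: Cont) -> bool:
    """True if r matches some s[i:j] and the continuation k accepts j."""
    if isinstance(r, Empty):
        return False
    if isinstance(r, Eps):
        return k(i)
    if isinstance(r, Chr):
        return i < len(s) and s[i] == r.c and k(i + 1)
    if isinstance(r, Seq):
        # Match the left part, then continue with the right part.
        return match(r.left, s, i, lambda j: match(r.right, s, j, k))
    if isinstance(r, Alt):
        # Backtracking: try the left branch, fall back to the right one.
        return match(r.left, s, i, k) or match(r.right, s, i, k)
    if isinstance(r, Star):
        # Either stop iterating, or match the body once and loop.
        # Requiring j > i rules out empty iterations, so the loop terminates.
        return k(i) or match(r.body, s, i, lambda j: j > i and match(r, s, j, k))
    raise TypeError(f"not a regex: {r!r}")


def accepts(r: Regex, s: str) -> bool:
    """True if r matches the whole string s."""
    return match(r, s, 0, lambda j: j == len(s))
```

For example, with `r = Seq(Star(Alt(Chr("a"), Chr("b"))), Seq(Chr("a"), Seq(Chr("b"), Chr("b"))))`, which encodes `(a|b)*abb`, the call `accepts(r, "aababb")` succeeds and `accepts(r, "abab")` fails. The matcher can take exponential time on some inputs, which is the kind of inefficiency a table-driven state machine avoids.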

Software Toolchain for Large-Scale RE-NFA Construction on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2009/301512 ◽

2009 ◽

Vol 2009 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Yi-Hua E. Yang ◽

Viktor K. Prasanna

Keyword(s):

High Performance ◽

Large Scale ◽

Regular Expression ◽

Finite Automata ◽

Fixed Number ◽

Regular Expressions ◽

Pattern Complexity ◽

Area Increase ◽

Prototype Software

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.

Designing efficient algorithms for querying large corpora

Oslo Studies in Language ◽

10.5617/osla.8504 ◽

2021 ◽

Vol 11 (2) ◽

pp. 283-302

Author(s):

Paul Meurer

Keyword(s):

Regular Expression ◽

Linear Time ◽

Suffix Array ◽

Efficient Algorithms ◽

Regular Expressions ◽

Efficient Treatment ◽

Suffix Arrays ◽

Finite State ◽

Query System

I describe several new efficient algorithms for querying large annotated corpora. The search algorithms as they are implemented in several popular corpus search engines are less than optimal in two respects: regular expression string matching in the lexicon is done in linear time, and regular expressions over corpus positions are evaluated starting in those corpus positions that match the constraints of the initial edges of the corresponding network. To address these shortcomings, I have developed an algorithm for regular expression matching on suffix arrays that allows fast lexicon lookup, and a technique for running finite state automata from edges with lowest corpus counts. The implementation of the lexicon as suffix array also lends itself to an elegant and efficient treatment of multi-valued and set-valued attributes. The described techniques have been implemented in a fully functional corpus management system and are also used in a treebank query system.
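To make the lexicon-as-suffix-array idea concrete, here is a minimal sketch of the basic building block only: literal substring lookup by binary search over a suffix array. It is not the regular expression algorithm described in the abstract, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. The construction shown is the naive sort-based one, chosen for brevity rather than speed.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting the suffixes themselves; fine for a
    small example, too slow for a real lexicon.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """All start positions of pattern in text, found by two binary searches.

    Suffixes that begin with pattern form one contiguous block of sa,
    because truncating sorted suffixes to len(pattern) keeps them sorted.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

For instance, with `text = "banana"` and `sa = build_suffix_array(text)`, `find_occurrences(text, sa, "ana")` returns the start positions 1 and 3. Each lookup needs a number of comparisons logarithmic in the text length, instead of a linear scan over the lexicon.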

Matching Regular Expressions on uncertain data

Algorithmica ◽

10.1007/s00453-021-00906-8 ◽

2022 ◽

Author(s):

José Arturo Gil ◽

Simone Santini

Keyword(s):

Shortest Path ◽

Regular Expression ◽

Uncertain Data ◽

Regular Expressions ◽

Shortest Path Algorithm ◽

Abstract: In this paper we study regular expression matching in cases in which the identity of the symbols received is subject to uncertainty. We develop a model of symbol emission and use a modification of the shortest path algorithm to find optimal matches on the Cartesian Graph of an expression provided that the input is a finite list. In the case of infinite streams, we show that the problem is in general undecidable but, if each symbol is received with probability 0 infinitely often, then with probability 1 the problem is decidable.
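The finite-input case can be pictured with a small sketch, under assumptions that go beyond the abstract. Here each input position is assumed to carry a probability for every candidate symbol, the expression is assumed to be given as a deterministic automaton, and the graph searched is the product of automaton states and input positions, with edge cost equal to the negative log-probability of the symbol read. Whether this coincides with the paper's Cartesian Graph or its emission model is not confirmed by the abstract; the function name `best_match` and its parameters are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Set, Tuple


def best_match(
    dfa: Dict[Tuple[int, str], int],
    start: int,
    accepting: Set[int],
    observations: List[Dict[str, float]],
) -> Optional[Tuple[float, str]]:
    """Most probable reading of the input that the automaton accepts.

    dfa maps (state, symbol) to the next state. observations[i] maps each
    candidate symbol at position i to its probability. Returns a pair
    (probability, string), or None if no reading is accepted.
    """
    n = len(observations)
    # Heap entries: (cost so far, position, state, symbols read so far).
    heap = [(0.0, 0, start, "")]
    settled = set()
    while heap:
        cost, pos, state, read = heapq.heappop(heap)
        if (pos, state) in settled:
            continue
        settled.add((pos, state))
        if pos == n:
            if state in accepting:
                # Dijkstra order: the first accepting end node is optimal.
                return math.exp(-cost), read
            continue
        for sym, p in observations[pos].items():
            nxt = dfa.get((state, sym))
            if nxt is not None and p > 0:
                # -log(p) >= 0, so shortest path means highest probability.
                heapq.heappush(heap, (cost - math.log(p), pos + 1, nxt, read + sym))
    return None
```

As a usage example, take an automaton for the expression `ab` (`dfa = {(0, "a"): 1, (1, "b"): 2}`, start state 0, accepting state 2) and observations `[{"a": 0.6, "b": 0.4}, {"a": 0.7, "b": 0.3}]`. The most likely raw reading is "aa", which the expression rejects, so the search returns the reading "ab" with probability of about 0.18.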

NFA split architecture for fast regular expression matching

Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems - ANCS '10 ◽

10.1145/1872007.1872024 ◽

2010 ◽

Cited By ~ 1

Author(s):

Jan Kořenek ◽

Vlastimil Košař

Keyword(s):

Regular Expression ◽

Split Architecture

Feature-rich Regular Expression Matching Accelerator for Text Analytics

Journal of Signal Processing Systems ◽

10.1007/s11265-015-1052-y ◽

2015 ◽

Vol 85 (3) ◽

pp. 355-371 ◽

Cited By ~ 4

Author(s):

Kubilay Atasu

Keyword(s):

Regular Expression ◽

Text Analytics ◽

Design of A New Type of Regular Expression Matching Engine Based on FPGA

10.1109/asid52932.2021.9651676 ◽

2021 ◽

Author(s):

Nan Jiang ◽

Ping Lin ◽

Yulong He ◽

Zhuozhi Tan ◽

Jin Hu

Keyword(s):

Regular Expression ◽

New Type

Regular Expression Matching Processor Architecture Supporting Restraint and Nested Repetitive Operations

The Journal of Korean Institute of Communications and Information Sciences ◽

10.7840/kics.2021.46.9.1515 ◽

2021 ◽

Vol 46 (9) ◽

pp. 1515-1520

Author(s):

Byung-suk Seo

Keyword(s):

Regular Expression ◽

Processor Architecture ◽

Cryptographic Protocol Verification via Supercompilation (A Case Study)

10.29007/gpsh ◽

2018 ◽

Cited By ~ 1

Author(s):

Abdulbasit Ahmed ◽

Alexei Lisitsa ◽

Andrei Nemytykh

Keyword(s):

Program Transformation ◽

Cache Coherence ◽

Protocol Verification ◽

Cryptographic Protocol ◽

Levels Of Abstraction ◽

Transformation Techniques ◽

Program Specialization ◽

Coherence Protocols ◽

Different Levels

It has been known for a while that program transformation techniques, in particular, program specialization, can be used to prove the properties of programs automatically. For example, if a program actually implements (in a given context of use) a constant function, sufficiently powerful and semantics preserving program transformation may reduce the program to a syntactically trivial "constant" program, pruning unreachable branches and proving thereby the property. Viability of such an approach to verification has been demonstrated in previous works where it was applied to the verification of parameterized cache coherence protocols and Petri net models. In this paper we further extend the method and present a case study on its application to the verification of a cryptographic protocol. The protocol is modeled by functional programs at different levels of abstraction and verification via program specialization is done by using Turchin's supercompilation method.