On Average Behaviour of Regular Expressions in Strong Star Normal Form

For regular expressions in (strong) star normal form, a large set of efficient algorithms is known, ranging from conversions into finite automata to characterisations of unambiguity. In this paper we study the average complexity of this class of expressions using analytic combinatorics. As it is not always feasible to obtain explicit expressions for the generating functions involved, we show how to obtain the information required for the asymptotic estimates indirectly, from the existence of Puiseux expansions at the singularities. We study, asymptotically and on average, the alphabetic size, the sizes of the ε-follow automaton and of the position automaton, and how the number and the size of these expressions compare with those of standard regular expressions.
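The position automaton and the alphabetic size mentioned above can be made concrete with a small sketch of the classical position (Glushkov) construction from first, last and follow sets. The tuple encoding of expressions and all function names are assumptions for illustration, not the authors' code, and the average-case analysis itself is not reproduced.

```python
from collections import defaultdict

# Regular expressions as nested tuples (an assumed encoding for illustration):
#   ("eps",)  ("sym", "a")  ("alt", e, f)  ("cat", e, f)  ("star", e)

def mark(e, counter):
    """Tag every letter occurrence with its position 1, 2, ... (left to right)."""
    tag = e[0]
    if tag == "eps":
        return e
    if tag == "sym":
        counter[0] += 1
        return ("sym", (e[1], counter[0]))
    if tag == "star":
        return ("star", mark(e[1], counter))
    return (tag, mark(e[1], counter), mark(e[2], counter))


def analyse(e, follow):
    """Return (nullable, first, last) for a marked expression; fill `follow` in place."""
    tag = e[0]
    if tag == "eps":
        return True, set(), set()
    if tag == "sym":
        return False, {e[1]}, {e[1]}
    if tag == "star":
        _, first, last = analyse(e[1], follow)
        for p in last:              # the end of one iteration may be followed by a new one
            follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyse(e[1], follow)
    n2, f2, l2 = analyse(e[2], follow)
    if tag == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                    # "cat": last positions on the left precede first on the right
        follow[p] |= f2
    first = f1 | f2 if n1 else set(f1)
    last = l1 | l2 if n2 else set(l2)
    return n1 and n2, first, last


def position_automaton(e):
    """Return (number of states, transitions, final states) of the position automaton."""
    counter = [0]
    marked = mark(e, counter)       # counter[0] ends up as the alphabetic size
    follow = defaultdict(set)
    nullable, first, last = analyse(marked, follow)
    transitions = {(0, a, j) for (a, j) in first}          # state 0 is the initial state
    for (_, i), successors in follow.items():
        transitions |= {(i, b, j) for (b, j) in successors}
    finals = {i for (_, i) in last} | ({0} if nullable else set())
    return counter[0] + 1, transitions, finals


# (a + b)* a : alphabetic size 3, hence 4 states
e = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
states, transitions, finals = position_automaton(e)
print(states, len(transitions), finals)  # 4 9 {3}
```

The number of states is always the alphabetic size plus one, while the number of transitions can grow quadratically, which is why the average size of the transition set is the more informative statistic.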


ON THE AVERAGE STATE COMPLEXITY OF PARTIAL DERIVATIVE AUTOMATA: AN ANALYTIC COMBINATORICS APPROACH

International Journal of Foundations of Computer Science

10.1142/s0129054111008908

2011

Vol 22 (07)

pp. 1593-1606

Cited By ~ 13

Author(s):

SABINE BRODA

ANTÓNIO MACHIAVELO

NELMA MOREIRA

ROGÉRIO REIS

Keyword(s):

Lower Bound

Asymptotic Behaviour

Partial Derivative

Regular Expression

Finite Automata

Regular Expressions

Alphabet Size

State Complexity

Analytic Combinatorics

Average State

The partial derivative automaton ($\mathcal{A}_{\mathrm{pd}}$) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton ($\mathcal{A}_{\mathrm{pos}}$). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound on the average number of state mergings in $\mathcal{A}_{\mathrm{pos}}$ and describe its asymptotic behaviour. This bound depends on the alphabet size k, and as k grows it approaches half the number of states in $\mathcal{A}_{\mathrm{pos}}$. The lower bound corresponds to considering the $\mathcal{A}_{\mathrm{pd}}$ automaton for the marked version of the regular expression, i.e. the version in which all its letters are made different. Experimental results suggest that the average number of states of this automaton and that of the $\mathcal{A}_{\mathrm{pd}}$ automaton for the unmarked regular expression are very close to each other.
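A minimal sketch of Antimirov's partial derivatives illustrates the comparison with the Glushkov automaton: the states of the partial derivative automaton are the iterated partial derivatives of the expression, while the Glushkov automaton has one state per letter occurrence plus an initial state. Here the difference `n_pos - n_pd` is used only as an informal stand-in for the number of state mergings; the paper's exact counting, and any further normalisation of expressions (for example modulo associativity of concatenation), may differ. The tuple encoding and all function names are hypothetical.

```python
# Regular expressions as nested tuples (an assumed, illustrative encoding):
#   ("eps",)  ("sym", "a")  ("alt", e, f)  ("cat", e, f)  ("star", e)
EPS = ("eps",)


def nullable(e):
    """True if the empty word belongs to the language of e."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])          # "cat"


def cat(e, f):
    """Concatenation, simplifying eps . f to f."""
    return f if e == EPS else ("cat", e, f)


def pd(a, e):
    """Antimirov partial derivatives of e with respect to the letter a."""
    tag = e[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag == "alt":
        return pd(a, e[1]) | pd(a, e[2])
    if tag == "star":
        return {cat(d, e) for d in pd(a, e[1])}
    result = {cat(d, e[2]) for d in pd(a, e[1])}      # "cat"
    if nullable(e[1]):
        result |= pd(a, e[2])
    return result


def letters(e):
    """Set of letters occurring in e."""
    if e[0] == "eps":
        return set()
    if e[0] == "sym":
        return {e[1]}
    return set().union(*(letters(c) for c in e[1:]))


def alphabetic_size(e):
    """Number of letter occurrences; the Glushkov automaton has this many states plus one."""
    if e[0] == "eps":
        return 0
    if e[0] == "sym":
        return 1
    return sum(alphabetic_size(c) for c in e[1:])


def pd_states(e):
    """All iterated partial derivatives of e, i.e. the states of the partial derivative automaton."""
    sigma = letters(e)
    states, todo = {e}, [e]
    while todo:
        s = todo.pop()
        for a in sigma:
            for d in pd(a, s) - states:
                states.add(d)
                todo.append(d)
    return states


# (a + b)* a
e = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
n_pd = len(pd_states(e))
n_pos = alphabetic_size(e) + 1
print(n_pd, n_pos, n_pos - n_pd)   # 2 4 2
```

In this example the three Glushkov states reached by reading a letter collapse into two partial derivatives, the expression itself and ε, which is the kind of merging the lower bound counts.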


Union-Freeness Revisited — Between Deterministic and Nondeterministic Union-Free Languages

International Journal of Foundations of Computer Science

10.1142/s0129054121410070

2021

pp. 1-23

Author(s):

Benedek Nagy

Keyword(s):

Normal Form

Regular Language

Regular Expression

Finite Automata

Finite Union

Regular Languages

Regular Expressions

Closure Properties

Language Class

Union Operation

Union-free expressions are regular expressions that do not use the union operation. Consequently, (nondeterministic) union-free languages are described by regular expressions using only concatenation and Kleene star. The language class is also characterised by a special class of finite automata: 1CFPAs have exactly one cycle-free accepting path from each of their states. Obviously, such an automaton has exactly one accepting state. The deterministic counterpart of this class of automata defines the deterministic union-free (d-union-free, for short) languages. In this paper, λ-free nondeterministic variants of 1CFPAs are used to define n-union-free languages. The defined language class is shown to lie properly between the classes of (nondeterministic) union-free and d-union-free languages (for alphabets with at least two letters). For a unary alphabet, the class of n-union-free languages coincides with the class of union-free languages. Some properties of the new subregular class of languages, e.g., closure properties, are discussed. On the other hand, a regular expression is in union normal form if it is a finite union of union-free expressions. It is well known that every regular expression can be written in union normal form, i.e., all regular languages can be described as finite unions of (nondeterministic) union-free languages. It is also known that the same does not hold for deterministic union-free languages: there are regular languages that cannot be written as finite unions of d-union-free languages. As an important result, we show here that every regular language can be defined by a finite union of n-union-free languages. This fact also makes it possible to define the n-union-complexity of regular languages.
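The 1CFPA condition (exactly one cycle-free accepting path from each state) can be checked by brute force on small automata. This sketch assumes that a path is a sequence of transitions that visits no state twice, so parallel transitions with different letters count as different paths; the function names are hypothetical and the check is illustrative, not taken from the paper.

```python
def cycle_free_accepting_paths(transitions, accepting, start):
    """Count paths from `start` that never revisit a state and end in an accepting state.

    `transitions` is a set of (source, letter, target) triples. Exhaustive DFS:
    exponential in the worst case, so only suitable for small automata.
    """
    def dfs(state, visited):
        count = 1 if state in accepting else 0     # the path may stop here
        for (p, _, q) in transitions:
            if p == state and q not in visited:
                count += dfs(q, visited | {q})
        return count

    return dfs(start, frozenset({start}))


def is_1cfpa(states, transitions, accepting):
    """True if every state has exactly one cycle-free accepting path."""
    return all(cycle_free_accepting_paths(transitions, accepting, s) == 1 for s in states)


# a*b: 0 --a--> 0, 0 --b--> 1, accepting state 1 (the a-loop is a cycle, so it is excluded)
star_then_b = {(0, "a", 0), (0, "b", 1)}
print(is_1cfpa({0, 1}, star_then_b, {1}))     # True

# a+b: 0 --a--> 1 and 0 --b--> 1 give two cycle-free accepting paths from state 0
a_or_b = {(0, "a", 1), (0, "b", 1)}
print(is_1cfpa({0, 1}, a_or_b, {1}))          # False
```

Failing this test only says something about the given automaton; another automaton for the same language could still satisfy the condition.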


Derivatives and Finite Automata of Expressions in Star Normal Form

Language and Automata Theory and Applications - Lecture Notes in Computer Science

10.1007/978-3-319-53733-7_17

2017

pp. 236-248

Author(s):

Haiming Chen

Ping Lu

Keyword(s):

Normal Form

Finite Automata


Random Regular Expression Over Huge Alphabets

International Journal of Foundations of Computer Science

10.1142/s012905412141001x

2021

pp. 1-20

Author(s):

Cyril Nicaud

Pablo Rotondo

Keyword(s):

Regular Expression

Empty Word

Regular Expressions

Expected Number

Analytic Combinatorics

Transfer Theorem

Leading Term

In this article, we study some properties of random regular expressions of size n when the cardinality of the alphabet also depends on n. For this, we revisit and improve the classical Transfer Theorem from the field of analytic combinatorics. This yields precise estimates for the number of regular expressions, the probability of recognizing the empty word, and the expected number of Kleene stars in a random expression. For all these statistics, we show that there is a threshold when the size of the alphabet approaches [Formula: see text], at which point the leading term in the asymptotics starts oscillating.
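For small sizes, the number of regular expressions and the probability of recognizing the empty word can be computed exactly with a dynamic program; the analytic results describe how these sequences behave asymptotically. This sketch assumes expressions built from ε, k letters, star, union and concatenation, with size n counted as the number of nodes. The paper's grammar and size measure may differ, so the resulting numbers are only illustrative, and `count_expressions` is a hypothetical name.

```python
def count_expressions(n_max, k):
    """Return lists R, E with R[n] = number of expressions of size n over k letters
    and E[n] = number of those that recognize the empty word (for 1 <= n <= n_max)."""
    R = [0] * (n_max + 1)
    E = [0] * (n_max + 1)
    R[1], E[1] = k + 1, 1                     # leaves: the k letters and epsilon
    for n in range(2, n_max + 1):
        R[n] = R[n - 1]                       # a star over an expression of size n-1
        E[n] = R[n - 1]                       # a starred expression is always nullable
        for i in range(1, n - 1):             # a binary node with children of sizes i, n-1-i
            j = n - 1 - i
            total = R[i] * R[j]
            R[n] += 2 * total                 # one term for union, one for concatenation
            E[n] += total - (R[i] - E[i]) * (R[j] - E[j])   # union: nullable if either side is
            E[n] += E[i] * E[j]                              # concatenation: only if both are
    return R, E


R, E = count_expressions(3, k=1)
print(R[1:], E[1:])   # [2, 2, 10] [1, 2, 6]

# Probability that a uniform random expression of size 40 recognizes the empty word
for k in (2, 50, 2500):
    R, E = count_expressions(40, k)
    print(k, E[40] / R[40])
```

Exact big-integer arithmetic keeps the counts correct; only the final ratio is converted to a float.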


Towards a Normal Form and a Query Language for Extended Relations Defined by Regular Expressions

Journal of Database Management

10.4018/jdm.2016040102

2016

Vol 27 (2)

pp. 27-48

Author(s):

András Benczúr

Gyula I. Szabó

Keyword(s):

Normal Form

Data Base

Data Model

Query Language

Xml Schema

Relational Model

Regular Expressions

Functional Dependencies

Decision Algorithm

Implication Problem

This paper introduces a generalized database concept that unites the relational and semi-structured data models. As an important theoretical result, we found a quadratic decision algorithm for the implication problem of functional and join dependencies defined on the united data model. As a practical contribution, we present a normal form for the new data model as a tool for database design. With our novel representations of regular expressions, a more effective search method could be developed. XML elements are described by XML schema languages such as a DTD or an XML Schema definition. The instances of these elements are semi-structured tuples. A semi-structured tuple is an ordered list of (attribute: value) pairs. We may think of a semi-structured tuple as a sentence of a formal language, where the values are the terminal symbols and the attribute names are the non-terminal symbols. In their earlier work (Szabó and Benczúr, 2015), the authors introduced the notion of the extended tuple as a sentence from a regular language generated by a grammar whose non-terminal symbols are the attribute names of the tuple. Sets of extended tuples are the extended relations. The authors then introduced the dual language, which generates the tuple types allowed to occur in extended relations. They defined functional dependencies (regular FDs, or RFDs) over extended relations. In this paper they rephrase the RFD concept by directly using regular expressions over attribute names to define extended tuples. With the help of a special vertex-labeled graph associated with regular expressions, substring selection for the projection operation can be specified. Normalization for regular schemas is more complex than in the relational model, because the schema of an extended relation can contain an infinite number of tuple types.
However, the authors can also define selection, projection and join operations on extended relations, so a lossless-join decomposition can be performed. They extended their previous model to deal with XML schema indicators as well, e.g., numerical constraints. They also added line and set constructors in order to extend their model with more general projection and selection operators. This model establishes a query language with table join functionality for collected XML element data.
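To illustrate the idea of reading a semi-structured tuple as a sentence over attribute names, the sketch below checks whether the attribute sequence of an ordered list of (attribute, value) pairs matches a regular expression over attribute names. It uses Python's `re` module after mapping each attribute name to one private-use Unicode character. The pattern syntax, the function names and the `book` tuple type are hypothetical, and this is not the authors' vertex-labeled graph representation.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")   # attribute names are written as identifiers


def compile_tuple_type(pattern):
    """Compile a regex over attribute names (names as words, plus | * + ? and parentheses)."""
    names = sorted(set(NAME.findall(pattern)))
    code = {name: chr(0xE000 + i) for i, name in enumerate(names)}   # one char per name
    body = NAME.sub(lambda m: code[m.group(0)], pattern)
    body = re.sub(r"\s+", "", body)                                  # spaces only separate names
    return re.compile(body), code


def conforms(semi_tuple, tuple_type):
    """True if the attribute names of an ordered (attribute, value) list match the tuple type."""
    regex, code = tuple_type
    if any(attr not in code for attr, _ in semi_tuple):
        return False                                                 # unknown attribute name
    word = "".join(code[attr] for attr, _ in semi_tuple)
    return regex.fullmatch(word) is not None


book = compile_tuple_type("title author+ (year | date)?")
print(conforms([("title", "t1"), ("author", "a1"), ("author", "a2"), ("year", "y1")], book))  # True
print(conforms([("author", "a1"), ("title", "t1")], book))                                    # False
```

Because only attribute names take part in matching, the values play no role here, which mirrors the view of attribute names as the symbols that determine the tuple type.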


Software Toolchain for Large-Scale RE-NFA Construction on FPGA

International Journal of Reconfigurable Computing

10.1155/2009/301512

2009

Vol 2009

pp. 1-10

Cited By ~ 3

Author(s):

Yi-Hua E. Yang

Viktor K. Prasanna

Keyword(s):

High Performance

Large Scale

Regular Expression

Finite Automata

Fixed Number

Regular Expressions

Pattern Complexity

Regular Expression Matching

Area Increase

Prototype Software

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state, m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock-rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB of memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator that can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, and loop-back and feed-forward distances. Several regular expressions of varying complexity are used to test the performance of our RE-NFA construction software.
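One common way to realise an RE-NFA in hardware is one flip-flop per state (one-hot encoding), with every state updated in parallel on each input byte. The software model below assumes a Glushkov-style automaton in which each state is entered on a set of byte values, and it processes one byte per cycle (m = 1). The state labels, the function names and the example pattern are hypothetical; this is not the toolchain's VHDL generator, and the multi-byte-per-cycle variant is not shown.

```python
def make_engine(labels, edges, initial, finals):
    """One-hot NFA where state i is entered on a byte in labels[i] (Glushkov-style).

    labels:  list of sets of byte values, one per state
    edges:   set of (i, j) pairs meaning "state j may follow state i"
    initial: states that may be entered on the first byte of a match
    finals:  states whose activation signals a match
    """
    n = len(labels)
    pred_mask = [0] * n                       # bit i of pred_mask[j] set iff (i, j) is an edge
    for i, j in edges:
        pred_mask[j] |= 1 << i
    init_mask = sum(1 << i for i in initial)
    final_mask = sum(1 << i for i in finals)

    def run(data):
        """Yield end offsets of matches; a match may start at any offset (unanchored)."""
        active = 0                            # the one-hot state vector, one bit per state
        for pos, byte in enumerate(data):
            nxt = 0
            for j in range(n):                # in hardware these n updates happen in parallel
                if byte in labels[j] and ((init_mask >> j) & 1 or active & pred_mask[j]):
                    nxt |= 1 << j
            active = nxt
            if active & final_mask:
                yield pos

    return run


# ab*c : state 0 reads 'a', state 1 reads 'b', state 2 reads 'c'
engine = make_engine(
    labels=[{ord("a")}, {ord("b")}, {ord("c")}],
    edges={(0, 1), (1, 1), (0, 2), (1, 2)},
    initial={0},
    finals={2},
)
print(list(engine(b"xxabbcac")))  # [5, 7]
```

The inner loop over `j` is sequential in software; in a hardware engine each state's update is independent combinational logic feeding one flip-flop, which is what keeps the cost per input byte constant regardless of n.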
