ON THE AVERAGE STATE COMPLEXITY OF PARTIAL DERIVATIVE AUTOMATA: AN ANALYTIC COMBINATORICS APPROACH

The partial derivative automaton ([Formula: see text]) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton ([Formula: see text]). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound for the average number of mergings of states in [Formula: see text] and describe its asymptotic behaviour. This behaviour depends on the alphabet size, k, and as k grows the limit approaches half the number of states in [Formula: see text]. The lower bound corresponds to considering the [Formula: see text] automaton for the marked version of the regular expression, i.e. the version in which all letters are made different. Experimental results suggest that the average number of states of this automaton and the average number of states of the [Formula: see text] automaton for the unmarked regular expression are very close to each other.
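To make "mergings of states" concrete, here is a minimal sketch, assuming a hypothetical tuple encoding of regular expressions (the names `pd`, `pd_states`, `mark` and `letters` are illustrative, not the authors' code). It computes Antimirov partial derivatives, with the usual simplification ε·s = s, and counts the states of the partial derivative automaton for an expression and for its marked version. The Glushkov automaton has one state per letter occurrence plus an initial state.

```python
# Hypothetical tuple AST (not from the paper):
# ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r)
EPS = ("eps",)


def nullable(r):
    """True if the empty word belongs to L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # "cat"


def cat(r, s):
    """Concatenation with the simplification eps . s = s."""
    return s if r == EPS else ("cat", r, s)


def pd(r, a):
    """Antimirov partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag == "eps":
        return frozenset()
    if tag == "sym":
        return frozenset({EPS}) if r[1] == a else frozenset()
    if tag == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if tag == "cat":
        left = frozenset(cat(t, r[2]) for t in pd(r[1], a))
        return left | pd(r[2], a) if nullable(r[1]) else left
    return frozenset(cat(t, r) for t in pd(r[1], a))  # "star"


def letters(r):
    """Letter occurrences (positions) of r, from left to right."""
    if r[0] == "sym":
        return [r[1]]
    return [x for child in r[1:] for x in letters(child)]


def mark(r, counter=None):
    """Marked version: the i-th letter occurrence a becomes (a, i)."""
    counter = [0] if counter is None else counter
    if r[0] == "sym":
        counter[0] += 1
        return ("sym", (r[1], counter[0]))
    return (r[0],) + tuple(mark(child, counter) for child in r[1:])


def pd_states(r):
    """States of the partial derivative automaton: r and all its derivatives."""
    alphabet = set(letters(r))
    states, todo = {r}, [r]
    while todo:
        s = todo.pop()
        for a in alphabet:
            for t in pd(s, a):
                if t not in states:
                    states.add(t)
                    todo.append(t)
    return states


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
e = ("alt", ("cat", a, c), ("cat", b, c))  # ac + bc
print(len(pd_states(e)), len(pd_states(mark(e))), len(letters(e)) + 1)  # 3 4 5
```

For ac + bc, the Glushkov automaton has 5 states. In the marked version a1c2 + b3c4, the only merge is between the two positions c2 and c4, because both have ε as their partial derivative. This is the kind of ε-derivative merging the abstract counts to obtain its lower bound. The unmarked expression additionally merges a and b, since both derive to c.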

ON THE AVERAGE SIZE OF GLUSHKOV AND PARTIAL DERIVATIVE AUTOMATA

International Journal of Foundations of Computer Science ◽

10.1142/s0129054112400400 ◽

2012 ◽

Vol 23 (05) ◽

pp. 969-984 ◽

Cited By ~ 12

Author(s):

SABINE BRODA ◽

ANTÓNIO MACHIAVELO ◽

NELMA MOREIRA ◽

ROGÉRIO REIS

Keyword(s):

Partial Derivative ◽

Upper Bound ◽

Regular Expression ◽

Regular Expressions ◽

Average Case ◽

Alphabet Size ◽

Large Alphabet ◽

Exact Counting ◽

Average Transition ◽

Average Size

In this paper, the relation between the Glushkov automaton [Formula: see text] and the partial derivative automaton [Formula: see text] of a given regular expression, in terms of transition complexity, is studied. Nicaud proved that the average transition complexity of [Formula: see text] is linear in the size of the corresponding expression. This result was obtained using an upper bound for the number of transitions of [Formula: see text]. Here we present a new quadratic construction of [Formula: see text] that leads to a more elegant and straightforward implementation, and that allows the exact counting of the number of transitions. Based on that, a better estimation of the average size is presented. Asymptotically, and as the alphabet size grows, the number of transitions per state is on average 2. Broda et al. computed an upper bound for the ratio of the number of states of [Formula: see text] to the number of states of [Formula: see text], which is about ½ for large alphabet sizes. Here we show how to obtain an upper bound for the number of transitions in [Formula: see text], which we then use to get an average case approximation. In conclusion, asymptotically, and for large alphabets, the size of [Formula: see text] is half the size of the [Formula: see text]. This is corroborated by some experiments, even for small alphabets and small regular expressions.
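The exact transition count of the Glushkov (position) automaton can be read off the standard first, last and follow sets: one transition from the initial state to each position in first, plus one transition from each position p to each position in follow(p). The following minimal sketch uses a hypothetical tuple encoding and is the textbook construction only, not the new quadratic construction presented in the paper.

```python
from collections import defaultdict

# Hypothetical tuple AST (not from the paper):
# ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r)


def position_automaton_size(r):
    """Return (states, transitions) of the Glushkov (position) automaton of r."""
    follow = defaultdict(set)
    positions = [0]  # number of letter occurrences numbered so far

    def walk(node):
        # Returns (nullable, first, last) for node and fills `follow` on the way.
        tag = node[0]
        if tag == "eps":
            return True, set(), set()
        if tag == "sym":
            positions[0] += 1
            p = positions[0]
            return False, {p}, {p}
        if tag == "star":
            _, first, last = walk(node[1])
            for p in last:  # loop back: last positions are followed by first ones
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if tag == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:  # "cat": last of the left part is followed by first of the right
            follow[p] |= f2
        return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2

    _, first, _ = walk(r)
    states = positions[0] + 1  # one state per position, plus the initial state
    transitions = len(first) + sum(len(s) for s in follow.values())
    return states, transitions


a, b = ("sym", "a"), ("sym", "b")
print(position_automaton_size(("star", ("alt", a, b))))  # (3, 6)
```

For (a+b)*, every one of the 3 states has 2 outgoing transitions, which happens to match the average of 2 transitions per state stated in the abstract; a single small example says nothing about the average, of course.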

NORMALIZED EXPRESSIONS AND FINITE AUTOMATA

International Journal of Algebra and Computation ◽

10.1142/s021819670700355x ◽

2007 ◽

Vol 17 (01) ◽

pp. 141-154 ◽

Cited By ~ 11

Author(s):

J.-M. CHAMPARNAUD ◽

F. OUARDI ◽

D. ZIADI

Keyword(s):

Partial Derivative ◽

Regular Expression ◽

Linear Time ◽

Finite Automata ◽

Experimental Studies ◽

Regular Expressions ◽

Theoretical Comparison ◽

Theoretical Question

There exist two well-known quotients of the position automaton of a regular expression. The first one, called the equation automaton, was introduced by Mirkin from the notion of prebase and later redefined by Antimirov from the notion of partial derivative. The second one, due to Ilie and Yu and called the follow automaton, can be obtained by eliminating ε-transitions in an ε-NFA that is always smaller than the classical ε-NFAs (Thompson, Sippu and Soisalon-Soininen). Ilie and Yu discussed the difficulty of succeeding in a theoretical comparison between the size of the follow automaton and the size of the equation automaton, and concluded that experimental studies are very likely necessary. In this paper we solve the theoretical question by first defining a set of regular expressions, called normalized expressions, such that every regular expression can be normalized in linear time, and then proving that the equation automaton of a normalized expression is always smaller than its follow automaton.

Random Regular Expression Over Huge Alphabets

International Journal of Foundations of Computer Science ◽

10.1142/s012905412141001x ◽

2021 ◽

pp. 1-20

Author(s):

Cyril Nicaud ◽

Pablo Rotondo

Keyword(s):

Regular Expression ◽

Empty Word ◽

Regular Expressions ◽

Expected Number ◽

Analytic Combinatorics ◽

Transfer Theorem ◽

Leading Term

In this article, we study some properties of random regular expressions of size [Formula: see text], when the cardinality of the alphabet also depends on [Formula: see text]. For this, we revisit and improve the classical Transfer Theorem from the field of analytic combinatorics. This provides precise estimations for the number of regular expressions, the probability of recognizing the empty word and the expected number of Kleene stars in a random expression. For all these statistics, we show that there is a threshold when the size of the alphabet approaches [Formula: see text], at which point the leading term in the asymptotics starts oscillating.
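Counting regular expressions of a given size, and those that recognize the empty word, is the combinatorial starting point for estimates like these. The following is a minimal sketch under an assumed model that may differ from the article's: leaves are ε and the k letters (size 1 each), star is unary, + and · are binary, and size is the number of nodes in the syntax tree. The function name `count_regex` is hypothetical. It uses exact dynamic programming rather than the analytic Transfer Theorem.

```python
from fractions import Fraction


def count_regex(n_max, k):
    """T[n]: expressions of size n; N[n]: those whose language contains ε.
    Assumed model: leaves ε and k letters, unary star, binary + and ·,
    size = number of nodes of the syntax tree."""
    T = [0] * (n_max + 1)
    N = [0] * (n_max + 1)
    if n_max >= 1:
        T[1], N[1] = k + 1, 1  # k letters plus ε; only ε is nullable
    for n in range(2, n_max + 1):
        T[n] = T[n - 1]  # r* for every r of size n-1 ...
        N[n] = T[n - 1]  # ... and every starred expression is nullable
        for i in range(1, n - 1):
            j = n - 1 - i
            pairs = T[i] * T[j]
            T[n] += 2 * pairs  # r + s and r · s
            N[n] += pairs - (T[i] - N[i]) * (T[j] - N[j])  # r + s: either side nullable
            N[n] += N[i] * N[j]  # r · s: both sides nullable
    return T, N


T, N = count_regex(3, 2)
print(T, N, Fraction(N[3], T[3]))  # [0, 3, 3, 21] [0, 1, 3, 9] 3/7
```

The ratio N[n]/T[n] is the probability that a uniformly random expression of size n recognizes the empty word under this model. Letting k depend on n, as in the article, only changes the arguments passed in.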

Software Toolchain for Large-Scale RE-NFA Construction on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2009/301512 ◽

2009 ◽

Vol 2009 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Yi-Hua E. Yang ◽

Viktor K. Prasanna

Keyword(s):

High Performance ◽

Large Scale ◽

Regular Expression ◽

Finite Automata ◽

Fixed Number ◽

Regular Expressions ◽

Pattern Complexity ◽

Regular Expression Matching ◽

Area Increase ◽

Prototype Software

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.

State Complexity of Insertion

International Journal of Foundations of Computer Science ◽

10.1142/s0129054116500349 ◽

2016 ◽

Vol 27 (07) ◽

pp. 863-878 ◽

Cited By ~ 5

Author(s):

Yo-Sub Han ◽

Sang-Ki Ko ◽

Timothy Ng ◽

Kai Salomaa

Keyword(s):

Lower Bound ◽

Upper Bound ◽

Regular Language ◽

Finite Automata ◽

The State ◽

Tight Bound ◽

State Complexity ◽

Nondeterministic State

It is well known that the language obtained by inserting a regular language into a regular language is regular. We study the nondeterministic and deterministic state complexity of the insertion operation. Given two incomplete DFAs of sizes m and n, we give an upper bound of (m+2)·2^(mn−m−1)·3^m and a lower bound that yields an asymptotically tight bound. We also present the tight nondeterministic state complexity, using a fooling set technique. The deterministic state complexity of insertion is 2^Θ(mn) and the nondeterministic state complexity of insertion is precisely mn+2m, where m and n are the sizes of the input finite automata. We also consider the state complexity of insertion in the case where the inserted language is bifix-free or non-returning.
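Insertion is L1 ← L2 = {uvw : uw ∈ L1, v ∈ L2}. One natural ε-NFA for it has m states that run A1 before the insertion, m·n states that pause A1 while A2 reads v, and m states that resume A1 afterwards. That gives exactly mn+2m states, the count stated in the abstract. Whether this is the authors' construction is an assumption. The following minimal sketch simulates that ε-NFA and checks it against the definition by brute force. The DFA encoding and function names are hypothetical.

```python
from itertools import product

# Hypothetical encoding of an incomplete DFA: (start, finals, delta), where delta
# maps (state, symbol) -> state and a missing entry means "no transition".


def dfa_accepts(dfa, word):
    start, finals, delta = dfa
    q = start
    for a in word:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals


def insertion_accepts(d1, d2, word):
    """Simulate an ε-NFA for L1 <- L2 = {u v w : u w in L1, v in L2}.
    States: ("pre", p) before the insertion, ("mid", p, q) while A1 is paused
    in p and A2 runs in q, ("post", p) afterwards: m + m*n + m states."""
    s1, f1, t1 = d1
    s2, f2, t2 = d2

    def closure(states):
        # ε-moves: pre p -> mid (p, s2); mid (p, q) -> post p when q is final in A2
        seen, stack = set(states), list(states)
        while stack:
            st = stack.pop()
            if st[0] == "pre":
                nxt = ("mid", st[1], s2)
            elif st[0] == "mid" and st[2] in f2:
                nxt = ("post", st[1])
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
        return seen

    current = closure({("pre", s1)})
    for a in word:
        step = set()
        for st in current:
            if st[0] == "mid":  # A2 reads the letter, A1 stays paused
                q = t2.get((st[2], a))
                if q is not None:
                    step.add(("mid", st[1], q))
            else:  # "pre" or "post": A1 reads the letter
                p = t1.get((st[1], a))
                if p is not None:
                    step.add((st[0], p))
        current = closure(step)
    return any(st[0] == "post" and st[1] in f1 for st in current)


def insertion_brute_force(d1, d2, word):
    """Reference check taken directly from the definition of insertion."""
    n = len(word)
    return any(dfa_accepts(d1, word[:i] + word[j:]) and dfa_accepts(d2, word[i:j])
               for i in range(n + 1) for j in range(i, n + 1))


A1 = (0, {2}, {(0, "a"): 1, (1, "b"): 2})  # L1 = {ab}
A2 = (0, {1}, {(0, "c"): 1})               # L2 = {c}
for length in range(5):
    for w in map("".join, product("abc", repeat=length)):
        assert insertion_accepts(A1, A2, w) == insertion_brute_force(A1, A2, w)
```

Here L1 ← L2 = {cab, acb, abc}, and the simulation agrees with the brute-force definition on every word of length at most 4.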

Regular Expressions with Lookahead

JUCS - Journal of Universal Computer Science ◽

10.3897/jucs.66330 ◽

2021 ◽

Vol 27 (4) ◽

pp. 324-340

Author(s):

Martin Berglund ◽

Brink van der Merwe ◽

Steyn van Litsenborgh

Keyword(s):

Finite Automata ◽

The State ◽

Regular Expressions ◽

Deterministic Finite Automata ◽

State Complexity

This paper investigates regular expressions which in addition to the standard operators of union, concatenation, and Kleene star, have lookaheads. We show how to translate regular expressions with lookaheads (REwLA) to equivalent Boolean automata having at most 3 states more than the length of the REwLA. We also investigate the state complexity when translating REwLA to equivalent deterministic finite automata (DFA).
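A lookahead at the start of an expression behaves like an intersection of languages. This is one reason Boolean automata, which support conjunction natively, are a natural translation target. The following sketch illustrates that intuition with Python's backtracking `re` engine. It is not the paper's translation, and Python's lookahead semantics may differ in detail from the formal REwLA definition. The lookahead here is anchored with `\Z` so that it constrains the whole word.

```python
import re
from itertools import product

# Lookahead: the whole input is a word over {a, b} containing at least one 'a'.
# Main expression: words over {a, b} of even length.
# Together the pattern accepts the intersection of the two languages.
EVEN_WITH_A = re.compile(r"(?=[ab]*a[ab]*\Z)(?:[ab][ab])*")


def is_match(word: str) -> bool:
    return EVEN_WITH_A.fullmatch(word) is not None


# Compare against the intended intersection on all short words over {a, b}.
for n in range(6):
    for w in map("".join, product("ab", repeat=n)):
        assert is_match(w) == (n % 2 == 0 and "a" in w)
```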

On the State Complexity of Partial Derivative Automata For Regular Expressions with Intersection

Descriptional Complexity of Formal Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-41114-9_4 ◽

2016 ◽

pp. 45-59 ◽

Cited By ~ 2

Author(s):

Rafaela Bastos ◽

Sabine Broda ◽

António Machiavelo ◽

Nelma Moreira ◽

Rogério Reis

Keyword(s):

Partial Derivative ◽

The State ◽

Regular Expressions ◽

State Complexity

Finite Automata on Transfinite Sequences and Regular Expressions

Fundamenta Informaticae ◽

10.3233/fi-1985-83-407 ◽

1985 ◽

Vol 8 (3-4) ◽

pp. 379-396

Author(s):

Jerzy Wojciechowski

Keyword(s):

Regular Expression ◽

Finite Automata ◽

Characterization Theorem ◽

Regular Expressions ◽

Emptiness Problem

In this paper the notion of regular expression for finite automata on transfinite sequences (TF-automata) is introduced. The characterization theorem for TF-automata is proved. From this theorem we conclude the decidability of the emptiness problem for TF-automata and the characterization theorem for finite automata on transfinite sequences of bounded length.

State Complexity of k-Union and k-Intersection for Prefix-Free Regular Languages

International Journal of Foundations of Computer Science ◽

10.1142/s0129054115500124 ◽

2015 ◽

Vol 26 (02) ◽

pp. 211-227 ◽

Cited By ~ 1

Author(s):

Hae-Sung Eom ◽

Yo-Sub Han ◽

Kai Salomaa

Keyword(s):

Lower Bound ◽

Upper Bound ◽

Finite Automata ◽

Constant Factor ◽

The State ◽

Regular Languages ◽

Deterministic Finite Automata ◽

State Complexity ◽

Binary Alphabet ◽

Multiple Intersections

We investigate the state complexity of multiple unions and of multiple intersections for prefix-free regular languages. Prefix-free deterministic finite automata have their own unique structural properties that are crucial for obtaining state complexity upper bounds that are improved from those for general regular languages. We present a tight lower bound construction for k-union using an alphabet of size k + 1 and for k-intersection using a binary alphabet. We prove that the state complexity upper bound for k-union cannot be reached by languages over an alphabet with fewer than k symbols. We also give a lower bound construction for k-union using a binary alphabet that is within a constant factor of the upper bound.
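State complexity questions like these are often explored experimentally by building the product DFA and counting its reachable states. The following minimal sketch, assuming a hypothetical incomplete-DFA encoding, does two things. It checks prefix-freeness: L is prefix-free if and only if no final state reachable from the start reaches a final state again by a nonempty word. It also counts the reachable states of the standard cross-product DFA for a k-union, leaving out the implicit all-sink state. This is the generic product construction, not the paper's lower-bound witnesses.

```python
from collections import deque

# Hypothetical encoding of an incomplete DFA: (start, finals, delta), where delta
# maps (state, symbol) -> state and a missing entry leads to an implicit sink.


def is_prefix_free(dfa, alphabet):
    start, finals, delta = dfa

    def successors(q):
        return [delta[(q, a)] for a in alphabet if (q, a) in delta]

    def reach(sources):
        # All states reachable from `sources` (including the sources themselves).
        seen, queue = set(sources), deque(sources)
        while queue:
            for r in successors(queue.popleft()):
                if r not in seen:
                    seen.add(r)
                    queue.append(r)
        return seen

    for f in reach([start]) & set(finals):
        # States reachable from f by a nonempty word must not be final.
        if reach(successors(f)) & set(finals):
            return False
    return True


def union_product_size(dfas, alphabet):
    """Reachable states of the product DFA for L1 ∪ ... ∪ Lk. A component that
    falls into its sink is recorded as None; the all-sink tuple is not counted."""
    start = tuple(d[0] for d in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for a in alphabet:
            nxt = tuple(None if q is None else d[2].get((q, a))
                        for q, d in zip(state, dfas))
            if any(q is not None for q in nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


A1 = (0, {1}, {(0, "a"): 1})  # L1 = {a}, prefix-free
A2 = (0, {1}, {(0, "b"): 1})  # L2 = {b}, prefix-free
print(is_prefix_free(A1, "ab"), union_product_size([A1, A2], "ab"))  # True 3
```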

On Average Behaviour of Regular Expressions in Strong Star Normal Form

International Journal of Foundations of Computer Science ◽

10.1142/s0129054119400227 ◽

2019 ◽

Vol 30 (06n07) ◽

pp. 899-920 ◽

Cited By ~ 1

Author(s):

Sabine Broda ◽

António Machiavelo ◽

Nelma Moreira ◽

Rogério Reis

Keyword(s):

Normal Form ◽

Generating Functions ◽

Finite Automata ◽

Regular Expressions ◽

Large Set ◽

Asymptotic Estimates ◽

Analytic Combinatorics ◽

Average Complexity ◽

Average Behaviour ◽

Puiseux Expansions

For regular expressions in (strong) star normal form, a large set of efficient algorithms is known, from conversions into finite automata to characterisations of unambiguity. In this paper we study the average complexity of this class of expressions using analytic combinatorics. As it is not always feasible to obtain explicit expressions for the generating functions involved, here we show how to get the required information for the asymptotic estimates with an indirect use of the existence of Puiseux expansions at singularities. We study, asymptotically and on average, the alphabetic size, the size of the [Formula: see text]-follow automaton and of the position automaton, as well as the ratio of the size of these expressions to that of standard regular expressions.
