ON THE AVERAGE SIZE OF GLUSHKOV AND PARTIAL DERIVATIVE AUTOMATA

This paper studies the relation, in terms of transition complexity, between the Glushkov automaton [Formula: see text] and the partial derivative automaton [Formula: see text] of a given regular expression. The average transition complexity of [Formula: see text] was proved by Nicaud to be linear in the size of the corresponding expression. This result was obtained using an upper bound on the number of transitions of [Formula: see text]. Here we present a new quadratic construction of [Formula: see text] that leads to a more elegant and straightforward implementation, and that allows the exact counting of the number of transitions. Based on that, a better estimation of the average size is presented. Asymptotically, and as the alphabet size grows, the number of transitions per state is on average 2. Broda et al. computed an upper bound for the ratio of the number of states of [Formula: see text] to the number of states of [Formula: see text], which is about ½ for large alphabet sizes. Here we show how to obtain an upper bound for the number of transitions in [Formula: see text], which we then use to get an average-case approximation. In conclusion, asymptotically, and for large alphabets, the size of [Formula: see text] is half the size of [Formula: see text]. This is corroborated by some experiments, even for small alphabets and small regular expressions.
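
To make the notion of counting states and transitions concrete, here is a minimal sketch of the textbook position (Glushkov) construction based on first, last and follow sets. It is not the quadratic construction announced in the abstract. The tuple encoding of expressions (`"sym"`, `"eps"`, `"alt"`, `"cat"`, `"star"`) and the function name `glushkov` are assumptions made for illustration only.

```python
def glushkov(regex):
    """Return (number_of_states, transitions) of the position automaton.

    Hypothetical encoding: ("sym", a), ("eps",), ("alt", l, r),
    ("cat", l, r), ("star", e). State 0 is the initial state and
    state i >= 1 is the i-th letter occurrence, read left to right.
    """
    letters = []   # letters[i - 1] is the letter at position i
    follow = {}    # position -> positions that may come right after it

    def walk(e):
        """Return (nullable, first, last) of subexpression e."""
        tag = e[0]
        if tag == "eps":
            return True, set(), set()
        if tag == "sym":
            letters.append(e[1])
            p = len(letters)
            follow[p] = set()
            return False, {p}, {p}
        if tag == "alt":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            return n1 or n2, f1 | f2, l1 | l2
        if tag == "cat":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            for p in l1:                 # last of left feeds first of right
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if tag == "star":
            _, f, l = walk(e[1])
            for p in l:                  # loop back from last to first
                follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node: {tag}")

    _, first, _ = walk(regex)
    transitions = {(0, letters[q - 1], q) for q in first}
    transitions |= {(p, letters[q - 1], q) for p in follow for q in follow[p]}
    return len(letters) + 1, transitions


# (a+b)*a has three letter occurrences, hence four states.
r = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
n_states, trans = glushkov(r)
print(n_states, len(trans))  # 4 9
```

Final states are omitted because only the state and transition counts matter for the size comparison discussed here.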

ON THE AVERAGE STATE COMPLEXITY OF PARTIAL DERIVATIVE AUTOMATA: AN ANALYTIC COMBINATORICS APPROACH

International Journal of Foundations of Computer Science ◽

10.1142/s0129054111008908 ◽

2011 ◽

Vol 22 (07) ◽

pp. 1593-1606 ◽

Cited By ~ 13

Author(s):

SABINE BRODA ◽

ANTÓNIO MACHIAVELO ◽

NELMA MOREIRA ◽

ROGÉRIO REIS

Keyword(s):

Lower Bound ◽

Asymptotic Behaviour ◽

Partial Derivative ◽

Regular Expression ◽

Finite Automata ◽

Regular Expressions ◽

Alphabet Size ◽

State Complexity ◽

Analytic Combinatorics ◽

Average State

The partial derivative automaton ([Formula: see text]) is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton ([Formula: see text]). By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound on the average number of mergings of states in [Formula: see text] and describe its asymptotic behaviour. This bound depends on the alphabet size, k, and as k grows its limit approaches half the number of states in [Formula: see text]. The lower bound corresponds to considering the [Formula: see text] automaton for the marked version of the regular expression, i.e. the version in which all letters are made distinct. Experimental results suggest that the average number of states of this automaton, and of the [Formula: see text] automaton for the unmarked regular expression, are very close to each other.
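
As an illustration of how states get merged, the following sketch builds the partial derivative automaton with Antimirov's standard rules for partial derivatives with respect to a letter. The tuple encoding of expressions and the names `pderiv` and `pd_automaton` are assumptions for this example and are not taken from the paper.

```python
EPS = ("eps",)


def nullable(e):
    """True if the language of e contains the empty word."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def concat(left, right):
    """Concatenate, simplifying eps . right to right."""
    return right if left == EPS else ("cat", left, right)


def pderiv(e, a):
    """Set of partial derivatives of e with respect to letter a."""
    tag = e[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if e[1] == a else set()
    if tag == "alt":
        return pderiv(e[1], a) | pderiv(e[2], a)
    if tag == "cat":
        out = {concat(d, e[2]) for d in pderiv(e[1], a)}
        return (out | pderiv(e[2], a)) if nullable(e[1]) else out
    if tag == "star":
        return {concat(d, e) for d in pderiv(e[1], a)}
    raise ValueError(f"unknown node: {tag}")


def pd_automaton(regex, alphabet):
    """Explore all partial derivatives reachable from regex."""
    states, todo, transitions = {regex}, [regex], set()
    while todo:
        s = todo.pop()
        for a in alphabet:
            for t in pderiv(s, a):
                transitions.add((s, a, t))
                if t not in states:
                    states.add(t)
                    todo.append(t)
    return states, transitions


# (a+b)*a: the only partial derivatives are the expression itself and eps.
r = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
states, trans = pd_automaton(r, "ab")
print(len(states), len(trans))  # 2 3
```

For this expression the position automaton has four states (one per letter occurrence plus the initial state), so two states are merged. The ε state is one of the two that remain, which gives a small-scale picture of why expressions with ε as a partial derivative matter for the count.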

NORMALIZED EXPRESSIONS AND FINITE AUTOMATA

International Journal of Algebra and Computation ◽

10.1142/s021819670700355x ◽

2007 ◽

Vol 17 (01) ◽

pp. 141-154 ◽

Cited By ~ 11

Author(s):

J.-M. CHAMPARNAUD ◽

F. OUARDI ◽

D. ZIADI

Keyword(s):

Partial Derivative ◽

Regular Expression ◽

Linear Time ◽

Finite Automata ◽

Experimental Studies ◽

Regular Expressions ◽

Theoretical Comparison ◽

Theoretical Question

There exist two well-known quotients of the position automaton of a regular expression. The first one, called the equation automaton, was first introduced by Mirkin from the notion of prebase and has been redefined by Antimirov from the notion of partial derivative. The second one, due to Ilie and Yu and called the follow automaton, can be obtained by eliminating ε-transitions in an ε-NFA that is always smaller than the classical ε-NFAs (Thompson, Sippu and Soisalon–Soininen). Ilie and Yu discussed the difficulty of achieving a theoretical comparison between the size of the follow automaton and the size of the equation automaton and concluded that experimental studies are very likely necessary. In this paper we solve the theoretical question by first defining a set of regular expressions, called normalized expressions, such that every regular expression can be normalized in linear time, and then proving that the equation automaton of a normalized expression is always smaller than its follow automaton.

Stochastic Flips on Dimer Tilings

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.2803 ◽

2010 ◽

Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽

Author(s):

Thomas Fernique ◽

Damien Regnault

Keyword(s):

Fixed Point ◽

Markov Process ◽

Upper Bound ◽

Numerical Experiments ◽

Triangular Grid ◽

Expected Number ◽

Worst Case ◽

Average Case ◽

International Audience

This paper introduces a Markov process inspired by the problem of quasicrystal growth. It acts over dimer tilings of the triangular grid by randomly performing local transformations, called flips, which do not increase the number of identical adjacent tiles (this number can be thought of as the tiling energy). Fixed points of such a process play the role of quasicrystals. We are interested here in the worst-case expected number of flips needed to converge towards a fixed point. Numerical experiments suggest a $\Theta (n^2)$ bound, where $n$ is the number of tiles of the tiling. We prove an $O(n^{2.5})$ upper bound and discuss the gap between this bound and the previous one. We also briefly discuss the average case.

A Tight Upper Bound on the Number of Variables for Average-Case k-Clique on Ordered Graphs

Logic, Language, Information and Computation - Lecture Notes in Computer Science ◽

10.1007/978-3-642-32621-9_21 ◽

2012 ◽

pp. 282-290

Author(s):

Benjamin Rossman

Keyword(s):

Upper Bound ◽

Average Case ◽

Ordered Graphs

Random Regular Expression Over Huge Alphabets

International Journal of Foundations of Computer Science ◽

10.1142/s012905412141001x ◽

2021 ◽

pp. 1-20

Author(s):

Cyril Nicaud ◽

Pablo Rotondo

Keyword(s):

Regular Expression ◽

Empty Word ◽

Regular Expressions ◽

Expected Number ◽

Analytic Combinatorics ◽

Transfer Theorem ◽

Leading Term

In this article, we study some properties of random regular expressions of size [Formula: see text], when the cardinality of the alphabet also depends on [Formula: see text]. For this, we revisit and improve the classical Transfer Theorem from the field of analytic combinatorics. This provides precise estimations for the number of regular expressions, the probability of recognizing the empty word and the expected number of Kleene stars in a random expression. For all these statistics, we show that there is a threshold when the size of the alphabet approaches [Formula: see text], at which point the leading term in the asymptotics starts oscillating.

Software Toolchain for Large-Scale RE-NFA Construction on FPGA

International Journal of Reconfigurable Computing ◽

10.1155/2009/301512 ◽

2009 ◽

Vol 2009 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Yi-Hua E. Yang ◽

Viktor K. Prasanna

Keyword(s):

High Performance ◽

Large Scale ◽

Regular Expression ◽

Finite Automata ◽

Fixed Number ◽

Regular Expressions ◽

Pattern Complexity ◽

Regular Expression Matching ◽

Area Increase ◽

Prototype Software

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, an n-state m-bytes-per-cycle RE-NFA can be constructed in O(n×m) time and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.

Towards an Effective Syntax and a Generator for Deterministic Standard Regular Expressions

The Computer Journal ◽

10.1093/comjnl/bxy110 ◽

2018 ◽

Vol 62 (9) ◽

pp. 1322-1341

Author(s):

Zhiwu Xu ◽

Ping Lu ◽

Haiming Chen

Keyword(s):

Regular Expression ◽

Xml Schema ◽

Experimental Results ◽

Regular Expressions ◽

Context Free ◽

Core Part ◽

Context Free Grammars

Deterministic regular expressions are a core part of XML Schema and are used in other applications. But unlike regular expressions, deterministic regular expressions do not have a simple syntax; instead, they are defined in a semantic manner. Moreover, not every regular expression can be rewritten to an equivalent deterministic regular expression. These properties of deterministic regular expressions put a burden on the user to develop XML Schema Definitions and to use deterministic regular expressions. In this paper, we propose a syntax for deterministic standard regular expressions (DREGs), and prove that the syntax of DREGs is context-free. Based on the context-free grammars for DREGs, we further design a generator for DREGs, which can generate DREGs randomly and can be used in applications associated with DREGs, e.g. benchmarking a validator for DTD or XML Schema, and inclusion checking of DTD and XML Schema. Experimental results demonstrate the efficiency and usefulness of the generator.

AVERAGE VALUE OF SUM OF EXPONENTS OF RUNS IN A STRING

International Journal of Foundations of Computer Science ◽

10.1142/s0129054109007078 ◽

2009 ◽

Vol 20 (06) ◽

pp. 1135-1146 ◽

Cited By ~ 1

Author(s):

KAZUHIKO KUSANO ◽

WATARU MATSUBARA ◽

AKIRA ISHINO ◽

AYUMI SHINOHARA

Keyword(s):

Upper Bound ◽

Unit Length ◽

Closed Formula ◽

Alphabet Size ◽

Average Value ◽

Limit Value ◽

Value Of It ◽

Binary Strings

A substring w[i..j] of w is called a repetition of period p if w[k] = w[k + p] for any i ≤ k ≤ j - p. In particular, a maximal repetition, which can be extended neither to the left nor to the right, is called a run. The ratio of the length of the run to its period, i.e. [Formula: see text], is called an exponent. The sum of exponents of runs in a string is of interest. The maximal value of the sum is still unknown, and the current upper bound is 2.9n given by Crochemore and Ilie, where n is the length of a string. In this paper we show a closed formula which exactly expresses the average value of this sum for any n and any alphabet size, and the limit of this value per unit length as n approaches infinity. For binary strings, the limit value is approximately 1.13103. We also show the average number of squares in a string of length n and its limit value.
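
The definitions above can be checked by brute force on short strings. The sketch below assumes the usual convention, not spelled out in the abstract, that a run has length at least 2p and that p is its smallest period. Indices are 0-based, and the function names are hypothetical. It enumerates runs directly and does not use the paper's closed formula.

```python
from fractions import Fraction
from itertools import product


def smallest_period(s):
    """Smallest p >= 1 such that s[k] == s[k + p] wherever both exist."""
    n = len(s)
    return next(p for p in range(1, n + 1)
                if all(s[k] == s[k + p] for k in range(n - p)))


def runs(w):
    """List of (start, end, period) with inclusive 0-based bounds."""
    n = len(w)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and w[k] == w[k + p]:
                k += 1
            end = k + p - 1            # maximal stretch with period p
            length = end - start + 1
            if length >= 2 * p and smallest_period(w[start:end + 1]) == p:
                found.append((start, end, p))
    return found


def sum_of_exponents(w):
    """Exact sum of length / period over all runs of w."""
    return sum(Fraction(e - s + 1, p) for s, e, p in runs(w))


def average_sum(n, alphabet):
    """Average sum of exponents over all strings of length n."""
    words = ["".join(t) for t in product(alphabet, repeat=n)]
    return sum((sum_of_exponents(w) for w in words), Fraction(0)) / len(words)


print(runs("aabaabaa"))              # [(0, 1, 1), (3, 4, 1), (6, 7, 1), (0, 7, 3)]
print(sum_of_exponents("aabaabaa"))  # 26/3
```

`Fraction` keeps the exponents exact, so averages over small alphabets can be compared with a closed formula without floating-point noise. The enumeration is exponential in n and is only meant for sanity checks.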

ANTIMIROV AND MOSSES'S REWRITE SYSTEM REVISITED

International Journal of Foundations of Computer Science ◽

10.1142/s0129054109006802 ◽

2009 ◽

Vol 20 (04) ◽

pp. 669-684 ◽

Cited By ~ 8

Author(s):

MARCO ALMEIDA ◽

NELMA MOREIRA ◽

ROGÉRIO REIS

Keyword(s):

Finite Automata ◽

Functional Approach ◽

Regular Expressions ◽

Average Case ◽

Deterministic Finite Automata ◽

Preliminary Results ◽

Functional Version ◽

Comparative Results ◽

Rewrite System ◽

Extended Regular Expressions

Antimirov and Mosses proposed a rewrite system for deciding the equivalence of two (extended) regular expressions. They argued that this method could lead to a better average-case algorithm than those based on the comparison of the equivalent minimal deterministic finite automata. In this paper we present a functional approach to that method, prove its correctness, and give some experimental comparative results. Besides an improved functional version of Antimirov and Mosses's algorithm, we present an alternative one using partial derivatives. Our preliminary results lead to the conclusion that, indeed, these methods are feasible and, most of the time, faster than the classical methods.

The Design of a Verified Derivative-Based Parsing Tool for Regular Expressions

CLEI electronic journal ◽

10.19153/cleiej.24.3.2 ◽

2021 ◽

Vol 24 (3) ◽

Author(s):

Elton Cardoso ◽

Maycon Amaro ◽

Samuel Feitosa ◽

Leonardo Reis ◽

André Du Bois ◽

...

Keyword(s):

Regular Expression ◽

Input String ◽

Regular Expressions

We describe the formalization of Brzozowski and Antimirov derivative-based algorithms for regular expression parsing in the dependently typed language Agda. The formalization produces a proof that either an input string matches a given regular expression or that no matching exists. A tool for regular-expression-based search, in the style of the well-known GNU grep, has been developed with the certified algorithms. Practical experiments conducted with this tool are reported.
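
For readers unfamiliar with derivative-based matching, here is a minimal unverified sketch of the Brzozowski approach in Python. It only decides membership and produces neither parse trees nor proofs, so it should not be read as the Agda formalization. The tuple encoding and function names are assumptions for illustration.

```python
EMPTY = ("empty",)   # matches nothing
EPS = ("eps",)       # matches only the empty word


def nullable(e):
    """True if e matches the empty word."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])  # "cat"


def deriv(e, a):
    """Brzozowski derivative: what remains to match after reading a."""
    tag = e[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if e[1] == a else EMPTY
    if tag == "alt":
        return ("alt", deriv(e[1], a), deriv(e[2], a))
    if tag == "cat":
        left = ("cat", deriv(e[1], a), e[2])
        return ("alt", left, deriv(e[2], a)) if nullable(e[1]) else left
    if tag == "star":
        return ("cat", deriv(e[1], a), e)
    raise ValueError(f"unknown node: {tag}")


def matches(e, word):
    """Derive by each letter in turn, then test for the empty word."""
    for a in word:
        e = deriv(e, a)
    return nullable(e)


# (a+b)*a accepts exactly the words over {a, b} that end in a.
r = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(matches(r, "ba"), matches(r, "ab"))  # True False
```

No simplification of intermediate expressions is performed, so derivatives can grow with the input length. A practical tool would normalize them.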
