Partial Derivative Automaton for Regular Expressions with Shuffle

In this paper, the relation between the Glushkov automaton and the partial derivative automaton of a given regular expression is studied in terms of transition complexity. Nicaud proved that the average transition complexity of the Glushkov automaton is linear in the size of the corresponding expression. This result was obtained using an upper bound on the number of transitions of the Glushkov automaton. Here we present a new quadratic construction of the Glushkov automaton that leads to a more elegant and straightforward implementation and allows the number of transitions to be counted exactly. Based on that, a better estimate of the average size is presented. Asymptotically, and as the alphabet size grows, the number of transitions per state is on average 2. Broda et al. computed an upper bound for the ratio of the number of states of the partial derivative automaton to the number of states of the Glushkov automaton, which is about ½ for large alphabet sizes. Here we show how to obtain an upper bound for the number of transitions in the partial derivative automaton, which we then use to get an average-case approximation. In conclusion, asymptotically and for large alphabets, the size of the partial derivative automaton is half the size of the Glushkov automaton. This is corroborated by some experiments, even for small alphabets and small regular expressions.
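The following minimal Python sketch illustrates what is being counted. It is not the authors' construction. It builds the Glushkov (position) automaton from the First, Last and Follow sets of the marked expression and counts its states and transitions exactly. The tuple encoding of expressions and the names glushkov_sets and position_automaton_size are hypothetical. The count works as follows:

- Each pair (i, j) with j in Follow(i) gives exactly one transition, labelled by the letter at position j.
- There is one further transition from the initial state to each position in First.

Because a Follow set may contain every position, the number of transitions is at most quadratic in the number of positions.

```python
from itertools import count

# Hypothetical tuple encoding of regular expressions:
#   ("eps",)  ("sym", "a")  ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)


def glushkov_sets(expr):
    """Letters, nullable, First, Last and Follow of the marked expression.

    Positions (letter occurrences) are numbered 1..n from left to right.
    """
    letters, follow, counter = {}, {}, count(1)

    def walk(e):
        kind = e[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            p = next(counter)
            letters[p], follow[p] = e[1], set()
            return False, {p}, {p}
        if kind == "star":
            _, fst, lst = walk(e[1])
            for p in lst:                   # end of the body loops back to its start
                follow[p] |= fst
            return True, fst, lst
        if kind in ("alt", "cat"):
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:                    # end of e1 is followed by start of e2
                follow[p] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        raise ValueError(f"unknown node {e!r}")

    nullable, first, last = walk(expr)
    return letters, nullable, first, last, follow


def position_automaton_size(expr):
    """Exact (states, transitions) of the Glushkov (position) automaton."""
    letters, _, first, _, follow = glushkov_sets(expr)
    states = len(letters) + 1               # one state per position, plus initial state 0
    # one transition 0 -> j per j in First, and i -> j per j in Follow(i),
    # each labelled by the letter at position j
    transitions = len(first) + sum(len(f) for f in follow.values())
    return states, transitions


# (a + b)* a
e = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(position_automaton_size(e))           # (4, 9)
```

Sets are used for Follow so that repeated contributions from nested stars and concatenations are merged automatically. This keeps the transition count exact rather than an upper bound.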


On the State Complexity of Partial Derivative Automata For Regular Expressions with Intersection

Descriptional Complexity of Formal Systems - Lecture Notes in Computer Science

10.1007/978-3-319-41114-9_4

2016

pp. 45-59

Cited By ~ 2

Author(s):

Rafaela Bastos

Sabine Broda

António Machiavelo

Nelma Moreira

Rogério Reis

Keyword(s):

Partial Derivative

The State

Regular Expressions

State Complexity


ON THE AVERAGE STATE COMPLEXITY OF PARTIAL DERIVATIVE AUTOMATA: AN ANALYTIC COMBINATORICS APPROACH

International Journal of Foundations of Computer Science

10.1142/s0129054111008908

2011

Vol 22 (07)

pp. 1593-1606

Cited By ~ 13

Author(s):

SABINE BRODA

ANTÓNIO MACHIAVELO

NELMA MOREIRA

ROGÉRIO REIS

Keyword(s):

Lower Bound

Asymptotic Behaviour

Partial Derivative

Regular Expression

Finite Automata

Regular Expressions

Alphabet Size

State Complexity

Analytic Combinatorics

Average State

The partial derivative automaton is usually smaller than other nondeterministic finite automata constructed from a regular expression, and it can be seen as a quotient of the Glushkov automaton. By estimating the number of regular expressions that have ε as a partial derivative, we compute a lower bound on the average number of mergings of states in the Glushkov automaton and describe its asymptotic behaviour. This bound depends on the alphabet size k, and as k grows its limit approaches half the number of states in the Glushkov automaton. The lower bound corresponds to considering the partial derivative automaton of the marked version of the regular expression, i.e. the version in which all letters are made different. Experimental results suggest that the average number of states of this automaton and that of the partial derivative automaton for the unmarked regular expression are very close to each other.
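The quantities discussed here can be explored on small examples with the minimal Python sketch below. It assumes a hypothetical tuple encoding of expressions and works in three steps:

1. It computes Antimirov's partial derivatives, using the simplification ε·β = β.
2. It collects the derivatives with respect to nonempty words, together with the expression itself, as the states of the partial derivative automaton.
3. It compares their number with the n + 1 states of the Glushkov automaton, where n is the number of letter occurrences.

By Antimirov's bound, the partial derivative automaton never has more than n + 1 states, so the difference can be read as a count of state mergings. The function pd_summary and its output keys are illustrative names, not taken from the paper.

```python
# Hypothetical tuple encoding of regular expressions:
#   ("eps",)  ("sym", "a")  ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)
EPS = ("eps",)


def nullable(e):
    """True if the empty word belongs to the language of e."""
    kind = e[0]
    if kind in ("eps", "star"):
        return True
    if kind == "sym":
        return False
    if kind == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])   # "cat"


def cat(a, b):
    """Concatenation with the simplification eps . b = b."""
    return b if a == EPS else ("cat", a, b)


def pd(e, x):
    """Antimirov's partial derivatives of e with respect to the letter x."""
    kind = e[0]
    if kind == "eps":
        return set()
    if kind == "sym":
        return {EPS} if e[1] == x else set()
    if kind == "alt":
        return pd(e[1], x) | pd(e[2], x)
    if kind == "cat":
        result = {cat(d, e[2]) for d in pd(e[1], x)}
        if nullable(e[1]):
            result |= pd(e[2], x)
        return result
    if kind == "star":
        return {cat(d, e) for d in pd(e[1], x)}
    raise ValueError(f"unknown node {e!r}")


def letters(e):
    return {e[1]} if e[0] == "sym" else set().union(*(letters(c) for c in e[1:]))


def positions(e):
    return 1 if e[0] == "sym" else sum(positions(c) for c in e[1:])


def pd_summary(e):
    """Compare the states of the partial derivative and Glushkov automata of e."""
    sigma = sorted(letters(e))
    derived, todo = set(), [e]      # partial derivatives w.r.t. nonempty words
    while todo:
        s = todo.pop()
        for x in sigma:
            for d in pd(s, x):
                if d not in derived:
                    derived.add(d)
                    todo.append(d)
    pd_states = len(derived | {e})
    pos_states = positions(e) + 1
    return {"pd_states": pd_states, "pos_states": pos_states,
            "mergings": pos_states - pd_states, "eps_is_pd": EPS in derived}


# (a + b)* a
print(pd_summary(("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))))
# {'pd_states': 2, 'pos_states': 4, 'mergings': 2, 'eps_is_pd': True}
```

For a* the sketch reports a single partial derivative state, and ε is not among the derivatives. For ab, no mergings occur.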


NORMALIZED EXPRESSIONS AND FINITE AUTOMATA

International Journal of Algebra and Computation

10.1142/s021819670700355x

2007

Vol 17 (01)

pp. 141-154

Cited By ~ 11

Author(s):

J.-M. CHAMPARNAUD

F. OUARDI

D. ZIADI

Keyword(s):

Partial Derivative

Regular Expression

Linear Time

Finite Automata

Experimental Studies

Regular Expressions

Theoretical Comparison

Theoretical Question

There exist two well-known quotients of the position automaton of a regular expression. The first one, called the equation automaton, was first introduced by Mirkin from the notion of prebase and was later redefined by Antimirov from the notion of partial derivative. The second one, due to Ilie and Yu and called the follow automaton, can be obtained by eliminating ε-transitions in an ε-NFA that is always smaller than the classical ε-NFAs (Thompson, Sippu and Soisalon–Soininen). Ilie and Yu discussed the difficulty of carrying out a theoretical comparison between the size of the follow automaton and the size of the equation automaton, and concluded that experimental studies were very likely necessary. In this paper we settle this theoretical question. We first define a set of regular expressions, called normalized expressions, such that every regular expression can be normalized in linear time. We then prove that the equation automaton of a normalized expression is always smaller than its follow automaton.
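The follow automaton can be sketched in the same spirit as a quotient of the position automaton. The Python code below computes First, Last and Follow for the marked expression. It then counts the classes of the equivalence that, as we read Ilie and Yu's definition, merges two positions i and j exactly when:

- both or neither are final, and
- Follow(i) = Follow(j).

Position 0 is the initial state, with Follow(0) = First, and it is final when the expression accepts the empty word. The tuple encoding and the function names are hypothetical. The sketch does not implement the normalization of expressions described above.

```python
from itertools import count

# Hypothetical tuple encoding of regular expressions:
#   ("eps",)  ("sym", "a")  ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)


def position_sets(expr):
    """Nullable, First, Last and Follow of the marked expression (positions 1..n)."""
    follow, counter = {}, count(1)

    def walk(e):
        kind = e[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            p = next(counter)
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, fst, lst = walk(e[1])
            for p in lst:                   # end of the body loops back to its start
                follow[p] |= fst
            return True, fst, lst
        if kind in ("alt", "cat"):
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            for p in l1:                    # end of e1 is followed by start of e2
                follow[p] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        raise ValueError(f"unknown node {e!r}")

    nullable, first, last = walk(expr)
    return nullable, first, last, follow


def follow_vs_position_states(expr):
    """Return (#states of the position automaton, #states of the follow automaton)."""
    nullable, first, last, follow = position_sets(expr)
    follow0 = {0: set(first), **follow}        # state 0 is the initial state
    last0 = set(last) | ({0} if nullable else set())
    # i ~ j  iff  (i in last0) == (j in last0)  and  follow0[i] == follow0[j]
    classes = {(p in last0, frozenset(fs)) for p, fs in follow0.items()}
    return len(follow0), len(classes)


# (a + b)* a
e = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(follow_vs_position_states(e))            # (4, 2)
```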


ESP corpus design: compilation of the Veterinary Nursing Medical Chart Corpus and the Veterinary Nursing Wordlist

Corpora

10.3366/cor.2020.0191

2020

Vol 15 (2)

pp. 125-140

Author(s):

Yukiko Ohashi

Noriaki Katagiri

Katsutoshi Oka

Michiko Hanada

Keyword(s):

Word List

English For Specific Purposes

Regular Expressions

Annotation Scheme

Corpus Design

As Species

Lexical Items

Access To Data

General Service

This paper reports on two research results: (1) the design of an English for Specific Purposes (ESP) corpus architecture complete with annotations structured by regular expressions; and (2) a case study testing how well the design supports the creation of a specific vocabulary list from the compiled corpus. The first half of this study involved designing a precisely structured ESP corpus from 190 veterinary medical charts with a hierarchy of the data. The data hierarchy in the corpus consists of document types, outline elements and inline elements, such as species and breed. Perl scripts extracted the data attached to veterinary-specific categories, and the extraction led to the creation of wordlists. The second part of the research tested the corpus model by creating a list of lexical items commonly observed in veterinary medicine. The coverage of the wordlists by the General Service List (GSL) and the Academic Word List (AWL) was tested. The result was that 66.4 percent of all lexical items appeared in the GSL and AWL, whereas 33.7 percent appeared in neither list. The corpus compilation procedures and the annotation scheme introduced in this study enable the compilation of specific corpora with explicit annotations. This allows teachers to access the data required for creating ESP classroom materials.
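The extraction and coverage steps can be pictured with a small Python sketch. The paper used Perl scripts; Python is swapped in here. The following are all assumptions for illustration:

- the inline-tag format <tag>text</tag>;
- the category names;
- the toy word sets standing in for the GSL and AWL.

The real lists are organized by word families, which this sketch ignores by matching plain lowercase strings.

```python
import re
from collections import Counter

# Hypothetical inline annotation format: <tag>text</tag>, with no nesting of the
# same tag. This is an assumption for illustration, not the paper's scheme.
INLINE = re.compile(r"<(?P<tag>[a-z_]+)>(?P<text>.*?)</(?P=tag)>", re.DOTALL)


def extract(chart_text, categories):
    """Count (category, lowercased text) pairs for the requested inline tags."""
    found = Counter()
    for m in INLINE.finditer(chart_text):
        if m.group("tag") in categories:
            found[(m.group("tag"), m.group("text").strip().lower())] += 1
    return found


def coverage(wordlist, *reference_lists):
    """Return (% of items in at least one reference list, % in none of them)."""
    reference = set().union(*reference_lists)
    covered = sum(1 for word in wordlist if word in reference)
    share = 100 * covered / len(wordlist)
    return round(share, 1), round(100 - share, 1)


chart = "<species>Canine</species>, <breed>Shiba Inu</breed>, given <drug>meloxicam</drug>"
print(extract(chart, {"species", "breed"}))
# toy stand-ins for the GSL and AWL (hypothetical)
print(coverage(["dog", "fever", "meloxicam", "analysis"], {"dog", "fever"}, {"analysis"}))  # (75.0, 25.0)
```

The non-greedy body plus the (?P=tag) back-reference makes each match end at the closing tag of the same name. Nested elements of different types would need a real parser rather than a single regular expression.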


QMine: A Framework for Mining Quantitative Regular Expressions from System Traces

2020 IEEE 20th International Conference on Software Quality, Reliability and Security Companion (QRS-C)

10.1109/qrs-c51114.2020.00070

2020

Author(s):

Pradeep K. Mahato

Apurva Narayan

Keyword(s):

Regular Expressions


A Complete Proof System for 1-Free Regular Expressions Modulo Bisimilarity

Proceedings of the 35th Annual ACM/IEEE Symposium on Logic in Computer Science

10.1145/3373718.3394744

2020

Author(s):

Clemens Grabmayer

Wan Fokkink

Keyword(s):

Proof System

Regular Expressions

Complete Proof


Retaining all the path information for graph reachability queries based on regular expressions

2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

10.1109/fskd.2013.6816303

2013

Author(s):

Yifei Zhang

Guoren Wang

Changkuan Zhao

Ende Zhang

Keyword(s):

Regular Expressions

Path Information

Graph Reachability

Reachability Queries


Modelling and Control Design of a V-Shaped Thermal Actuator System via Partial Derivative Equation Approach

Proceedings of the 5th International Conference on Mechatronics and Robotics Engineering - ICMRE'19

10.1145/3314493.3314516

2019

Author(s):

Nguyen Tien Dzung

Dao Phuong Nam

Nguyen Quang Dich

Keyword(s):

Partial Derivative

Control Design

Thermal Actuator

Equation Approach

Actuator System

And Control


SSMBS: a web server to locate sequentially separated motifs in biological sequences

Journal of Applied Crystallography

10.1107/s0021889809047050

2009

Vol 43 (1)

pp. 203-205

Cited By ~ 1

Author(s):

Chetan Kumar

K. Sekar

Keyword(s):

Amino Acids

Web Server

Nucleotide Sequences

Regular Expressions

Biological Sequences

Sequence Motifs

Specific Order

The Web

The identification of sequence motifs (of amino acids or nucleotides) that occur in a particular order in biological sequences has proved to be of interest. This paper describes a computing server, SSMBS, which can locate and display the occurrences of user-defined, biologically important sequence motifs (a maximum of five) present in a specific order in protein and nucleotide sequences. In addition to efficiently locating motifs specified using regular expressions, the server can find occurrences of long and complex motifs. The computation is carried out by an algorithm developed using the concepts of quantifiers in regular expressions. The web server is available to users around the clock at http://dicsoft1.physics.iisc.ernet.in/ssmbs/.
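The idea of locating ordered motifs with regular-expression quantifiers can be sketched in Python with the standard re module. This is not the SSMBS algorithm: the function ordered_motifs, its gap parameters and the example sequence are hypothetical. Each motif is wrapped in a named group, and consecutive motifs are joined by a lazy bounded quantifier .{min,max}?. As a result, the whole pattern only matches when the motifs appear in the given order with acceptable spacing.

```python
import re


def ordered_motifs(sequence, motifs, min_gap=0, max_gap=None):
    """Find user-given motif regexes that occur in the given order.

    Consecutive motifs are separated by min_gap..max_gap arbitrary residues
    (max_gap=None means no upper bound). Each hit is a list of
    (start, matched_text) pairs, one per motif; hits do not overlap.
    """
    upper = "" if max_gap is None else str(max_gap)
    gap = f".{{{min_gap},{upper}}}?"           # lazy quantifier, e.g. ".{0,5}?"
    # wrap each motif in a named group so alternations stay local to it
    pattern = gap.join(f"(?P<m{i}>{motif})" for i, motif in enumerate(motifs))
    hits = []
    for match in re.finditer(pattern, sequence):
        hits.append([(match.start(f"m{i}"), match.group(f"m{i}"))
                     for i in range(len(motifs))])
    return hits


seq = "AAGCTTXXGAATTCYYGGATCC"                  # made-up example sequence
print(ordered_motifs(seq, ["AAGCTT", "GAAT+C", "GGATCC"]))
# [[(0, 'AAGCTT'), (8, 'GAATTC'), (16, 'GGATCC')]]
print(ordered_motifs(seq, ["AAGCTT", "GAATTC"], max_gap=1))   # []
```

The lazy gap keeps each match as short as possible. A search that must report every possible placement of the motifs, rather than non-overlapping hits, would need overlapping matches or an explicit search over start positions.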
