Subpath Queries on Compressed Graphs: A Survey

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually every DNA aligner employs compressed indexes. Recent trends have considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered a particular case of labeled graphs, these findings have created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
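
The abstract refers to counting pattern occurrences in time proportional to the pattern's length with an entropy-compressed index. As an illustration, here is a minimal sketch of backward search over the Burrows-Wheeler transform, the idea behind FM-index-style indexes (assumed here as a representative of the year-2000 approaches; the survey's own presentation may differ). The names bwt and count_occurrences are hypothetical. The BWT is built naively from sorted rotations and rank is a linear scan, so the sketch shows the query logic only, not the compressed space or time bounds.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    s = text + "$"  # '$' sorts before letters and digits in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text` using FM-index-style backward search."""
    last = bwt(text)
    # C[c] = number of characters in the BWT that are strictly smaller than c.
    C = {}
    total = 0
    for c in sorted(set(last)):
        C[c] = total
        total += last.count(c)

    def rank(c: str, i: int) -> int:
        # Occurrences of c in last[0:i]; a real index answers this in O(1)
        # with succinct bitvectors or a wavelet tree instead of a scan.
        return last[:i].count(c)

    lo, hi = 0, len(last)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, bwt("banana") yields "annb$aa", and count_occurrences("banana", "ana") narrows the interval three times (once per pattern character) and reports 2.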

Finite-State Technology

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.39 ◽

2018 ◽

Author(s):

Mans Hulden

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Computational Linguistics ◽

Language Processing ◽

Finite State Machines ◽

Regular Languages ◽

Finite State Automata ◽

State Machines ◽

Computational Phonology ◽

Finite State

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
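
The abstract contrasts automata, which correspond to regular languages, with transducers, which correspond to regular relations. A minimal sketch of the non-probabilistic, deterministic case, assuming dictionary-encoded transition tables (dfa_accepts, transduce, EVEN_A and SWAP are hypothetical names, not taken from the chapter): a DFA only accepts or rejects a word, while a transducer also emits an output string on every transition.

```python
from typing import Dict, Optional, Set, Tuple

# DFA transition table: (state, symbol) -> next state.
DfaDelta = Dict[Tuple[str, str], str]
# Deterministic transducer table: (state, symbol) -> (next state, output string).
FstDelta = Dict[Tuple[str, str], Tuple[str, str]]


def dfa_accepts(delta: DfaDelta, start: str, finals: Set[str], word: str) -> bool:
    """Run the DFA on `word`; a missing transition acts as an implicit rejecting sink."""
    state = start
    for sym in word:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return state in finals


def transduce(delta: FstDelta, start: str, finals: Set[str], word: str) -> Optional[str]:
    """Map `word` to its output string, or return None if the transducer rejects it."""
    state, out = start, []
    for sym in word:
        if (state, sym) not in delta:
            return None
        state, emitted = delta[(state, sym)]
        out.append(emitted)
    return "".join(out) if state in finals else None


# Example DFA: strings over {a, b} with an even number of a's.
EVEN_A: DfaDelta = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}

# Example one-state transducer that swaps a and b.
SWAP: FstDelta = {("q", "a"): ("q", "b"), ("q", "b"): ("q", "a")}
```

Here dfa_accepts(EVEN_A, "even", {"even"}, "abab") is True, and transduce(SWAP, "q", {"q"}, "aab") returns "bba". Real NLP toolkits add nondeterminism, epsilon transitions and weights on top of this basic scheme.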

FINITE STATE PROCESSES, Z-TEMPORAL LOGIC AND THE MONADIC THEORY OF THE INTEGERS

International Journal of Foundations of Computer Science ◽

10.1142/s0129054192000152 ◽

1992 ◽

Vol 03 (03) ◽

pp. 233-244 ◽

Cited By ~ 1

Author(s):

A. SAOUDI ◽

D.E. MULLER ◽

P.E. SCHUPP

Keyword(s):

Temporal Logic ◽

Linear Temporal Logic ◽

Second Order ◽

Regular Languages ◽

Infinite Words ◽

First Order ◽

Monadic Theory ◽

Finite State ◽

Temporal Formula

We introduce four classes of Z-regular grammars for generating bi-infinite words (i.e. Z-words) and prove that they generate exactly Z-regular languages. We extend the second order monadic theory of one successor to the set of the integers (i.e. Z) and give some characterizations of this theory in terms of Z-regular grammars and Z-regular languages. We prove that this theory is decidable and equivalent to the weak theory. We also extend the linear temporal logic to Z-temporal logic and then prove that each Z-temporal formula is equivalent to a first order monadic formula. We prove that the correctness problem for finite state processes is decidable.

Tricolor automata

Proceedings of Balisage: The Markup Conference 2015 ◽

10.4242/balisagevol15.sperberg-mcqueen01 ◽

2015 ◽

Author(s):

C. M. Sperberg-McQueen

Keyword(s):

Regular Languages ◽

The Other ◽

Finite State Automata ◽

Simple Application ◽

Finite State

Tricolor automata are extensions of finite state automata, intended for the comparison of two regular languages; states and arcs in the automaton are colored to indicate whether they are peculiar to one language or the other, or common to both. Their design represents a simple application to practical purposes of ideas derived from the work of Glushkov and Brzozowski. Examples are given to show how tricolor automata can be used to visualize the intersection, union, and set difference of two languages, and algorithms for constructing them are given.
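
One plausible way to obtain such a coloring is the classical product construction, sketched below under the assumption that both languages are given as complete DFAs over a shared alphabet. This is not Sperberg-McQueen's own construction, which the abstract attributes to ideas from Glushkov and Brzozowski. Each reachable pair state is colored by which language accepts there, so intersection, union and set difference can be read off the colors. All names (Dfa, color_of, colored_product, classify, A, B) are hypothetical.

```python
from collections import deque
from typing import Dict, Set, Tuple

# A complete DFA as (transition table, start state, accepting states).
Dfa = Tuple[Dict[Tuple[str, str], str], str, Set[str]]


def color_of(in_a: bool, in_b: bool) -> str:
    """Color a product state by which of the two languages accepts there."""
    if in_a and in_b:
        return "both"
    if in_a:
        return "A only"
    if in_b:
        return "B only"
    return "neither"


def colored_product(a: Dfa, b: Dfa, alphabet: str) -> Dict[Tuple[str, str], str]:
    """Explore the reachable product states breadth-first and color each one."""
    (da, sa, fa), (db, sb, fb) = a, b
    start = (sa, sb)
    colors: Dict[Tuple[str, str], str] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        colors[(p, q)] = color_of(p in fa, q in fb)
        for c in alphabet:
            nxt = (da[(p, c)], db[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return colors


def classify(a: Dfa, b: Dfa, word: str) -> str:
    """Run both DFAs in lockstep and report which language(s) contain `word`."""
    (da, sa, fa), (db, sb, fb) = a, b
    p, q = sa, sb
    for c in word:
        p, q = da[(p, c)], db[(q, c)]
    return color_of(p in fa, q in fb)


# A: even number of a's; B: strings ending in b (alphabet {a, b}).
A: Dfa = ({("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"}, "e", {"e"})
B: Dfa = ({("0", "a"): "0", ("1", "a"): "0", ("0", "b"): "1", ("1", "b"): "1"}, "0", {"1"})
```

Treating the "both" states as accepting gives the intersection, every state except "neither" gives the union, and the "A only" states give the difference A minus B. For instance, classify(A, B, "aab") returns "both".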

The genus of regular languages

Mathematical Structures in Computer Science ◽

10.1017/s0960129516000037 ◽

2016 ◽

Vol 28 (1) ◽

pp. 14-44 ◽

Cited By ~ 1

Author(s):

GUILLAUME BONFANTE ◽

FLORIAN DELOUP

Keyword(s):

Lower Bounds ◽

Regular Languages ◽

Upper And Lower Bounds ◽

Generic Condition ◽

Finite State ◽

Large Genus ◽

Deterministic Automata

The paper defines and studies the genus of finite state deterministic automata (FSA) and regular languages. Indeed, an FSA can be seen as a graph for which the notion of genus arises. At the same time, an FSA has a semantics via its underlying language. It is then natural to make a connection between the languages and the notion of genus. After we introduce and justify the notion of the genus for regular languages, the following questions are addressed. First, depending on the size of the alphabet, we provide upper and lower bounds on the genus of regular languages: we show that under a relatively generic condition on the alphabet and the geometry of the automata, the genus grows at least linearly in terms of the size of the automata. Second, we show that the topological cost of the powerset determinization procedure is exponential. Third, we prove that the notion of minimization is orthogonal to the notion of genus. Fourth, we build regular languages of arbitrarily large genus: the notion of genus defines a proper hierarchy of regular languages.

INFIX-FREE REGULAR EXPRESSIONS AND LANGUAGES

International Journal of Foundations of Computer Science ◽

10.1142/s0129054106003887 ◽

2006 ◽

Vol 17 (02) ◽

pp. 379-393 ◽

Cited By ~ 16

Author(s):

YO-SUB HAN ◽

YAJUN WANG ◽

DERICK WOOD

Keyword(s):

Polynomial Time ◽

Regular Language ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Regular Languages ◽

Finite State Automaton ◽

Primality Test ◽

Finite State ◽

State Pair ◽

Free Decomposition

We study infix-free regular languages. We observe the structural properties of finite-state automata for infix-free languages and develop a polynomial-time algorithm to determine infix-freeness of a regular language using state-pair graphs. We consider two cases: 1) A language is specified by a nondeterministic finite-state automaton and 2) a language is specified by a regular expression. Furthermore, we examine the prime infix-free decomposition of infix-free regular languages and design an algorithm for the infix-free primality test of an infix-free regular language. Moreover, we show that we can compute the prime infix-free decomposition in polynomial time. We also demonstrate that the prime infix-free decomposition is not unique.
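
To make the property concrete: a language is infix-free when no word in it is a proper infix (contiguous substring) of another word in it. The sketch below checks this definition directly on a finite set of strings. It is a naive quadratic check written for illustration only, not the polynomial-time state-pair-graph algorithm the paper gives for languages specified by NFAs or regular expressions; is_infix_free is a hypothetical name.

```python
from typing import Iterable


def is_infix_free(words: Iterable[str]) -> bool:
    """Return True if no word in the finite set is a proper infix of another word."""
    ws = set(words)  # duplicates collapse, so u != v below means a *proper* infix
    for u in ws:
        for v in ws:
            if u != v and u in v:  # `in` tests for a contiguous substring
                return False
    return True
```

For example, {"abc", "bcd", "cde"} is infix-free, whereas {"ab", "cabd"} is not, because "ab" occurs inside "cabd". Any set containing the empty word and some other word also fails the check, since the empty word is an infix of every word.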

Finite State Automata, Regular Languages and Predicate Calculus

Word Processing in Groups ◽

10.1201/9781439865699-8 ◽

1992 ◽

pp. 15-38

Keyword(s):

Regular Languages ◽

Predicate Calculus ◽

Finite State Automata ◽

Finite State

A Transformation-Based Approach to Implication of GSTE Assertion Graphs

Journal of Applied Mathematics ◽

10.1155/2013/709071 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7

Author(s):

Guowu Yang ◽

William N. N. Hung ◽

Xiaoyu Song ◽

Wensheng Guo

Keyword(s):

Model Checking ◽

Directed Graphs ◽

Classical Model ◽

Regular Languages ◽

Finite State Automaton ◽

Finite State Automata ◽

Symbolic Trajectory Evaluation ◽

Finite State ◽

Symbolic Trajectory ◽

Language Containment

Generalized symbolic trajectory evaluation (GSTE) is a model checking approach that has been applied successfully to the formal verification of VLSI systems. GSTE is an extension of symbolic trajectory evaluation (STE) to the model checking of ω-regular properties. It is an alternative to classical model checking algorithms, in which properties are specified as finite-state automata. In GSTE, properties are specified as assertion graphs, which are labeled directed graphs where each edge is labeled with two labeling functions: antecedent and consequent. In this paper, we show the complementary relation between GSTE assertion graphs and finite-state automata with respect to the expressiveness of regular and ω-regular languages. We present an algorithm that transforms a GSTE assertion graph into a finite-state automaton and vice versa. By applying this algorithm, we reduce the problem of GSTE assertion graph implication to the problem of automata language containment. We demonstrate our approach by applying it to the verification of a FIFO circuit.

Logical specification of regular relations for NLP

Natural Language Engineering ◽

10.1017/s1351324903003103 ◽

2003 ◽

Vol 9 (1) ◽

pp. 65-85 ◽

Cited By ~ 2

Author(s):

NATHAN VAILLETTE

Keyword(s):

Second Order ◽

Order Logic ◽

Regular Languages ◽

Description Language ◽

Logical Specification ◽

The Family ◽

Finite State ◽

Second Order Logic ◽

Monadic Second Order Logic

This paper describes how the use of monadic second-order logic for specifying regular languages can be extended for specifying regular relations, providing a declarative description language for finite state transductions of the sort used in NLP. We discuss issues arising in the integration into an automaton toolkit of an implementation of the conversion from logic formulas to automata. The utility of the logic of regular relations is demonstrated by showing how it can be used to define the family of replacement operators in a way that lends itself to straightforward proofs of correctness.

Recent Trends in the Incidence of Clear Cell Adenocarcinoma and Survival Outcomes: A SEER Analysis

10.21203/rs.3.rs-121162/v1 ◽

2020 ◽

Author(s):

Anil Shrestha ◽

Niraj Maskey ◽

Xiaohui Dong ◽

Zongtai Zheng ◽

Fuhan Yang ◽

...

Keyword(s):

Clear Cell ◽

Relative Survival ◽

Population Data ◽

Clear Cell Adenocarcinoma ◽

Seer Database ◽

Primary Tumor Site ◽

Recent Trends ◽

Adjusted Incidence ◽

Year 2000 ◽

Over Time

Abstract. Objective. To investigate recent trends in the epidemiological and prognostic factors of clear cell adenocarcinoma (CCA), which is considered a relatively rare tumor with a glycogen-rich phenotype. Methods. Patients with CCA diagnosed from 2000 to 2016 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. Relevant population data were used to analyze age-adjusted incidence rates, age-standardized 3-year and 5-year relative survival, and overall survival (OS). Results. A total of 104,206 CCA patients were identified. The age-adjusted incidence of CCA increased 2.7-fold from 2000 (3.3/100,000) to 2016 (8.8/100,000). This increase occurred across all ages, races, stages, and grades; among these subgroups, the increase was largest in the grade IV group. The age-standardized 3-year and 5-year relative survival rates increased during the study period, rising by 9.1% and 9.5%, respectively, from 2000 to 2011. Among all stages and grades, the increase in relative survival was greatest in the grade IV group. In multivariate analysis of all CCA patients, predictors of OS were age, gender, year of diagnosis, marital status, race, grade, stage, and primary tumor site (P < 0.001). The OS of all CCA patients during the period 2008 to 2016 was significantly higher than that from 2000 to 2007 (hazard ratio [HR], 0.87; 95% CI: 0.85–0.89; P < 0.001). Conclusions. The incidence of CCA and the survival of these patients improved over time. In particular, the largest increases were reported for grade IV CCA, which may be due to earlier diagnosis and improved treatment.

Inference of finite-state transducers from regular languages

Pattern Recognition ◽

10.1016/j.patcog.2004.03.025 ◽

2005 ◽

Vol 38 (9) ◽

pp. 1431-1443 ◽

Cited By ~ 17

Author(s):

Francisco Casacuberta ◽

Enrique Vidal ◽

David Picó

Keyword(s):

Regular Languages ◽

Finite State Transducers ◽

Finite State
