Subpath Queries on Compressed Graphs: A Survey

Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 14
Author(s):  
Nicola Prezza

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query's length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text's entropy. These contributions had an enormous impact on bioinformatics: today, virtually every DNA aligner employs compressed indexes. Recent work has considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today's compressed indexes for labeled graphs and regular languages.
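To make the count query concrete, here is a minimal Python sketch of backward search over the Burrows-Wheeler transform (BWT), the mechanism used by FM-index compressed indexes. It is an illustration, not code from the survey: the BWT is built by naively sorting rotations, the terminator `$` is assumed to be absent from the text and smaller than every other symbol, and the helper `occ` scans a prefix instead of using the rank structures a real compressed index relies on, so neither the space nor the time bounds discussed in the abstract apply to this sketch.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by naively sorting all rotations (illustration only)."""
    # Assumption: '$' does not occur in text and sorts before every symbol that does.
    t = text + "$"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text with BWT backward search."""
    last = bwt(text)
    # c_table[c] = number of symbols in text + "$" that are strictly smaller than c.
    c_table = {}
    total = 0
    for c in sorted(set(last)):
        c_table[c] = total
        total += last.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in last[:i]; a compressed index answers this with rank queries.
        return last[:i].count(c)

    # [lo, hi) is the range of sorted rotations that start with the suffix of the
    # pattern processed so far; it starts as the range of all rotations.
    lo, hi = 0, len(last)
    for c in reversed(pattern):
        if c not in c_table:
            return 0
        lo = c_table[c] + occ(c, lo)
        hi = c_table[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences("banana", "ana")` returns 2. Each iteration extends the matched suffix of the pattern by one symbol, so the loop runs once per pattern symbol; replacing `occ` with fast rank structures is what lets counting run in time roughly proportional to the pattern length.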

Author(s):  
Mans Hulden

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
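As a small illustration of a non-probabilistic transducer, the Python sketch below hand-codes a deterministic transducer for a hypothetical phonological rewrite rule, `n -> m / _ p` (an `n` becomes `m` when followed by `p`). The rule, the state numbering, and the names `step`, `FINAL_OUTPUT`, and `transduce` are assumptions made for illustration; toolkits used in practice typically compile such rules into transducers rather than having transitions written by hand.

```python
from typing import Dict, Tuple

# Hypothetical rule n -> m / _ p, e.g. "input" -> "imput".
# State 0: nothing pending. State 1: an 'n' has been read but not yet emitted,
# because whether it surfaces as 'n' or 'm' depends on the next symbol.


def step(state: int, symbol: str) -> Tuple[int, str]:
    """Return (next_state, output) for one transition."""
    if state == 0:
        return (1, "") if symbol == "n" else (0, symbol)
    # state == 1: a pending 'n' is waiting to be emitted
    if symbol == "p":
        return (0, "mp")
    if symbol == "n":
        return (1, "n")  # emit the earlier 'n', keep the new one pending
    return (0, "n" + symbol)


# Output emitted when the input ends in each state (flushes a pending 'n').
FINAL_OUTPUT: Dict[int, str] = {0: "", 1: "n"}


def transduce(word: str) -> str:
    state, out = 0, []
    for symbol in word:
        state, emitted = step(state, symbol)
        out.append(emitted)
    out.append(FINAL_OUTPUT[state])
    return "".join(out)
```

Because the output for an `n` depends on the following symbol, the transducer delays it in state 1 and emits it on the next transition or, at the end of the input, through the final-output table.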


1992 ◽  
Vol 03 (03) ◽  
pp. 233-244 ◽  
Author(s):  
A. SAOUDI ◽  
D.E. MULLER ◽  
P.E. SCHUPP

We introduce four classes of Z-regular grammars for generating bi-infinite words (i.e. Z-words) and prove that they generate exactly Z-regular languages. We extend the second order monadic theory of one successor to the set of the integers (i.e. Z) and give some characterizations of this theory in terms of Z-regular grammars and Z-regular languages. We prove that this theory is decidable and equivalent to the weak theory. We also extend the linear temporal logic to Z-temporal logic and then prove that each Z-temporal formula is equivalent to a first order monadic formula. We prove that the correctness problem for finite state processes is decidable.


Author(s):  
C. M. Sperberg-McQueen

Tricolor automata are extensions of finite state automata, intended for the comparison of two regular languages; states and arcs in the automaton are colored to indicate whether they are peculiar to one language or the other, or common to both. Their design represents a simple application to practical purposes of ideas derived from the work of Glushkov and Brzozowski. Examples are given to show how tricolor automata can be used to visualize the intersection, union, and set difference of two languages, and algorithms for constructing them are given.
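One way to get a feel for the coloring is the textbook product construction of two DFAs, with each reachable state pair labeled according to which language accepts there. This Python sketch is a simplified stand-in, not the Glushkov/Brzozowski-based construction described in the paper: it colors only states (not arcs), assumes both DFAs are complete over a shared alphabet, and the example automata and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Tuple

# A complete DFA as (start state, transition table, accepting states).
DFA = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]


def color(in_a: bool, in_b: bool) -> str:
    """Label a product state by which language(s) accept there."""
    if in_a and in_b:
        return "both"
    if in_a:
        return "only A"
    if in_b:
        return "only B"
    return "neither"


def colored_product(a: DFA, b: DFA, alphabet: str) -> Dict[Tuple[str, str], str]:
    """Breadth-first exploration of reachable state pairs, each with its color."""
    start = (a[0], b[0])
    colors = {start: color(a[0] in a[2], b[0] in b[2])}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        for c in alphabet:
            nxt = (a[1][(p, c)], b[1][(q, c)])
            if nxt not in colors:
                colors[nxt] = color(nxt[0] in a[2], nxt[1] in b[2])
                queue.append(nxt)
    return colors


def color_of(a: DFA, b: DFA, word: str) -> str:
    """Run both DFAs in lockstep and return the color of the pair reached."""
    p, q = a[0], b[0]
    for c in word:
        p, q = a[1][(p, c)], b[1][(q, c)]
    return color(p in a[2], q in b[2])


# Hypothetical examples over {a, b}:
# EVEN_A accepts words with an even number of a's; ENDS_B accepts words ending in b.
EVEN_A: DFA = ("e", {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"}, frozenset({"e"}))
ENDS_B: DFA = ("x", {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x", ("y", "b"): "y"}, frozenset({"y"}))
```

Reading a word through the product and checking the final color answers the three questions: intersection (`"both"`), union (any color except `"neither"`), and set difference L(A) \ L(B) (`"only A"`).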


2016 ◽  
Vol 28 (1) ◽  
pp. 14-44 ◽  
Author(s):  
GUILLAUME BONFANTE ◽  
FLORIAN DELOUP

The paper defines and studies the genus of finite state deterministic automata (FSA) and regular languages. Indeed, an FSA can be seen as a graph, for which the notion of genus arises. At the same time, an FSA has a semantics via its underlying language. It is then natural to connect languages with the notion of genus. After we introduce and justify the notion of genus for regular languages, we address the following questions. First, depending on the size of the alphabet, we provide upper and lower bounds on the genus of regular languages: we show that under a relatively generic condition on the alphabet and the geometry of the automata, the genus grows at least linearly in terms of the size of the automata. Second, we show that the topological cost of the powerset determinization procedure is exponential. Third, we prove that the notion of minimization is orthogonal to the notion of genus. Fourth, we build regular languages of arbitrarily large genus: the notion of genus defines a proper hierarchy of regular languages.
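For readers unfamiliar with the powerset determinization procedure, here is a minimal Python sketch of the subset construction, assuming an NFA without epsilon transitions. It builds only the reachable subsets; it does not compute genus, which is what the paper actually studies. The example NFA (words over {a, b} whose third symbol from the end is `a`) is the standard textbook case in which the number of DFA states grows exponentially; it is chosen here for illustration and is not taken from the paper.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

# NFA transitions without epsilon moves: (state, symbol) -> set of next states.
NFADelta = Dict[Tuple[int, str], Set[int]]
DFADelta = Dict[Tuple[FrozenSet[int], str], FrozenSet[int]]


def determinize(start: int, delta: NFADelta, accepting: Set[int],
                alphabet: str) -> Tuple[FrozenSet[int], DFADelta, Set[FrozenSet[int]]]:
    """Subset construction: DFA states are the reachable sets of NFA states."""
    start_set = frozenset({start})
    dfa_delta: DFADelta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for c in alphabet:
            target = frozenset(t for s in current for t in delta.get((s, c), set()))
            dfa_delta[(current, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, dfa_delta, dfa_accepting


def dfa_accepts(start: FrozenSet[int], dfa_delta: DFADelta,
                dfa_accepting: Set[FrozenSet[int]], word: str) -> bool:
    state = start
    for c in word:
        state = dfa_delta[(state, c)]
    return state in dfa_accepting


# Hypothetical example: words over {a, b} whose third symbol from the end is 'a'.
THIRD_FROM_END: NFADelta = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
    (2, "a"): {3}, (2, "b"): {3},
}
```

On this 4-state NFA the construction reaches 8 subsets, one for each pattern of `a`/`b` among the last three symbols read.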


2006 ◽  
Vol 17 (02) ◽  
pp. 379-393 ◽  
Author(s):  
YO-SUB HAN ◽  
YAJUN WANG ◽  
DERICK WOOD

We study infix-free regular languages. We observe the structural properties of finite-state automata for infix-free languages and develop a polynomial-time algorithm to determine infix-freeness of a regular language using state-pair graphs. We consider two cases: (1) the language is specified by a nondeterministic finite-state automaton, and (2) the language is specified by a regular expression. Furthermore, we examine the prime infix-free decomposition of infix-free regular languages and design an algorithm for testing the infix-free primality of an infix-free regular language. Moreover, we show that the prime infix-free decomposition can be computed in polynomial time. We also demonstrate that the prime infix-free decomposition is not unique.
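The defining property can be checked directly on a finite set of words: a language is infix-free if no word in it occurs as a proper infix (contiguous substring) of another word in it. The Python sketch below is a brute-force check for finite sets only, with hypothetical function names; the paper's polynomial-time algorithm works on NFAs and regular expressions through state-pair graphs, which this sketch does not implement.

```python
from typing import Iterable, Optional, Tuple


def infix_violation(words: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (u, v) with u a proper infix of v, or None if the finite set is infix-free."""
    ws = set(words)
    for u in ws:
        for v in ws:
            # Python's `in` on strings tests for a contiguous substring (an infix).
            if u != v and u in v:
                return (u, v)
    return None


def is_infix_free(words: Iterable[str]) -> bool:
    return infix_violation(words) is None
```

Returning a witness pair rather than a bare boolean makes it easy to see why a set fails; note that any set containing the empty word and some other word is never infix-free.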


2013 ◽  
Vol 2013 ◽  
pp. 1-7
Author(s):  
Guowu Yang ◽  
William N. N. Hung ◽  
Xiaoyu Song ◽  
Wensheng Guo

Generalized symbolic trajectory evaluation (GSTE) is a model checking approach that has been applied successfully to the formal verification of VLSI systems. GSTE extends symbolic trajectory evaluation (STE) to the model checking of ω-regular properties. It is an alternative to classical model checking algorithms, in which properties are specified as finite-state automata. In GSTE, properties are specified as assertion graphs: labeled directed graphs in which each edge is labeled with two labeling functions, antecedent and consequent. In this paper, we show the complement relation between GSTE assertion graphs and finite-state automata with the expressiveness of regular languages and ω-regular languages. We present an algorithm that transforms a GSTE assertion graph into a finite-state automaton and vice versa. By applying this algorithm, we reduce the problem of GSTE assertion graph implication to the problem of automata language containment. We demonstrate our approach by applying it to the verification of a FIFO circuit.
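Once assertion graphs are translated into automata, implication becomes language containment. For complete DFAs over finite words, L(A) ⊆ L(B) can be decided by a breadth-first search of the product automaton for a reachable pair whose A-component is accepting and whose B-component is not. This Python sketch shows that standard check under those assumptions only: it does not implement the paper's GSTE translation, it does not handle ω-regular (infinite-word) properties, nondeterministic automata would first need B to be determinized, and the example automata and names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Optional, Tuple

# A complete DFA as (start state, transition table, accepting states).
DFA = Tuple[str, Dict[Tuple[str, str], str], FrozenSet[str]]


def containment_counterexample(a: DFA, b: DFA, alphabet: str) -> Optional[str]:
    """Return a shortest word in L(a) minus L(b), or None if L(a) is contained in L(b)."""
    start = (a[0], b[0])
    # parent maps each discovered pair to (previous pair, symbol), or None for the start.
    parent: Dict[Tuple[str, str], Optional[Tuple[Tuple[str, str], str]]] = {start: None}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        p, q = pair
        if p in a[2] and q not in b[2]:
            # Follow parent links back to the start to rebuild the word.
            symbols = []
            link = parent[pair]
            while link is not None:
                prev, c = link
                symbols.append(c)
                link = parent[prev]
            return "".join(reversed(symbols))
        for c in alphabet:
            nxt = (a[1][(p, c)], b[1][(q, c)])
            if nxt not in parent:
                parent[nxt] = (pair, c)
                queue.append(nxt)
    return None


# Hypothetical examples over {a, b}: words ending in "ab" and words ending in "b".
ENDS_AB: DFA = ("0", {("0", "a"): "1", ("0", "b"): "0", ("1", "a"): "1",
                      ("1", "b"): "2", ("2", "a"): "1", ("2", "b"): "0"}, frozenset({"2"}))
ENDS_B: DFA = ("x", {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x",
                     ("y", "b"): "y"}, frozenset({"y"}))
```

Because the search is breadth-first, the returned counterexample is a shortest word in L(A) \ L(B); here every word ending in "ab" ends in "b", while the reverse containment fails on the word "b".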


2003 ◽  
Vol 9 (1) ◽  
pp. 65-85 ◽  
Author(s):  
NATHAN VAILLETTE

This paper describes how the use of monadic second-order logic for specifying regular languages can be extended for specifying regular relations, providing a declarative description language for finite state transductions of the sort used in NLP. We discuss issues arising in the integration into an automaton toolkit of an implementation of the conversion from logic formulas to automata. The utility of the logic of regular relations is demonstrated by showing how it can be used to define the family of replacement operators in a way that lends itself to straightforward proofs of correctness.


2020 ◽  
Author(s):  
Anil Shrestha ◽  
Niraj Maskey ◽  
Xiaohui Dong ◽  
Zongtai Zheng ◽  
Fuhan Yang ◽  
...  

Abstract Objective. To investigate recent trends in the epidemiological and prognostic factors of clear cell adenocarcinoma (CCA), which is considered a relatively rare tumor with a glycogen-rich phenotype. Methods. Patients with CCA from 2000 to 2016 were identified from the Surveillance, Epidemiology, and End Results (SEER) database. Relevant population data were used to analyze age-adjusted incidence rates, age-standardized 3-year and 5-year relative survival, and overall survival (OS). Results. A total of 104,206 CCA patients were identified. The age-adjusted incidence of CCA increased 2.7-fold from 2000 (3.3/100,000) to 2016 (8.8/100,000). This increase occurred across all ages, races, stages, and grades; among these subgroups, the increase was largest in the grade IV group. The age-standardized 3-year and 5-year relative survival increased during the study period, rising by 9.1% and 9.5%, respectively, from 2000 to 2011. Among all stages and grades, the increase in relative survival was greatest in the grade IV group. In multivariate analysis of all CCA patients, predictors of OS were age, gender, year of diagnosis, marital status, race, grade, stage, and primary tumor site (P < 0.001). OS of all CCA patients during 2008 to 2016 was significantly higher than during 2000 to 2007 (hazard ratio [HR], 0.87; 95% CI: 0.85–0.89; P < 0.001). Conclusions. The incidence of CCA and the survival of these patients improved over time. In particular, the largest increases were reported for grade IV CCA, which may be due to earlier diagnosis and improved treatment.


2005 ◽  
Vol 38 (9) ◽  
pp. 1431-1443 ◽  
Author(s):  
Francisco Casacuberta ◽  
Enrique Vidal ◽  
David Picó
