A Note on Linear Time Simulation of Deterministic Two-Way Pushdown Automata

1977 ◽  
Vol 6 (75) ◽  
Author(s):  
Neil D. Jones

<p>Cook has shown that any deterministic two-way pushdown automaton can be simulated by a uniform-cost random access machine in time O(n) for inputs of length n. The result was of interest because such a machine is a natural model for a variety of backtracking algorithms, particularly those used in pattern matching problems. The linear time result was surprising because such machines may run for as many as 2<sup>n</sup> steps before halting; similar problems with 'combinatorial explosions' are well known to occur in applications of backtracking. Cook's result inspired the development of a number of efficient pattern matching algorithms.</p><p>However, it is impractical to use Cook's algorithm directly for pattern matching, since it involves a large constant factor in its running time and requires much storage. The purpose of this note is to present an alternative, simpler simulation algorithm that considers only the configurations actually reached by the automaton. It can be expected to run faster and use less storage (depending on the data structures used), thus bringing Cook's result a step closer to practical utility.</p>
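The idea of simulating only the configurations actually reached can be illustrated with a memoized simulation. This is not claimed to be Jones's exact algorithm. The sketch relies on one property: from a surface configuration (state, head position, top stack symbol), the computation up to the moment that top symbol is popped does not depend on the rest of the stack. The result for each reached surface configuration can therefore be computed once and reused. The move encoding, the endmarkers `<` and `>`, and the names `simulate_2dpda` and `anbn` are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical move encoding (an assumption of this sketch, not the note's):
#   ("accept",) / ("reject",)  halt
#   ("pop", q2, d)             pop the top symbol, enter q2, move head by d
#   ("set", q2, d, A2)         replace the top symbol by A2
#   ("push", q2, d, B)         push B on top of the current top symbol
Move = tuple
Delta = Callable[[str, str, str], Move]


def simulate_2dpda(delta: Delta, word: str, start: str = "q0",
                   bottom: str = "Z") -> bool:
    """Decide acceptance by memoizing, for each surface configuration
    (state, head position, top symbol) that is actually reached, what
    happens once that top symbol is eventually popped."""
    tape = "<" + word + ">"  # '<' and '>' are assumed endmarkers
    memo: Dict[Tuple[str, int, str], Move] = {}
    in_progress = ("in-progress",)  # sentinel used for loop detection

    def term(q: str, i: int, top: str) -> Move:
        key = (q, i, top)
        if key in memo:
            # Re-entering an unfinished surface configuration means the
            # deterministic machine loops forever, so it cannot accept.
            return ("halt", False) if memo[key] is in_progress else memo[key]
        if not 0 <= i < len(tape):
            raise ValueError("head moved past an endmarker")
        memo[key] = in_progress
        move = delta(q, tape[i], top)
        if move[0] in ("accept", "reject"):
            result = ("halt", move[0] == "accept")
        elif move[0] == "pop":
            _, q2, d = move
            result = ("popped", q2, i + d)
        elif move[0] == "set":
            _, q2, d, new_top = move
            result = term(q2, i + d, new_top)
        elif move[0] == "push":
            _, q2, d, pushed = move
            inner = term(q2, i + d, pushed)
            # Once `pushed` has been popped, `top` is on the stack again.
            if inner[0] == "halt":
                result = inner
            else:
                result = term(inner[1], inner[2], top)
        else:
            raise ValueError(f"unknown move {move!r}")
        memo[key] = result
        return result

    return term(start, 0, bottom) == ("halt", True)


def anbn(q: str, c: str, top: str) -> Move:
    """Toy machine for a^n b^n: push A per 'a', pop A per 'b'."""
    if q == "q0" and c == "<":
        return ("set", "q0", 1, top)
    if q == "q0" and c == "a":
        return ("push", "q0", 1, "A")
    if q in ("q0", "q1") and c == "b" and top == "A":
        return ("pop", "q1", 1)
    if q in ("q0", "q1") and c == ">" and top == "Z":
        return ("accept",)
    return ("reject",)


print(simulate_2dpda(anbn, "aabb"))  # True
```

For each reached surface configuration, the memo table holds at most one entry, so there are at most |Q|·|Γ|·(n+2) entries. Each entry triggers a constant number of further lookups, so for a fixed automaton the work is linear in n even when the automaton itself would run for exponentially many steps. In Python, the recursion depth also grows with that count, so large inputs would need an explicit stack.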

2012 ◽  
Vol 9 (3) ◽  
pp. 1125-1153
Author(s):  
J. Travnícek ◽  
J. Janousek ◽  
B. Melichar

Trees are one of the fundamental data structures used in Computer Science. We present new kinds of acyclic pushdown automata, the tree pattern pushdown automaton and the nonlinear tree pattern pushdown automaton, constructed for an ordered tree. These automata accept all tree patterns and all nonlinear tree patterns, respectively, that match the tree, and they represent a full index of the tree for such patterns. Given a tree with n nodes, the number of distinct tree patterns is at most $2^{n-1} + n$ and the number of distinct nonlinear tree patterns is at most $(2+v)^{n-1} + 2$, where v is the maximal number of nonlinear variables allowed in nonlinear tree patterns. The total sizes of the nondeterministic versions of the two pushdown automata are $O(n)$ and $O(n^2)$, respectively. We discuss the time complexities and present timings of our implementations using the bit-parallelism technique. The timings show that, for a given tree, the running time is linear in the size of the input pattern.
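The following is a minimal sketch of what it means for a tree pattern to match a tree in prefix notation. It assumes nodes are written as (symbol, arity) pairs and that the placeholder S, here the hypothetical tuple `("S", 0)`, stands for any whole subtree. This is a naive per-node check, not the pushdown automaton index described above, and it does not handle nonlinear variables.

```python
from typing import List, Tuple

Node = Tuple[str, int]   # (symbol, arity) in prefix notation
WILDCARD = ("S", 0)      # hypothetical encoding of the subtree placeholder S
                         # (so no real tree node may be ("S", 0) here)


def subtree_end(tree: List[Node], i: int) -> int:
    """Index just past the subtree rooted at tree[i], by arity counting."""
    need = 1  # number of subtrees still to be read
    while need:
        need += tree[i][1] - 1  # this node fills one slot, opens `arity` more
        i += 1
    return i


def matches_at(tree: List[Node], i: int, pattern: List[Node]) -> bool:
    """Does the pattern match the subtree rooted at tree[i]?"""
    t = i
    for node in pattern:
        if node == WILDCARD:
            t = subtree_end(tree, t)  # S absorbs one complete subtree
        elif t < len(tree) and tree[t] == node:
            t += 1                    # same symbol and arity: keep walking
        else:
            return False
    return True


def occurrences(tree: List[Node], pattern: List[Node]) -> List[int]:
    """Prefix-notation indices of all nodes where the pattern matches."""
    return [i for i in range(len(tree)) if matches_at(tree, i, pattern)]


# a2(a1(a0), a0) in prefix notation
tree = [("a2", 2), ("a1", 1), ("a0", 0), ("a0", 0)]
print(occurrences(tree, [("a2", 2), WILDCARD, ("a0", 0)]))  # [0]
```

Both the tree and the pattern are well-formed prefix notations, and matched nodes have equal arities. The walk therefore never leaves the subtree rooted at `i`.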


10.14311/1113 ◽  
2009 ◽  
Vol 49 (2) ◽  
Author(s):  
T. Flouri

Tree pattern matching is an important operation in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and non-procedural programming languages, are based. Work has begun on a systematic approach to the construction of tree pattern matchers by deterministic pushdown automata that read subject trees in prefix notation. The method is analogous to the construction of string pattern matchers: for given patterns, a non-deterministic pushdown automaton is created and then determinised. In this first paper, we present the non-deterministic pushdown automaton that will serve as the basis for the determinisation process, and we prove its correctness.
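Pushdown automata that read trees in prefix notation rely on a simple stack discipline: each symbol consumes one expected subtree and announces as many new ones as its arity. The sketch below shows only that discipline, as an acceptor of well-formed prefix notations. It is not the paper's tree pattern matcher. The function name, the stack symbol `"S"`, and the arity table are illustrative assumptions.

```python
from typing import Dict, Iterable


def accepts_prefix_notation(symbols: Iterable[str], arity: Dict[str, int]) -> bool:
    """Accept exactly the prefix notations of trees over a ranked alphabet."""
    stack = ["S"]  # one subtree is still expected
    for a in symbols:
        if not stack:
            return False  # input continues after a complete tree
        stack.pop()                     # `a` roots the expected subtree...
        stack.extend(["S"] * arity[a])  # ...which now expects arity[a] children
    return not stack  # accept by empty stack


arity = {"a2": 2, "a1": 1, "a0": 0}
print(accepts_prefix_notation(["a2", "a1", "a0", "a0"], arity))  # True
```

Because only one stack symbol is used, this particular automaton is really a counter. The matchers described in the abstract need richer stack contents to track pattern progress.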


2010 ◽  
Vol 7 (2) ◽  
pp. 331-357 ◽  
Author(s):  
Tomás Flouri ◽  
Jan Janousek ◽  
Bořivoj Melichar

Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages, are based. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata that read subject trees in prefix and postfix notation is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and then determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.
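There is a simple reason subtree matchers relate so closely to string matchers. Over a ranked alphabet, the subtree rooted at a node is the unique well-formed factor of the prefix notation that starts at that node. Every occurrence of the pattern's prefix notation as a contiguous factor is therefore a subtree occurrence. The sketch below runs ordinary Knuth–Morris–Pratt on (label, arity) pairs. This is a swapped-in finite-automaton technique for illustration, not the paper's deterministic pushdown automaton. The `(label, children)` tree encoding is an assumption.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (label, list of child trees)
Sym = Tuple[str, int]    # (label, arity)


def prefix_notation(tree: Tree) -> List[Sym]:
    """Preorder listing of (label, arity) pairs."""
    label, children = tree
    out = [(label, len(children))]
    for child in children:
        out += prefix_notation(child)
    return out


def kmp_find(text: List[Sym], pat: List[Sym]) -> List[int]:
    """Start indices of all occurrences of non-empty `pat` in `text`."""
    fail = [0] * len(pat)  # fail[i]: longest proper border of pat[:i+1]
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, x in enumerate(text):
        while k and x != pat[k]:
            k = fail[k - 1]
        if x == pat[k]:
            k += 1
        if k == len(pat):
            hits.append(i - k + 1)  # = preorder index of the matching subtree
            k = fail[k - 1]
    return hits


subject = ("a", [("b", [("c", [])]), ("a", [("b", [("c", [])]), ("c", [])])])
print(kmp_find(prefix_notation(subject), prefix_notation(("b", [("c", [])]))))  # [1, 4]
```

Postfix notation works symmetrically, with an occurrence identified by the position where the factor ends.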


Author(s):  
R. Giancarlo ◽  
R. Grossi

We discuss the suffix tree generalization to matrices in this chapter. We extend the suffix tree notion (described in Chapter 3) from text strings to text matrices whose entries are taken from an ordered alphabet, with the aim of solving pattern-matching problems. This suffix tree generalization can be efficiently used to implement low-level routines for Computer Vision, Data Compression, Geographic Information Systems and Visual Databases. We consider submatrices, that is, contiguous parts of the text matrix that still have a matrix shape. Representing these text submatrices as “suitably formatted” strings stored in a compacted trie is the rationale behind suffix trees for matrices. The choice of the format inevitably influences suffix tree construction time and space complexity. We first deal with square matrices and show that many suffix tree families can be defined for the same input matrix, according to the matrix’s string representations. Each suffix tree can be stored in linear space, and we give an efficient construction algorithm whose input is both the matrix and the chosen string representation. We then treat rectangular matrices and define their corresponding suffix trees by means of some general rules, which we list formally. We show that there is a super-linear lower bound on the space required (in contrast with the linear space required by suffix trees for square matrices). We give a simple example of one of these suffix trees. The last part of the chapter illustrates some technical results regarding suffix trees for square matrices: we show how to achieve an expected linear-time suffix tree construction for a constant-size alphabet under some mild probabilistic assumptions about the input distribution. We begin by defining a wide class of string representations for square matrices. We let Σ denote an ordered alphabet of characters and introduce another alphabet of five special characters, called shapes.
A shape is one of the special characters taken from the set {IN, SW, NW, SE, NE}. Shape IN encodes the 1×1 matrix generated from the empty matrix by creating a square.
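Shapes can be turned into a string representation in the following way. This is a sketch under assumptions that the text above does not fix:

- A growth shape such as SE is read as "add one row below and one column to the right of the current k×k square" (N/S pick the row side, W/E the column side).
- The initial IN cell is placed so that the whole n×n matrix is covered.
- The cells of each added L-shaped strip are listed row part first, then column part.

The function name `shape_string` is hypothetical. It produces a string of (shape, cells) pieces, the kind of "suitably formatted" string a compacted trie could then index.

```python
from typing import List, Sequence, Tuple

GROWTH = {"NW", "NE", "SW", "SE"}


def shape_string(matrix: Sequence[Sequence[int]],
                 shapes: Sequence[str]) -> List[Tuple[str, tuple]]:
    """Split an n x n matrix into a 1x1 start cell plus n-1 L-shaped strips."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("square matrix expected")
    if len(shapes) != n or (n and shapes[0] != "IN") or not set(shapes[1:]) <= GROWTH:
        raise ValueError("expected IN followed by n-1 shapes from NW, NE, SW, SE")
    if n == 0:
        return []
    # Each N (W) growth step moves the top (left) edge up (left) by one,
    # so the start cell sits below (right of) all of those steps.
    r0 = sum(s[0] == "N" for s in shapes[1:])
    c0 = sum(s[1] == "W" for s in shapes[1:])
    out = [("IN", (matrix[r0][c0],))]
    for k, s in enumerate(shapes[1:], start=1):  # current square is k x k
        new_r = r0 - 1 if s[0] == "N" else r0 + k  # new row above / below
        new_c = c0 - 1 if s[1] == "W" else c0 + k  # new column left / right
        top, left = min(r0, new_r), min(c0, new_c)
        row_part = tuple(matrix[new_r][c] for c in range(left, left + k + 1))
        col_part = tuple(matrix[r][new_c] for r in range(r0, r0 + k))
        out.append((s, row_part + col_part))  # 2k + 1 cells
        r0, c0 = top, left
    return out


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(shape_string(m, ["IN", "SE", "SE"]))
# [('IN', (1,)), ('SE', (4, 5, 2)), ('SE', (7, 8, 9, 3, 6))]
```

Different shape sequences give different strings for the same matrix. This mirrors the statement that many suffix tree families can be defined for one input matrix.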


Author(s):  
Yangjun Chen

In computer engineering, a number of programming tasks involve, as a crucial step, a special problem: the so-called tree matching problem (Cole & Hariharan, 1997). Such tasks include the design of interpreters for nonprocedural programming languages, automatic implementation of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors and automatic theorem proving. Recently, it has been shown that this problem can be transformed in linear time into another problem, the so-called subset matching problem (Cole & Hariharan, 2002, 2003). This problem is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet S. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j - 1] for all j (1 ≤ j ≤ m). This is a generalization of ordinary string matching and is of interest because an efficient algorithm for this problem implies an efficient solution to the tree matching problem. In addition, as shown by Indyk (1997), this problem can also be used to solve general string matching and counting matching (Muthukrishnan, 1997; Muthukrishnan & Palem, 1994), and it enables the design of efficient algorithms for several geometric pattern matching problems. In this article, we propose a new algorithm for this problem, which needs only $O(n + m)$ time when the size of S is small and $O(n + m \cdot n^{0.5})$ time on average in general cases.
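The definition of subset matching becomes concrete with a brute-force baseline. Each position's set is encoded as a bitmask, so the test "p[j] is a subset of t[i + j - 1]" becomes a single AND-NOT. This is the obvious O(nm) method, not the article's algorithm. The function name is hypothetical.

```python
from typing import List, Sequence, Set


def subset_occurrences(text: Sequence[Set[str]],
                       pattern: Sequence[Set[str]]) -> List[int]:
    """1-based positions i where pattern[j] is a subset of text[i + j - 1] for all j."""
    symbols = sorted(set().union(*text, *pattern))
    bit = {a: 1 << k for k, a in enumerate(symbols)}  # one bit per symbol

    def mask(s: Set[str]) -> int:
        return sum(bit[a] for a in s)  # bits are distinct, so sum == OR

    t = [mask(s) for s in text]
    p = [mask(s) for s in pattern]
    n, m = len(t), len(p)
    hits = []
    for i in range(n - m + 1):
        # p[j] is a subset of t[i + j] iff no bit of p[j] lies outside t[i + j]
        if all((p[j] & ~t[i + j]) == 0 for j in range(m)):
            hits.append(i + 1)  # report 1-based, as in the definition
    return hits


print(subset_occurrences([{"a"}, {"a", "b"}, {"b", "c"}, {"c"}], [{"a"}, {"b"}]))  # [1, 2]
```

When |S| fits in a machine word, each subset test is a constant number of word operations. The nontrivial work in the article goes into avoiding the m factor per text position.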


2017 ◽  
Vol 28 (07) ◽  
pp. 889-914
Author(s):  
Jie Lin ◽  
Yue Jiang ◽  
E. James Harner ◽  
Bing-Hua Jiang ◽  
Don Adjeroh

Let [Formula: see text] be a string with symbols from an alphabet. [Formula: see text] is said to be degenerate if, for some positions, say [Formula: see text], [Formula: see text] can contain a subset of symbols from the symbol alphabet rather than just one symbol. Given a text string [Formula: see text] and a pattern [Formula: see text], both with symbols from an alphabet [Formula: see text], the degenerate string matching problem is to find the positions in [Formula: see text] where [Formula: see text] occurs, where [Formula: see text], [Formula: see text], or both are allowed to be degenerate. Although some algorithms have been proposed, their large computational cost poses a significant challenge to their practical use. In this work, we propose IDPM, an improved degenerate pattern matching algorithm based on an extension of the Boyer–Moore algorithm. In the preprocessing phase, the algorithm defines an alphabet-independent compatibility rule and computes the shift arrays using respective variants of the bad-character and good-suffix heuristics. In the search phase, IDPM uses the compatibility rule to speed up matching. On average, the proposed IDPM algorithm has linear time complexity with respect to the text size and to the overall size of the pattern. IDPM demonstrates a significant performance improvement over state-of-the-art approaches. It can be used for fast, practical degenerate pattern matching on large data, with important applications in flexible and scalable searching of huge biological sequences.
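A bad-character heuristic can carry over to degenerate strings. The sketch below is a Horspool-style variant of Boyer–Moore, which is a simplification of the algorithm family named above. It assumes that two positions are compatible when their symbol sets intersect. This compatibility rule, the bracket notation parsed by `deg` (such as `A[CT]G`), and both function names are assumptions. IDPM's own rule and its good-suffix variant are not reproduced.

```python
from typing import FrozenSet, List


def deg(s: str) -> List[FrozenSet[str]]:
    """Parse e.g. 'A[CT]G' into [{A}, {C, T}, {G}] (hypothetical notation)."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "[":
            j = s.index("]", i)
            out.append(frozenset(s[i + 1:j]))
            i = j + 1
        else:
            out.append(frozenset(s[i]))
            i += 1
    return out


def degenerate_horspool(text: List[FrozenSet[str]],
                        pattern: List[FrozenSet[str]]) -> List[int]:
    """0-based starts where every pattern set intersects the aligned text set."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # last[a]: rightmost j < m-1 whose pattern set contains symbol a.
    last = {}
    for j in range(m - 1):
        for a in pattern[j]:
            last[a] = j
    hits, i = [], 0
    while i <= n - m:
        if all(pattern[j] & text[i + j] for j in range(m)):
            hits.append(i)
        # Bad character: the text set under the window's last cell must be
        # compatible with the pattern position aligned under it next, so
        # shift to the rightmost compatible j (or past it entirely).
        x = text[i + m - 1]
        best = max((last.get(a, -1) for a in x), default=-1)
        i += m - 1 - best  # always >= 1, since best <= m - 2
    return hits


print(degenerate_horspool(deg("ACGT[AG]CG"), deg("[AG]C")))  # [0, 4]
```

The shift is safe: any alignment that places pattern position j under the text set X requires p[j] ∩ X ≠ ∅. The rightmost such j is therefore the maximum of `last[a]` over the symbols a in X.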

