Approximate String Searching

Author(s):  
G.M. Landau ◽  
U. Vishkin

Consider the string searching problem, where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, or a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, serial and parallel algorithms for finding all occurrences of the pattern in the text with at most k differences are presented. For completeness we also describe an efficient algorithm for preprocessing a rooted tree, so that queries requesting the lowest common ancestor of every pair of vertices in the tree can be processed quickly.

Input form. Two arrays: A = a1, ..., am - the pattern, T = t1, ..., tn - the text, and an integer k (≥ 1).

In the present chapter we will be interested in finding all occurrences of the pattern string in the text string with at most k differences. Three types of differences are distinguished:

(a) A character of the pattern corresponds to a different character of the text - a mismatch between the two characters. (Item 2 in Example 1, below.)
(b) A character of the pattern corresponds to “no character” in the text. (Item 4.)
(c) A character of the text corresponds to “no character” in the pattern. (Item 6.)

Example 1. Let the text be abcdefghi, the pattern bxdyegh and k = 3. Let us see whether there is an occurrence with ≤ k differences that ends at the eighth location of the text. For this the following correspondence between bcdefgh and bxdyegh is proposed.

1. b (of the text) corresponds to b (of the pattern).
2. c to x.
3. d to d.
4. Nothing to y.
5. e to e.
6. f to nothing.
7. g to g.
8. h to h.
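To make the three types of differences concrete, here is a minimal sketch of the textbook dynamic-programming recurrence for this problem. It is not the serial or parallel algorithm the abstract refers to; it is a simple O(nm) baseline, and the function name `approx_occurrences` is a hypothetical choice for illustration. It reports the 1-based text positions at which an occurrence with at most k differences ends.

```python
def approx_occurrences(text, pattern, k):
    """Return 1-based end positions in `text` of occurrences of `pattern`
    with at most k differences (simple O(n*m) baseline, illustrative only)."""
    m = len(pattern)
    # prev[i] = fewest differences between pattern[:i] and some substring
    # ending at the previous text position. Before any text is read,
    # all i pattern characters must correspond to "no character".
    prev = list(range(m + 1))
    ends = []
    for j, tc in enumerate(text, start=1):
        cur = [0] * (m + 1)  # an occurrence may start anywhere, so cur[0] = 0
        for i in range(1, m + 1):
            mismatch = 0 if pattern[i - 1] == tc else 1
            cur[i] = min(
                prev[i - 1] + mismatch,  # (a) match or mismatch
                cur[i - 1] + 1,          # (b) pattern character has no text character
                prev[i] + 1,             # (c) text character has no pattern character
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

On the data of Example 1, `approx_occurrences("abcdefghi", "bxdyegh", 3)` includes position 8, in agreement with the correspondence listed there (items 2, 4 and 6 are the three differences).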

10.37236/409 ◽  
2010 ◽  
Vol 17 (1) ◽  
Author(s):  
Markus Kuba ◽  
Stephan Wagner

By a theorem of Dobrow and Smythe, the depth of the $k$th node in very simple families of increasing trees (which include, among others, binary increasing trees, recursive trees and plane ordered recursive trees) follows the same distribution as the number of edges of the form $j-(j+1)$ with $j < k$. In this short note, we present a simple bijective proof of this fact, which also shows that the result actually holds within a wider class of increasing trees. We also discuss some related results that follow from the bijection as well as a possible generalization. Finally, we use another similar bijection to determine the distribution of the depth of the lowest common ancestor of two nodes.


1993 ◽  
Vol 9 (5) ◽  
pp. 541-545
Author(s):  
Prunella Nicola ◽  
Sabino Liuni ◽  
Marcella Attimonelli ◽  
Graziano Pasole

1999 ◽  
Vol 10 (04) ◽  
pp. 375-389
Author(s):  
H. MONGELLI ◽  
S. W. SONG

Given an array of n real numbers A=(a0, a1, …, an-1), define MIN(i,j)= min {ai,…,aj}. The range minima problem consists of preprocessing array A such that queries MIN(i,j), for any 0≤i≤j≤n-1, can be answered in constant time. Range minima is a basic problem that appears in many other important graph problems such as lowest common ancestor, Euler tour, etc. In this work we present a parallel algorithm under the CGM (coarse grained multicomputer) model that solves the range minima problem in O(n/p) time and a constant number of communication rounds. The communication overhead involves the transmission of p numbers (independent of n). We show promising experimental results with speedup curves approximating the optimal for large n.
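As an illustration of what "preprocess, then answer MIN(i,j) in constant time" means, the following is a minimal sequential sketch using a sparse table. This is a swapped-in, well-known technique with O(n log n) preprocessing, not the CGM parallel algorithm of the paper; the names `build_sparse_table` and `range_min` are hypothetical.

```python
def build_sparse_table(a):
    """table[l][i] holds min(a[i : i + 2**l]); O(n log n) time and space."""
    n = len(a)
    table = [list(a)]
    length = 1
    while 2 * length <= n:
        prev = table[-1]
        # Combine two adjacent blocks of size `length` into one of size 2*length.
        table.append([min(prev[i], prev[i + length])
                      for i in range(n - 2 * length + 1)])
        length *= 2
    return table


def range_min(table, i, j):
    """Return MIN(i, j) = min(a[i..j]) for 0 <= i <= j <= n-1 with two lookups."""
    level = (j - i + 1).bit_length() - 1  # largest power of two fitting in the range
    # Two (possibly overlapping) blocks of size 2**level cover a[i..j].
    return min(table[level][i], table[level][j - (1 << level) + 1])
```

A query takes the minimum of two precomputed blocks that together cover the range; overlap is harmless because taking a minimum is idempotent.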


2013 ◽  
Vol 513 ◽  
pp. 25-37 ◽  
Author(s):  
Santanu Kumar Dash ◽  
Sven-Bodo Scholz ◽  
Stephan Herhut ◽  
Bruce Christianson

1999 ◽  
Vol 119 (1-2) ◽  
pp. 125-130 ◽  
Author(s):  
Biing-Feng Wang ◽  
Jiunn-Nan Tsai ◽  
Yuan-Cheng Chuang

Author(s):  
Tao Zhang ◽  
Qunfu Wu ◽  
Zhigang Zhang

Exploring potential intermediate hosts of a novel coronavirus is vital to rapidly controlling the continued spread of COVID-19. We found genomic and evolutionary evidence of the occurrence of a 2019-nCoV-like coronavirus (named Pangolin-CoV) in dead Malayan pangolins. Pangolin-CoV is 91.02% and 90.55% identical at the whole-genome level to 2019-nCoV and BatCoV RaTG13, respectively. Pangolin-CoV is the lowest common ancestor of 2019-nCoV and RaTG13. The S1 protein of Pangolin-CoV is much more closely related to 2019-nCoV than to RaTG13. Five key amino-acid residues involved in the interaction with human ACE2 are completely consistent between Pangolin-CoV and 2019-nCoV, but four amino-acid mutations occur in RaTG13. This indicates that Pangolin-CoV has similar pathogenic potential to 2019-nCoV, and it would be helpful in tracing the origin and probable intermediate host of 2019-nCoV.

