Approximate String Searching
Consider the string searching problem in which differences between characters of the pattern and characters of the text are allowed. Each difference is due to a mismatch between a character of the text and a character of the pattern, a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m and an integer k, we present serial and parallel algorithms for finding all occurrences of the pattern in the text with at most k differences. For completeness we also describe an efficient algorithm for preprocessing a rooted tree, so that queries requesting the lowest common ancestor of any pair of vertices in the tree can be processed quickly.

Input form. Two arrays, A = a1, ..., am (the pattern) and T = t1, ..., tn (the text), and an integer k (≥ 1).

In the present chapter we are interested in finding all occurrences of the pattern string in the text string with at most k differences. Three types of differences are distinguished:

(a) A character of the pattern corresponds to a different character of the text: a mismatch between the two characters. (Item 2 in Example 1, below.)
(b) A character of the pattern corresponds to "no character" in the text. (Item 4.)
(c) A character of the text corresponds to "no character" in the pattern. (Item 6.)

Example 1. Let the text be abcdefghi, the pattern bxdyegh and k = 3. Let us see whether there is an occurrence with ≤ k differences that ends at the eighth location of the text. For this, the following correspondence between bcdefgh (of the text) and bxdyegh (the pattern) is proposed.

1. b (of the text) corresponds to b (of the pattern).
2. c to x.
3. d to d.
4. Nothing to y.
5. e to e.
6. f to nothing.
7. g to g.
8. h to h.
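The problem statement above can be made concrete with a minimal sketch in Python. It uses the classical dynamic programming approach, which takes O(mn) time, and is not necessarily the algorithm developed in this chapter. The function name find_k_differences and its parameters are assumptions made for illustration only.

```python
def find_k_differences(text, pattern, k):
    """Return the 1-based text locations at which an occurrence of
    `pattern` with at most k differences ends.

    Illustrative O(m*n) dynamic programming sketch; the name and
    signature are hypothetical, not taken from the chapter.
    """
    m = len(pattern)
    # prev[i] = fewest differences between pattern[:i] and some substring
    # of the text ending at the previous text location. Before any text
    # is read, matching pattern[:i] costs i (every character unmatched).
    prev = list(range(m + 1))
    ends = []
    for j, t in enumerate(text, start=1):
        # cur[0] = 0: an occurrence may start at any text location.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (pattern[i - 1] != t),  # match, or type (a) mismatch
                cur[i - 1] + 1,  # type (b): pattern character has no text character
                prev[i] + 1,     # type (c): text character has no pattern character
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    # Example 1: text abcdefghi, pattern bxdyegh, k = 3.
    print(find_k_differences("abcdefghi", "bxdyegh", 3))
```

For Example 1, the returned list includes location 8, in agreement with the correspondence proposed above, which uses exactly three differences (items 2, 4 and 6).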