A SPACE EFFICIENT BIT-PARALLEL ALGORITHM FOR THE MULTIPLE STRING MATCHING PROBLEM

2006 ◽  
Vol 17 (06) ◽  
pp. 1235-1251 ◽  
Author(s):  
DOMENICO CANTONE ◽  
SIMONE FARO

Finite (nondeterministic) automata are very useful building blocks in the field of string matching. This is particularly true in the case of multiple pattern matching, where the use of factor-based automata can substantially reduce the number of computational steps when the patterns have large common factors. Direct simulation of nondeterministic automata can be performed very efficiently using the bit-parallelism technique, though this is not necessarily true for factor-based automata. In this paper we present an algorithm for the multiple string matching problem, based on the bit-parallel simulation of nondeterministic factor-based automata which satisfy a particular ordering condition. We also show how to enforce such a condition by suitably modifying a minimal initial automaton through equivalence-preserving transformations. The resulting automaton turns out to be smaller than the corresponding maximal automata used by existing bit-parallel algorithms, which do not take advantage of common factors in the patterns.
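For comparison, the minimal Python sketch below shows the plain bit-parallel (Shift-And) simulation of the "maximal" automaton that the abstract contrasts against. All patterns are laid end to end in one bit vector, so the vector is as long as the total pattern length and shared factors are not exploited. This is not the factor-based algorithm of the paper. The function name `multi_shift_and` is hypothetical, patterns are assumed non-empty, and Python's unbounded integers stand in for machine words.

```python
def multi_shift_and(patterns, text):
    """Report (end_position, pattern_index) for every occurrence (Shift-And)."""
    masks = {}    # character -> bits of the pattern positions holding that character
    init = 0      # bits marking the first position of each pattern
    finals = []   # (bit of the last position, pattern index)
    offset = 0
    for idx, pat in enumerate(patterns):
        assert pat, "patterns must be non-empty"
        for i, ch in enumerate(pat):
            masks[ch] = masks.get(ch, 0) | (1 << (offset + i))
        init |= 1 << offset
        finals.append((1 << (offset + len(pat) - 1), idx))
        offset += len(pat)  # the next pattern starts right after this one
    state, matches = 0, []
    for pos, ch in enumerate(text):
        # Advance every active state, re-seed each pattern's first state, and
        # keep only the states whose expected character is ch. A bit shifted
        # out of one pattern lands on the next pattern's first bit, which init
        # sets anyway, so the carry is harmless.
        state = ((state << 1) | init) & masks.get(ch, 0)
        for bit, idx in finals:
            if state & bit:
                matches.append((pos, idx))
    return matches
```

For example, `multi_shift_and(["he", "she", "hers"], "ushers")` reports `(3, 0)` and `(3, 1)` for "he" and "she" ending at position 3, and `(5, 2)` for "hers" ending at position 5.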

2001 ◽  
Vol 11 (01) ◽  
pp. 125-138 ◽  
Author(s):  
H. MONGELLI ◽  
S. W. SONG

Given a text and a pattern, the problem of pattern matching consists of determining all the positions of the text where the pattern occurs. When the text and the pattern are matrices, the matching is termed bidimensional. There are variations of this problem where we allow the matching using a somewhat modified pattern. A modification that we will allow is that the pattern can be scaled. We propose a new parallel algorithm for this problem under the CGM (Coarse Grained Multicomputer) model. This algorithm requires linear local computing time in the input, linear memory, and only one communication round, during which at most a linear amount of data is exchanged. To the best of our knowledge, there are no known parallel algorithms for the bidimensional pattern matching problem with scaling in the literature. The proposed algorithm was implemented in C using the PVM interface and was executed on a Parsytec PowerXplorer parallel machine. The experimental results obtained were very promising and showed significant speedups.
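To make the notion of a scaled pattern concrete, the sketch below expands every pattern character into an s × s block and checks every integer scale naively and sequentially. It assumes the text and pattern are rectangular lists of equal-length strings and that only integer scale factors are allowed. It is a brute-force illustration of the problem only, not the CGM algorithm of the paper, and the function names are hypothetical.

```python
def scale(pattern, s):
    """Return the pattern with every character expanded into an s x s block."""
    return ["".join(ch * s for ch in row) for row in pattern for _ in range(s)]


def scaled_occurrences(text, pattern):
    """Naively list (row, col, s) such that the s-scaled pattern occurs at text[row][col]."""
    n_rows, n_cols = len(text), len(text[0])
    m_rows, m_cols = len(pattern), len(pattern[0])
    found = []
    s = 1
    # Try every scale whose expanded pattern still fits inside the text.
    while s * m_rows <= n_rows and s * m_cols <= n_cols:
        big = scale(pattern, s)
        h, w = len(big), len(big[0])
        for r in range(n_rows - h + 1):
            for c in range(n_cols - w + 1):
                if all(text[r + k][c:c + w] == big[k] for k in range(h)):
                    found.append((r, c, s))
        s += 1
    return found
```

With `P = ["ab", "ba"]` and the text `scale(P, 2)`, the search finds the unscaled pattern at `(1, 1)` and the pattern scaled by 2 at `(0, 0)`.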


2019 ◽  
Vol 46 (4) ◽  
pp. 299-307
Author(s):  
Jihyo Choi ◽  
Youngho Kim ◽  
Joong Chae Na ◽  
Jeong Seop Sim

Author(s):  
A. Amir ◽  
M. Farach

String matching is a basic theoretical problem in computer science, but it has also been useful in implementing various text editing tasks. The explosion of multimedia requires an appropriate generalization of string matching to higher dimensions. The first natural generalization is that of seeking the occurrences of a pattern in a text where both pattern and text are rectangles. The last few years saw tremendous activity in two-dimensional pattern matching algorithms. We naturally had to limit the amount of information that entered this chapter. We chose to concentrate on serial deterministic algorithms for some of the basic issues of two-dimensional matching. Throughout this chapter we define our problems in terms of squares rather than rectangles; however, all results presented easily generalize to rectangles. The Exact Two Dimensional Matching Problem is defined as follows: . . . INPUT: Text array $T[n \times n]$ and pattern array $P[m \times m]$. OUTPUT: All locations $[i,j]$ in $T$ where there is an occurrence of $P$, i.e. $T[i+k, j+l] = P[k+1, l+1]$ for $0 \le k, l \le m-1$. . . . A natural way of solving any generalized problem is by reducing it to a special case whose solution is known. It is therefore not surprising that most solutions to the two-dimensional exact matching problem use exact string matching algorithms in one way or another. In this section, we present an algorithm for two-dimensional matching which relies on reducing a matrix of characters into a one-dimensional array. Let $P'[1 \dots m]$ be a pattern which is derived from $P$ by setting $P'[i] = P[i,1]P[i,2]\cdots P[i,m]$, that is, the $i$th character of $P'$ is the $i$th row of $P$. Let $T_i[1 \dots n-m+1]$, for $1 \le i \le n$, be a set of arrays such that $T_i[j] = T[i,j]T[i,j+1]\cdots T[i,j+m-1]$. Clearly, $P$ occurs at $T[i,j]$ iff $P'$ occurs at $T_i[j]$, reading downward through $T_i[j], T_{i+1}[j], \dots$; that is, iff $P'[k] = T_{i+k-1}[j]$ for $1 \le k \le m$.
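The row-to-symbol reduction can be sketched in Python as follows. The sketch assumes T and P are lists of equal-length strings and uses 0-based indices (the chapter uses 1-based ones). Each length-m row slice is treated as a single symbol, and the 1-D search down each column uses Knuth-Morris-Pratt. The function names are hypothetical. Because symbols are compared as whole strings, this sketch can cost O(n²m) character comparisons in the worst case. A faster variant would first replace each row slice by a short name, a step omitted here.

```python
def kmp_find(pattern, seq):
    """Start indices of pattern in seq (Knuth-Morris-Pratt); elements compared with ==."""
    fail = [0] * len(pattern)  # fail[i]: length of the longest proper border of pattern[:i+1]
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    hits, k = [], 0
    for i, x in enumerate(seq):
        while k and x != pattern[k]:
            k = fail[k - 1]
        if x == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def match_2d(T, P):
    """All 0-based (i, j) where the m x m pattern P occurs in the n x n text T."""
    n, m = len(T), len(P)
    p_prime = list(P)  # P'[i] is the i-th row of P, treated as one symbol
    result = []
    for j in range(n - m + 1):
        # The column of symbols T_0[j], ..., T_{n-1}[j]: length-m row slices at column j.
        column = [T[i][j:j + m] for i in range(n)]
        for i in kmp_find(p_prime, column):
            result.append((i, j))
    return sorted(result)
```

For `T = ["abab", "baba", "abab", "baba"]` and `P = ["ab", "ba"]`, the sketch reports the five locations `(0, 0)`, `(0, 2)`, `(1, 1)`, `(2, 0)` and `(2, 2)`.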


Author(s):  
IBRAHIEM M. M. EL EMARY ◽  
MOHAMMED S. M. JABER

The string matching problem consists of finding one or more, generally all, exact occurrences of a pattern P in a text T. This paper presents a new algorithm for solving the string matching problem. The proposed algorithm improves the search for a specific pattern in a fixed (unchangeable) text by decreasing the number of character comparisons. The algorithm first reads the pattern to obtain its length and its first character, and then consults a two-column table: the first column holds the word lengths occurring in the text, and the second holds the start positions of the words of each length. The algorithm then searches only the words whose length equals the pattern length. Our experiments mainly compare the performance of our algorithm with well-known pattern matching algorithms such as Boyer–Moore and Boyer–Moore–Galil. The comparison between our algorithm and the others is made in terms of the number of characters compared, for different text sizes. The results show that our algorithm performs better than the others with respect to this measure.
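The table-driven search described above might look roughly like the following Python sketch. It assumes that words are whitespace-delimited and that the pattern is a single non-empty word; the abstract does not say how punctuation or multi-word patterns are handled. The function names and the comparison counter are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict


def build_length_table(text):
    """Map each word length to the list of start positions of words with that length."""
    table = defaultdict(list)
    pos, n = 0, len(text)
    while pos < n:
        if text[pos].isspace():
            pos += 1
            continue
        start = pos
        while pos < n and not text[pos].isspace():
            pos += 1
        table[pos - start].append(start)
    return table


def search_word(text, table, pattern):
    """Start positions of words equal to pattern, plus the number of character comparisons."""
    hits, comparisons = [], 0
    # Only words of the same length as the pattern are examined.
    for start in table.get(len(pattern), []):
        comparisons += 1
        if text[start] != pattern[0]:  # cheap first-character filter
            continue
        k = 1
        while k < len(pattern):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
            k += 1
        else:  # no mismatch found: the whole word equals the pattern
            hits.append(start)
    return hits, comparisons
```

On the text `"the cat sat on the mat"`, searching for `"the"` inspects only the five three-letter words and returns the positions `[0, 15]` after 9 character comparisons.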


Author(s):  
Robert Susik

We consider the application of a multiple pattern matching algorithm (Multi AOSO on q-Grams, MAG) to approximate pattern matching. We propose an online approach which translates the problem from approximate pattern matching into a multiple pattern one (called partitioning into exact search). The presented solution allows a relatively fast search for multiple patterns in a text with up to k differences (or mismatches). This paper presents a comparison between the solution based on the MAG algorithm and [4]. Experiments on DNA, English, protein and XML texts with up to k errors show that the newly proposed algorithm achieves relatively good results in practice.
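Partitioning into exact search rests on a pigeonhole argument. If the pattern is cut into k + 1 pieces and an alignment has at most k mismatches, then at least one piece matches exactly at its offset. The Python sketch below illustrates this for the k-mismatches (Hamming) case only, assuming the pattern is longer than k. It uses `str.find` per piece as a stand-in for a multiple pattern matcher such as MAG, and then verifies each candidate. The k-differences (edit distance) case would need a different verification step. The function names are hypothetical.

```python
def split_pattern(pattern, k):
    """Cut pattern into k + 1 contiguous pieces; return (offset, piece) pairs."""
    m, parts = len(pattern), k + 1
    bounds = [i * m // parts for i in range(parts + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(parts)]


def search_k_mismatches(text, pattern, k):
    """Start positions where pattern occurs in text with at most k mismatches."""
    m = len(pattern)
    assert m > k, "need at least one character per piece"
    candidates = set()
    for offset, piece in split_pattern(pattern, k):
        # Exact search for each piece; a multiple pattern matcher would find all pieces in one pass.
        start = text.find(piece)
        while start != -1:
            pos = start - offset  # where the whole pattern would start
            if 0 <= pos <= len(text) - m:
                candidates.add(pos)
            start = text.find(piece, start + 1)
    result = []
    for pos in sorted(candidates):
        # Verification: count mismatches of the full alignment.
        mismatches = sum(1 for a, b in zip(text[pos:pos + m], pattern) if a != b)
        if mismatches <= k:
            result.append(pos)
    return result
```

The design keeps the filter (exact search for pieces) separate from verification, so the exact-search stage can be swapped for any multiple pattern matching algorithm without changing the result.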


Author(s):  
Z. Galil ◽  
I. Yudkiewicz

The string matching problem is defined as follows: given a string $P_0 \dots P_{m-1}$ called the pattern and a string $T_0 \dots T_{n-1}$ called the text, find all occurrences of the pattern in the text. The output of a string matching algorithm is a Boolean array $\mathit{MATCH}[0 \dots n-1]$ which contains a true value at each position where an occurrence of the pattern starts. Many sequential algorithms are known that solve this problem optimally, i.e., in a linear $O(n)$ number of operations, the most notable of which are the algorithms by Knuth, Morris and Pratt and by Boyer and Moore. In this chapter we limit ourselves to parallel algorithms. All algorithms considered in this chapter are for the parallel random access machine (PRAM) computation model. In the design of parallel algorithms for the various PRAM models, one tries to optimize two factors simultaneously: the number of processors used and the time required by the algorithm. The total number of operations performed, which is the time-processor product, is the measure of optimality. A parallel algorithm is called optimal if it needs the same number of operations as the fastest sequential algorithm. Hence, in the string matching problem, an algorithm is optimal if its time-processor product is linear in the length of the input strings. Apart from having an optimal algorithm, the designer wishes the algorithm to be as fast as possible, where the only limit on the number of processors is the one imposed by the time-processor product. The following fundamental lemma given by Brent is essential for understanding the tradeoff between time and processors: any PRAM algorithm of time $t$ that consists of $x$ elementary operations can be implemented on $p$ processors in $O(x/p + t)$ time. Using Brent's lemma, any algorithm that uses a large number $x$ of processors to run very fast can be implemented on $p < x$ processors with the same total work, but with an increase in time as described.
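The reasoning behind Brent's lemma can be checked numerically. If parallel step $s$ performs $x_s$ operations, then $p$ processors finish it in $\lceil x_s/p \rceil \le x_s/p + 1$ rounds, and summing over all $t$ steps gives at most $x/p + t$. The Python sketch below uses a hypothetical per-step operation profile of a balanced tree summation of 1024 values. It compares the step-by-step simulated time with that bound; the function names are illustrative only.

```python
import math


def brent_time(ops_per_step, p):
    """Rounds needed to run a PRAM algorithm on p processors, step by step.

    ops_per_step[s] is the number of elementary operations in parallel step s;
    with p processors that step needs ceil(ops / p) rounds.
    """
    return sum(math.ceil(x / p) for x in ops_per_step)


def brent_bound(ops_per_step, p):
    """Brent's upper bound x / p + t, with x total operations and t parallel steps."""
    return sum(ops_per_step) / p + len(ops_per_step)


# Tree summation of 1024 values: 512, 256, ..., 1 additions in 10 steps (x = 1023, t = 10).
steps = [512 >> s for s in range(10)]
```

With 32 processors, the simulated time is 16 + 8 + 4 + 2 + 1 + 5 × 1 = 36 rounds, within the bound 1023/32 + 10 ≈ 42.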
A basic problem in the study of parallel algorithms for strings and arrays is finding the maximal/minimal position in an array that holds a certain value.
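One simple way to find the minimal position holding a value is a balanced binary-tree reduction. First, every position that holds the value is marked with its own index. Then disjoint pairs are combined with `min` in rounds, and the operations of a round are independent, so on a PRAM each round could run in parallel. This gives $O(\log n)$ time with $O(n)$ total operations. The Python sketch below simulates the rounds sequentially. It is a generic technique and not necessarily the one used in the chapter; the function name is hypothetical.

```python
def min_position(a, value):
    """Smallest index i with a[i] == value (or None), via pairwise reduction rounds.

    Returns (index_or_None, number_of_rounds); the rounds equal ceil(log2 n) for n >= 1.
    """
    NONE = len(a)  # sentinel larger than any valid index, meaning "not found"
    cur = [i if x == value else NONE for i, x in enumerate(a)]
    rounds = 0
    while len(cur) > 1:
        # Each pair is combined independently: one parallel step on a PRAM.
        cur = [min(cur[i], cur[i + 1]) if i + 1 < len(cur) else cur[i]
               for i in range(0, len(cur), 2)]
        rounds += 1
    if not cur or cur[0] == NONE:
        return None, rounds
    return cur[0], rounds
```

Replacing `min` with `max` and the sentinel with -1 gives the maximal position instead.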

