A NOVEL ALGORITHM FOR SOLVING THE STRING MATCHING PROBLEM

Author(s):  
IBRAHIEM M. M. EL EMARY ◽  
MOHAMMED S. M. JABER

The string matching problem consists of finding one or more, generally all, exact occurrences of a pattern P in a text T. This paper presents a new algorithm for solving the string matching problem. The proposed algorithm speeds up the search for a specific pattern in a fixed, unchanging text by decreasing the number of character comparisons. The algorithm first reads the pattern to obtain its length and its first character, and then looks up a table of two columns: the first column holds the word lengths that occur in the text, and the second holds the start positions of the words of each length. The algorithm then searches only the words whose length equals that of the pattern. Our experiments compare the performance of our algorithm with well-known pattern matching algorithms such as Boyer–Moore and Boyer–Moore–Galil. The comparison is made in terms of the number of characters compared for different text sizes. The results show that our algorithm performs better than the others in terms of this parameter.
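As an illustration only, the length-indexed table described in this abstract can be sketched in Python as follows. The function names `build_length_table` and `find_word` are hypothetical, words are assumed to be separated by whitespace, and the sketch indexes by length only; the abstract does not say how the first character of the pattern is used, so that step is left out.

```python
from collections import defaultdict


def build_length_table(text):
    """Preprocess the text: map each word length to the start positions
    of all words of that length (assumption: whitespace-separated words)."""
    table = defaultdict(list)
    i, n = 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1
            continue
        start = i
        while i < n and not text[i].isspace():
            i += 1
        table[i - start].append(start)
    return table


def find_word(text, table, pattern):
    """Compare the pattern only against words of the same length.
    Returns the matching start positions and the number of character
    comparisons performed."""
    m = len(pattern)
    hits, comparisons = [], 0
    for start in table.get(m, []):
        for k in range(m):
            comparisons += 1
            if text[start + k] != pattern[k]:
                break
        else:
            hits.append(start)
    return hits, comparisons


text = "the cat sat on the mat"
table = build_length_table(text)
hits, comparisons = find_word(text, table, "the")
print(hits, comparisons)
```

Because the table is built once for a fixed text, its cost is paid in preprocessing; each query then touches only the candidate words of the right length.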

Author(s):  
Rajesh Prasad

The word matching problem is to find all the exact occurrences of a pattern P[0...m-1] in the text T[0...n-1], where P neither contains any white space nor is preceded and followed by a space. In the parameterized word matching problem, a given word P[0...m-1] is said to match a sub-word t of the text T[0...n-1] if there exists a one-to-one correspondence between the symbols of P and the symbols of t. The Exact Word Matching (EWM) problem has previously been solved by partitioning the text into a number of tables in the pre-processing phase and then applying either a brute force approach or fast hashing during the searching process. This paper presents an extension of the EWM problem to parameterized word matching. It first splits the text into a number of tables in the pre-processing phase and then applies prev-encoding and a bit-parallel technique, Parameterized Shift-Or (PSO), during the searching phase. Experimental results show that this technique performs better than PSO.
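A minimal Python sketch of prev-encoding, under the assumption that every symbol is a parameter: each position stores the distance back to the previous occurrence of the same symbol, or 0 for a first occurrence. Two words of equal length then admit a one-to-one symbol correspondence exactly when their encodings are equal. The names `prev_encode` and `p_match_words` are hypothetical, and the comparison is done directly rather than with the bit-parallel PSO technique mentioned in the abstract.

```python
def prev_encode(word):
    """Replace each symbol by the distance to its previous occurrence,
    or 0 if the symbol has not been seen before."""
    last_seen = {}
    encoded = []
    for i, symbol in enumerate(word):
        encoded.append(i - last_seen[symbol] if symbol in last_seen else 0)
        last_seen[symbol] = i
    return encoded


def p_match_words(text, pattern):
    """Return the whitespace-separated words of the text that
    parameterized-match the pattern (same length, same prev-encoding)."""
    target = prev_encode(pattern)
    return [
        word
        for word in text.split()
        if len(word) == len(pattern) and prev_encode(word) == target
    ]


print(prev_encode("abab"))
print(p_match_words("xyxy abba cdcd abab", "abab"))
```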


2012 ◽  
Vol 239-240 ◽  
pp. 1437-1441 ◽  
Author(s):  
Zhen Liu ◽  
Yun An Hu

This paper proposes a novel compact genetic algorithm, named the pseudo-parallel compact genetic algorithm. Two populations take part in the evolution process, and the two subpopulations can exchange information with each other. The experimental results show that the novel algorithm performs better than the simple genetic algorithm. It is then used to solve the weapon target allocation (WTA) problem, and the simulation results show that it is more efficient than other methods. Because the compact genetic algorithm is easy to operate and takes up less memory, the algorithm exhibits better solution quality and requires less time than before.


Author(s):  
A. Amir ◽  
M. Farach

String matching is a basic theoretical problem in computer science, but has been useful in implementing various text editing tasks. The explosion of multimedia requires an appropriate generalization of string matching to higher dimensions. The first natural generalization is that of seeking the occurrences of a pattern in a text where both pattern and text are rectangles. The last few years saw tremendous activity in two dimensional pattern matching algorithms. We naturally had to limit the amount of information that entered this chapter. We chose to concentrate on serial deterministic algorithms for some of the basic issues of two dimensional matching. Throughout this chapter we define our problems in terms of squares rather than rectangles; however, all results presented easily generalize to rectangles. The Exact Two Dimensional Matching Problem is defined as follows: . . . INPUT: Text array T[n x n] and pattern array P[m x m]. OUTPUT: All locations [i,j] in T where there is an occurrence of P, i.e. T[i+k, j+l] = P[k+1, l+1] for 0 ≤ k, l ≤ m-1. . . . A natural way of solving any generalized problem is by reducing it to a special case whose solution is known. It is therefore not surprising that most solutions to the two dimensional exact matching problem use exact string matching algorithms in one way or another. In this section, we present an algorithm for two dimensional matching which relies on reducing a matrix of characters into a one dimensional array. Let P'[1 . . . m] be a pattern which is derived from P by setting P'[i] = P[i,1]P[i,2]…P[i,m], that is, the ith character of P' is the ith row of P. Let Ti[1 . . . n-m+1], for 1 ≤ i ≤ n, be a set of arrays such that Ti[j] = T[i,j]T[i,j+1]…T[i,j+m-1]. Clearly, P occurs at T[i, j] iff P' occurs at Ti[j].
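The reduction described above can be sketched in Python as follows. This is a naive illustration, not the chapter's algorithm: it uses 0-based indices, treats each length-m row segment as a single "character" by comparing tuples directly, and makes no attempt at the linear-time string matching a real solution would use. The function name `match_2d` is hypothetical.

```python
def match_2d(T, P):
    """Find all top-left corners (i, j) where the m x m pattern P occurs
    in the n x n text T, by matching rows of P as single symbols."""
    n, m = len(T), len(P)
    # P': the i-th "character" of P' is the i-th row of P.
    p_rows = [tuple(row) for row in P]
    hits = []
    for j in range(n - m + 1):
        # For this column offset j, entry i is the row segment
        # T[i][j], ..., T[i][j+m-1], i.e. the derived symbol T_i[j].
        column = [tuple(T[i][j:j + m]) for i in range(n)]
        # One-dimensional matching of P' down this derived column.
        for i in range(n - m + 1):
            if column[i:i + m] == p_rows:
                hits.append((i, j))
    return hits


T = ["abab",
     "baba",
     "abab",
     "baba"]
P = ["ab",
     "ba"]
print(sorted(match_2d(T, P)))
```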


2006 ◽  
Vol 17 (06) ◽  
pp. 1235-1251 ◽  
Author(s):  
DOMENICO CANTONE ◽  
SIMONE FARO

Finite (nondeterministic) automata are very useful building blocks in the field of string matching. This is particularly true in the case of multiple pattern matching, where the use of factor-based automata can substantially reduce the number of computational steps when the patterns have large common factors. Direct simulation of nondeterministic automata can be performed very efficiently using the bit-parallelism technique, though this is not necessarily true for factor-based automata. In this paper we present an algorithm for the multiple string matching problem, based on the bit-parallel simulation of nondeterministic factor-based automata which satisfy a particular ordering condition. We also show how to enforce such a condition by suitably modifying a minimal initial automaton, through equivalence-preserving transformations. The resulting automaton turns out to be smaller than the corresponding maximal automata used by existing bit-parallel algorithms, as the latter do not take any advantage of common factors in patterns.
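To illustrate what bit-parallel simulation of a nondeterministic automaton means, here is a minimal Python sketch of the classic Shift-And technique for a single pattern. It is not the factor-based multi-pattern algorithm of this paper; it only shows the basic idea that each bit of an integer tracks one automaton state, so all active states advance with one shift and one AND per text character. The function name `shift_and` is an assumption.

```python
def shift_and(text, pattern):
    """Return the start positions of all occurrences of pattern in text,
    simulating the pattern's nondeterministic automaton with bit operations."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit k set when pattern[k] == c.
    masks = {}
    for k, symbol in enumerate(pattern):
        masks[symbol] = masks.get(symbol, 0) | (1 << k)
    accept = 1 << (m - 1)  # bit of the final state
    state = 0              # bit k set: pattern[0..k] matches a suffix of the text read so far
    hits = []
    for pos, symbol in enumerate(text):
        # Advance every active state and restart at state 0, then keep
        # only the states whose expected symbol equals the one just read.
        state = ((state << 1) | 1) & masks.get(symbol, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


print(shift_and("abracadabra", "abra"))
```

In a language with fixed-width integers the pattern length would be limited by the machine word size; Python's unbounded integers hide that constraint here.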


Author(s):  
Zhan Peng ◽  
Yuping Wang ◽  
Wei Yue

Multi-string matching (MSM) is a core technique for searching a text string for all occurrences of a set of string patterns. It is widely used in many applications. However, as the number of string patterns increases, most existing algorithms suffer from two issues: long matching time and high memory consumption. To address these issues, this paper proposes a fast matching engine for large-scale string matching problems. Our engine includes a filter module and a verification module. The filter module is based on several bitmaps, which are responsible for quickly filtering out the invalid positions in the text, while for each potentially matching position the verification module confirms whether a pattern truly occurs. In particular, we design a compact data structure called the Adaptive Matching Tree (AMT) for the verification module, in which each tree node saves only some pattern fragments of the whole pattern set, and the inner structure of each tree node is chosen adaptively according to the features of the corresponding pattern fragments. This makes the engine time and space efficient. The experiments indicate that our matching engine performs better than the compared algorithms, especially for large pattern sets.
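The filter-then-verify structure can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the engine of the paper: a single bitmap over hashed 2-character prefixes stands in for the filter module, and plain per-bucket lists with direct string comparison stand in for the Adaptive Matching Tree. The names `gram_hash`, `build_engine`, `search` and the constant `BITS` are all hypothetical.

```python
BITS = 1 << 12  # assumed bitmap size


def gram_hash(gram):
    """Hash a short string to a bit index in [0, BITS)."""
    h = 0
    for ch in gram:
        h = (h * 131 + ord(ch)) % BITS
    return h


def build_engine(patterns, q=2):
    """Build the filter bitmap and the verification buckets."""
    bitmap = 0
    buckets = {}
    for p in patterns:
        if len(p) < q:
            raise ValueError("patterns must have at least q characters")
        h = gram_hash(p[:q])
        bitmap |= 1 << h                  # filter: mark this prefix hash
        buckets.setdefault(h, []).append(p)
    return bitmap, buckets


def search(text, engine, q=2):
    """Return (position, pattern) pairs for every pattern occurrence."""
    bitmap, buckets = engine
    hits = []
    for i in range(len(text) - q + 1):
        h = gram_hash(text[i:i + q])
        if not (bitmap >> h) & 1:
            continue                      # filtered out: no pattern can start here
        for p in buckets[h]:              # verification of surviving positions
            if text.startswith(p, i):
                hits.append((i, p))
    return hits


engine = build_engine(["she", "he", "hers"])
print(search("ushers", engine))
```

Hash collisions can only let extra positions through the filter; the verification step still compares full patterns, so the reported matches stay exact.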


2001 ◽  
Vol 11 (01) ◽  
pp. 125-138 ◽  
Author(s):  
H. MONGELLI ◽  
S. W. SONG

Given a text and a pattern, the problem of pattern matching consists of determining all the positions of the text where the pattern occurs. When the text and the pattern are matrices, the matching is termed bidimensional. There are variations of this problem in which the pattern may be modified in some way before matching. The modification that we allow is scaling of the pattern. We propose a new parallel algorithm for this problem, under the CGM (Coarse Grained Multicomputer) model. This algorithm requires local computing time linear in the input size and linear memory, and uses only one communication round, during which at most a linear amount of data is exchanged. To the best of our knowledge, there are no known parallel algorithms for the bidimensional pattern matching problem with scaling in the literature. The proposed algorithm was implemented in C, using the PVM interface, and was executed on a Parsytec PowerXplorer parallel machine. The experimental results obtained were very promising and showed significant speedups.


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1703-1710
Author(s):  
Xin Chai ◽  
Dan Yang ◽  
Jingyu Liu ◽  
Yan Li ◽  
Youxi Wu

Pattern mining has been widely applied in many fields. Users often mine a large number of patterns. However, most of these are difficult to apply in real applications. Top-k pattern mining, which involves finding the k most frequent patterns, is an effective strategy, because the more frequently a pattern occurs, the more likely it is to be important for users. However, top-k mining can only mine short patterns in mining applications with the Apriori property. It is well known that short patterns contain less information than long patterns. In this paper, we focus on mining top-k sequence patterns of each pattern length. We propose an effective algorithm, named NOSTOPK (non-overlapping sequence pattern mining for top-k). The algorithm calculates the support of a pattern using a Nettree data structure, which has been introduced to tackle various types of pattern matching and sequence pattern mining issues. We find the top k patterns of length len, and calculate the supports of the corresponding k × |Σ| super-patterns of length len + 1 to discover the new top k super-patterns of length len + 1. Experimental results demonstrate that the algorithm achieves a better performance than comparable algorithms.


2008 ◽  
Vol 19 (01) ◽  
pp. 163-183 ◽  
Author(s):  
KIMMO FREDRIKSSON ◽  
SZYMON GRABOWSKI

We propose new algorithms for (δ,γ,α)-matching. In this string matching problem we are given a pattern P = p0p1 … pm−1 and a text T = t0t1 … tn−1 over some integer alphabet Σ = {0…σ − 1}. The pattern symbol pi δ-matches the text symbol tj iff |pi − tj| ≤ δ. The pattern P (δ,γ)-matches some text substring tj … tj+m−1 iff for all i it holds that |pi − tj+i| ≤ δ and Σ |pi − tj+i| ≤ γ. Finally, in (δ,γ,α)-matching we also permit gaps of at most α symbols between each pair of consecutive matching text symbols. The only known previous algorithm runs in O(nm) time. We give several algorithms that improve the average case up to O(n) for small α, and the worst case to [Formula: see text] or O(nm log (γ)/w), where [Formula: see text] and w is the number of bits in a machine word. The proposed algorithms can be easily modified to solve several other related problems; we explicitly consider, e.g., character classes (instead of δ-matching), (Δ-limited) k-mismatches (instead of γ-matching) and more general gaps, including negative ones. These find important applications in computational biology. We conclude with experimental results showing that the algorithms are very efficient in practice.
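To make the (δ,γ) conditions concrete, here is a naive Python sketch that checks both of them at every text position. It covers only (δ,γ)-matching without gaps (the α parameter is ignored), and it is a direct transcription of the definition rather than any of the algorithms proposed in the paper. The function name `delta_gamma_match` is hypothetical.

```python
def delta_gamma_match(text, pattern, delta, gamma):
    """Return every position j where pattern (delta, gamma)-matches
    text[j : j + m]: each symbol differs by at most delta and the
    differences sum to at most gamma. No gaps are allowed."""
    n, m = len(text), len(pattern)
    hits = []
    for j in range(n - m + 1):
        total = 0
        for i in range(m):
            diff = abs(pattern[i] - text[j + i])
            if diff > delta:      # delta condition violated
                break
            total += diff
            if total > gamma:     # gamma condition violated
                break
        else:
            hits.append(j)
    return hits


text = [1, 2, 3, 4, 5]
pattern = [2, 3, 4]
print(delta_gamma_match(text, pattern, delta=1, gamma=2))
print(delta_gamma_match(text, pattern, delta=1, gamma=3))
```

With γ = 2 only the exact occurrence survives, because the two neighbouring windows each differ by 1 in all three symbols; raising γ to 3 admits them.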


2015 ◽  
Vol 27 (2) ◽  
pp. 143-156 ◽  
Author(s):  
TANVER ATHAR ◽  
CARL BARTON ◽  
WIDMER BLAND ◽  
JIA GAO ◽  
COSTAS S. ILIOPOULOS ◽  
...  

Circular string matching is a problem which naturally arises in many contexts. It consists in finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}(n)$ and space $\mathcal{O}(m)$. The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}(n + M)$ and space $\mathcal{O}(M)$, where M is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. Moreover, the presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementation of other approaches.
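The problem statement can be illustrated with a naive Python sketch. It relies on the fact that every rotation of a pattern of length m is a length-m substring of the pattern concatenated with itself. This is only a baseline that makes the definition concrete; it stores all rotations explicitly and does not achieve the time and space bounds of the algorithm in the paper. The function name `circular_match` is hypothetical.

```python
def circular_match(text, pattern):
    """Return every position of the text where some rotation of the
    pattern occurs."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Every rotation appears as a length-m window of pattern + pattern[:-1].
    doubled = pattern + pattern[:-1]
    rotations = {doubled[k:k + m] for k in range(m)}
    return [
        i
        for i in range(len(text) - m + 1)
        if text[i:i + m] in rotations
    ]


print(circular_match("abcabca", "bca"))
print(circular_match("xxabxx", "ba"))
```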


2020 ◽  
Vol 2020 (15) ◽  
pp. 350-1-350-10
Author(s):  
Yin Wang ◽  
Baekdu Choi ◽  
Davi He ◽  
Zillion Lin ◽  
George Chiu ◽  
...  

In this paper, we will introduce a novel low-cost, small size, portable nail printer. The usage of this system is to print any desired pattern on a finger nail in just a few minutes. The detailed pre-processing procedures will be described in this paper. These include image processing to find the correct printing zone, and color management to match the patterns’ color. In each phase, a novel algorithm will be introduced to refine the result. The paper will state the mathematical principles behind each phase, and show the experimental results, which illustrate the algorithms’ capabilities to handle the task.

