On a parallel-algorithms method for string matching problems (overview)

Author(s):  
Suleyman Cenk Sahinalp ◽  
Uzi Vishkin
Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 31
Author(s):  
Ivan Markić ◽  
Maja Štula ◽  
Marija Zorić ◽  
Darko Stipaničev

String matching is applied across computer science and in many other branches of science. The plethora of string-matching algorithms makes it hard to choose the best one for a particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured by the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm's execution related to the computational resources the algorithm consumes. Resource usage during execution can be determined, and for maximum efficiency the goal is to minimize it. Guided by the fact that standard measures of algorithm efficiency, such as execution time, depend directly on the number of executed actions, and without touching on the problems of computer power consumption or memory, which also depend on the algorithm type and the techniques used in algorithm development, we have developed a methodology that enables researchers to choose an efficient algorithm for a specific domain. The efficiency of string-searching algorithms is usually observed independently of the domain texts being searched. This paper presents the idea that algorithm efficiency depends on the properties of the searched string and of the texts being searched, accompanied by a theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through the character comparison count metric. This metric is a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain using appropriate domain data (patterns and texts) and provides, for that domain, a ranking of algorithms according to the patterns' entropy.
The proposed approach is limited to on-line exact string-matching problems and is based on the information entropy of the search pattern. Meticulous empirical testing illustrates the implementation of the methodology and supports its soundness.
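
To make the two quantities in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that pattern entropy means the Shannon entropy of the pattern's character frequencies, and it counts character comparisons for two standard exact-matching algorithms (brute force and Horspool) chosen here purely for illustration. The function names are hypothetical.

```python
import math
from collections import Counter


def pattern_entropy(pattern: str) -> float:
    """Shannon entropy (bits per character) of the pattern's character
    frequencies. Assumption: this is the entropy measure meant above."""
    counts = Counter(pattern)
    n = len(pattern)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def naive_search(text: str, pattern: str):
    """Brute-force search. Returns (0-based match positions, comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(text), len(pattern)
    comparisons, matches = 0, []
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1  # one text/pattern character comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(i)
    return matches, comparisons


def horspool_search(text: str, pattern: str):
    """Horspool search. Returns (0-based match positions, comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n, m = len(text), len(pattern)
    # Bad-character shifts, built from every pattern character but the last.
    shift = {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}
    comparisons, matches = 0, []
    i = 0
    while i <= n - m:
        j = m - 1
        while j >= 0:  # compare the window right to left
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            matches.append(i)
        # Shift by the entry for the text character under the pattern's end.
        i += shift.get(text[i + m - 1], m)
    return matches, comparisons


if __name__ == "__main__":
    text, pattern = "abracadabra", "abra"
    print("entropy:", pattern_entropy(pattern))
    print("naive:", naive_search(text, pattern))
    print("horspool:", horspool_search(text, pattern))
```

Because the counters depend only on the inputs and the algorithm's logic, the same counts result on any platform, which is the property the abstract attributes to the metric. Ranking algorithms for a domain would require running such counters over representative domain texts and patterns grouped by entropy.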


Author(s):  
Yangjun Chen

In computer engineering, a number of programming tasks involve a special problem, the so-called tree matching problem (Cole & Hariharan, 1997), as a crucial step. Examples include the design of interpreters for nonprocedural programming languages, automatic implementation of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors, and automatic theorem proving. Recently, it has been shown that this problem can be transformed in linear time to another problem, the so-called subset matching problem (Cole & Hariharan, 2002, 2003), which is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet S. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j - 1] for all j (1 ≤ j ≤ m). This is a generalization of ordinary string matching and is of interest because an efficient algorithm for this problem implies an efficient solution to the tree matching problem. In addition, as shown in (Indyk, 1997), this problem can also be used to solve general string matching and counting matching (Muthukrishnan, 1997; Muthukrishnan & Palem, 1994), and it enables the design of efficient algorithms for several geometric pattern matching problems. In this article, we propose a new algorithm for this problem, which needs only O(n + m) time when the size of S is small and O(n + m·n^0.5) time on average in the general case.
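
The definition of subset matching above can be illustrated with a brute-force check. This is only a sketch of the problem statement under the assumption that each position is represented as a Python set; it runs in O(n·m) subset tests and is not the algorithm proposed in the article. The function name is hypothetical.

```python
from typing import List, Sequence, Set


def subset_match(text: Sequence[Set[str]], pattern: Sequence[Set[str]]) -> List[int]:
    """Return the 1-based text positions i at which the pattern occurs,
    i.e. p[j] is a subset of t[i + j - 1] for all 1 <= j <= m.

    Brute force for illustration only, not the article's algorithm.
    """
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):  # 0-based window start
        # `a <= b` on Python sets tests whether a is a subset of b.
        if all(pattern[j] <= text[start + j] for j in range(m)):
            positions.append(start + 1)  # report 1-based, as in the definition
    return positions


if __name__ == "__main__":
    t = [{"a", "b"}, {"b"}, {"a", "c"}, {"b", "c"}]
    p = [{"a"}, {"b"}]
    print(subset_match(t, p))
```

Ordinary exact string matching is the special case in which every set holds a single character, for example `subset_match([{c} for c in "abab"], [{c} for c in "ab"])`.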


2013 ◽  
Vol 57 (5) ◽  
pp. 731-743 ◽  
Author(s):  
K.-H. Chen ◽  
G.-S. Huang ◽  
R. C.-T. Lee

2001 ◽  
Vol 11 (05) ◽  
pp. 445-453 ◽  
Author(s):  
TATIANA TAMBOURATZIS

Three artificial neural networks (ANNs) are proposed for solving a variety of on- and off-line string matching problems. The ANN structure employed as the building block of these ANNs is derived from the harmony theory (HT) ANN, whereby the resulting string matching ANNs are characterized by fast match-mismatch decisions, low computational complexity, and activation values of the ANN output nodes that can be used as indicators of substitution, insertion (addition) and deletion spelling errors.


1992 ◽  
Vol 101 (2) ◽  
pp. 131-149 ◽  
Author(s):  
Kosaburo Hashiguchi ◽  
Kazuya Yamada

2011 ◽  
Vol 37 (12) ◽  
pp. 820-845 ◽  
Author(s):  
Johannes Langguth ◽  
Md. Mostofa Ali Patwary ◽  
Fredrik Manne
