String Matching Problems from Bioinformatics Which Still Need Better Solutions

The string-matching paradigm is applied in every branch of computer science and in science in general. The plethora of string-matching algorithms makes it hard to choose the best one for a particular case. Expressing, measuring, and testing algorithm efficiency is a challenging task with many potential pitfalls. Algorithm efficiency can be measured by the usage of different resources. In software engineering, algorithmic productivity is a property of an algorithm's execution, identified with the computational resources the algorithm consumes. Resource usage during execution can be determined, and for maximum efficiency the goal is to minimize it. Guided by the fact that standard measures of algorithm efficiency, such as execution time, depend directly on the number of executed actions, and without touching on the problems of computer power consumption or memory, which also depend on the algorithm type and the techniques used in algorithm development, we have developed a methodology that enables researchers to choose an efficient algorithm for a specific domain. The efficiency of string-searching algorithms is usually observed independently of the domain texts being searched. This paper presents the idea that algorithm efficiency depends on the properties of the searched string and of the texts being searched, accompanied by a theoretical analysis of the proposed approach. In the proposed methodology, algorithm efficiency is expressed through the character comparison count metric, a formal quantitative measure independent of algorithm implementation subtleties and computer platform differences. The model is developed for a particular problem domain using appropriate domain data (patterns and texts) and, for that domain, ranks algorithms according to the patterns' entropy. The proposed approach is limited to on-line exact string-matching problems and is based on the information entropy of the search pattern. Meticulous empirical testing illustrates the implementation of the methodology and supports its soundness.
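The two quantities the abstract relies on, pattern entropy and character comparison count, can be illustrated with a minimal Python sketch. It assumes Shannon entropy over the pattern's character frequencies and uses brute-force matching as the instrumented algorithm; the function names are hypothetical, and the paper's actual model and set of algorithms are not reproduced here.

```python
import math
from collections import Counter


def pattern_entropy(pattern: str) -> float:
    """Shannon entropy (bits per character) of the pattern's character frequencies."""
    counts = Counter(pattern)
    n = len(pattern)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def naive_search_with_count(text: str, pattern: str):
    """Brute-force exact matching.

    Returns (0-based match positions, number of character comparisons).
    The comparison counter is the platform-independent cost measure.
    """
    comparisons = 0
    positions = []
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1          # one text/pattern character comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            positions.append(i)
    return positions, comparisons
```

Instrumenting several algorithms with the same counter and grouping the results by `pattern_entropy` would give the kind of per-domain ranking the abstract describes, under the assumptions stated above.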


A New Algorithm for Subset Matching Problem Based on Set-String Transformation

Encyclopedia of Information Communication Technology ◽

10.4018/978-1-59904-845-1.ch080 ◽

2009 ◽

pp. 607-615

Author(s):

Yangjun Chen

Keyword(s):

Programming Languages ◽

Linear Time ◽

String Matching ◽

Special Problem ◽

Computer Engineering ◽

Data Types ◽

Matching Problem ◽

Matching Problems ◽

Abstract Data ◽

Geometric Pattern Matching

In computer engineering, a number of programming tasks involve a special problem, the so-called tree matching problem (Cole & Hariharan, 1997), as a crucial step. Examples include the design of interpreters for nonprocedural programming languages, automatic implementation of abstract data types, code optimization in compilers, symbolic computation, context searching in structure editors, and automatic theorem proving. Recently, it has been shown that this problem can be transformed in linear time into another problem, the so-called subset matching problem (Cole & Hariharan, 2002, 2003), which is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet Σ. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j - 1] for all j (1 ≤ j ≤ m). This is a generalization of ordinary string matching and is of interest because an efficient algorithm for this problem implies an efficient solution to the tree matching problem. In addition, as shown in (Indyk, 1997), this problem can also be used to solve general string matching and counting matching (Muthukrishnan, 1997; Muthukrishnan & Palem, 1994), and enables the design of efficient algorithms for several geometric pattern matching problems. In this article, we propose a new algorithm for this problem, which needs only O(n + m) time when the size of Σ is small and O(n + m·n^0.5) time on average in the general case.


Faster algorithms for string matching problems: matching the convolution bound

Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280) ◽

10.1109/sfcs.1998.743440 ◽

2002 ◽

Cited By ~ 19

Author(s):

P. Indyk

Keyword(s):

String Matching ◽

Matching Problems


STRING MATCHING ARTIFICIAL NEURAL NETWORKS

International Journal of Neural Systems ◽

10.1142/s0129065701000874 ◽

2001 ◽

Vol 11 (05) ◽

pp. 445-453 ◽

Cited By ~ 2

Author(s):

TATIANA TAMBOURATZIS

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Computational Complexity ◽

Building Block ◽

String Matching ◽

Matching Problems ◽

Low Computational Complexity ◽

Artificial Neural ◽

Harmony Theory ◽

Fast Match

Three artificial neural networks (ANNs) are proposed for solving a variety of on- and off-line string matching problems. The ANN structure employed as the building block of these ANNs is derived from the harmony theory (HT) ANN, whereby the resulting string matching ANNs are characterized by fast match-mismatch decisions, low computational complexity, and activation values of the ANN output nodes that can be used as indicators of substitution, insertion (addition) and deletion spelling errors.


String matching problems over free partially commutative monoids

Information and Computation ◽

10.1016/0890-5401(92)90060-s ◽

1992 ◽

Vol 101 (2) ◽

pp. 131-149 ◽

Cited By ~ 4

Author(s):

Kosaburo Hashiguchi ◽

Kazuya Yamada

Keyword(s):

String Matching ◽

Commutative Monoids ◽

Matching Problems


On a parallel-algorithms method for string matching problems (overview)

Lecture Notes in Computer Science - Algorithms and Complexity ◽

10.1007/3-540-57811-0_3 ◽

1994 ◽

pp. 22-32 ◽

Cited By ~ 2

Author(s):

Suleyman Cenk Sahinalp ◽

Uzi Vishkin

Keyword(s):

Parallel Algorithms ◽

String Matching ◽

Matching Problems


Some string matching problems from Bioinformatics which still need better solutions

Journal of Discrete Algorithms ◽

10.1016/s1570-8667(03)00062-5 ◽

2004 ◽

Vol 2 (1) ◽

pp. 3-15 ◽

Cited By ~ 3

Author(s):

Gaston H. Gonnet

Keyword(s):

String Matching ◽

Matching Problems


Two recognizable string-matching problems over free partially commutative monoids

Theoretical Computer Science ◽

10.1016/0304-3975(92)90136-4 ◽

1992 ◽

Vol 92 (1) ◽

pp. 77-86 ◽

Cited By ~ 1

Author(s):

Kosaburo Hashiguchi ◽

Kazuya Yamada

Keyword(s):

String Matching ◽

Commutative Monoids ◽

Matching Problems


A Fast Engine for Multi-String Pattern Matching

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500392 ◽

2017 ◽

Vol 31 (12) ◽

pp. 1750039 ◽

Cited By ~ 2

Author(s):

Zhan Peng ◽

Yuping Wang ◽

Wei Yue

Keyword(s):

Data Structure ◽

Pattern Matching ◽

Large Scale ◽

String Matching ◽

Memory Consumption ◽

Matching Problems ◽

Text String ◽

Tree Node ◽

String Pattern ◽

Better Than

Multi-string matching (MSM) is a core technique that searches a text string for all occurrences of a set of string patterns. It is widely used in many applications. However, as the number of string patterns increases, most existing algorithms suffer from two issues: long matching time and high memory consumption. To address these issues, this paper proposes a fast matching engine for large-scale string matching problems. The engine comprises a filter module and a verification module. The filter module is based on several bitmaps that quickly filter out invalid positions in the text, while the verification module confirms a true pattern occurrence at each potential match position. In particular, we design a compact data structure called the Adaptive Matching Tree (AMT) for the verification module, in which each tree node stores only some pattern fragments of the whole pattern set and the inner structure of each tree node is chosen adaptively according to the features of the corresponding pattern fragments. This makes the engine efficient in both time and space. The experiments indicate that our matching engine performs better than the compared algorithms, especially for large pattern sets.
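The filter-then-verify organization described above can be sketched in Python as follows. This is only an illustration of the general idea under simplifying assumptions: a single bitmap indexed by a hash of each pattern's first `Q` characters stands in for the paper's several bitmaps, and a plain dictionary of candidate patterns stands in for the Adaptive Matching Tree. The names `Q`, `BITS`, `build`, and `search` are hypothetical.

```python
from collections import defaultdict

Q = 2            # assumed: number of leading characters the filter inspects
BITS = 1 << 12   # assumed: number of slots in the bitmap


def _slot(gram: str) -> int:
    """Deterministic hash of a Q-gram into a bitmap slot."""
    h = 0
    for ch in gram:
        h = (h * 31 + ord(ch)) % BITS
    return h


def build(patterns):
    """Build the filter bitmap and the verification buckets."""
    bitmap = 0
    buckets = defaultdict(list)
    for p in patterns:
        if len(p) < Q:
            raise ValueError("patterns shorter than Q are not handled in this sketch")
        bitmap |= 1 << _slot(p[:Q])    # mark the slot of this pattern's prefix
        buckets[p[:Q]].append(p)       # candidates to verify for this prefix
    return bitmap, buckets


def search(text, bitmap, buckets):
    """Return (position, pattern) pairs for every occurrence in text."""
    hits = []
    for i in range(len(text) - Q + 1):
        gram = text[i:i + Q]
        if not (bitmap >> _slot(gram)) & 1:
            continue                   # filter: no pattern can start here
        for p in buckets.get(gram, ()):
            if text.startswith(p, i):  # verification: confirm the full pattern
                hits.append((i, p))
    return hits
```

With the patterns `he`, `she`, `his`, and `hers`, searching the text `ushers` reports `she` at position 1 and both `he` and `hers` at position 2. A hash collision in the bitmap only costs an extra dictionary lookup, since verification rejects false candidates.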


Research on an Single Pattern Matching Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.4468 ◽

2012 ◽

Vol 433-440 ◽

pp. 4468-4474

Author(s):

Qiang Zheng

Keyword(s):

Pattern Matching ◽

High Performance ◽

String Matching ◽

Matching Algorithm ◽

Matching Problems ◽

Cross Border ◽

Protection Method ◽

Single Pattern ◽

Low Efficiency ◽

The Cost

The design of high-performance exact single-pattern string matching algorithms is the basis of all string matching problems. To overcome the low efficiency of pattern matching, this paper improves SBNDM2, one of the fastest exact single-pattern matching algorithms known for English text. The simplest form of the BNDM core loop is obtained, with only 5 instructions per character read, by amending the relationship between position in the pattern and bit in the bit mask. A cross-border protection method is also added to the algorithm to reduce the cost of cross-border inspection. Two algorithms, named S2BNDM and S2BNDM′, are presented. The experimental results indicate that both S2BNDM and S2BNDM′ are faster than SBNDM2 in all cases.
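For orientation, the basic BNDM (Backward Nondeterministic DAWG Matching) loop that these variants build on can be sketched in Python. This is the textbook bit-parallel algorithm, not SBNDM2, S2BNDM, or S2BNDM′, and it contains none of the paper's optimizations; the function name is hypothetical. Because Python integers are unbounded, the shift is masked to m bits explicitly, which a fixed-width machine word would do implicitly.

```python
def bndm_search(text: str, pattern: str):
    """Plain BNDM: return the 0-based start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("empty pattern is not handled in this sketch")

    # Bit (m - 1 - i) of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # m one-bits: every factor position is still alive
    high = 1 << (m - 1)   # set when the scanned suffix is a pattern prefix
    hits = []
    pos = 0
    while pos <= n - m:
        j, last = m, m
        state = full
        while state:
            # Read the window right to left.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    last = j          # a pattern prefix starts here: safe shift
                else:
                    hits.append(pos)  # the whole window matched
            state = (state << 1) & full
        pos += last
    return hits
```

For example, `bndm_search("abababa", "aba")` finds the overlapping occurrences at positions 0, 2, and 4.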
