IDPM: An Improved Degenerate Pattern Matching Algorithm for Biological Sequences

2017 ◽  
Vol 28 (07) ◽  
pp. 889-914
Author(s):  
Jie Lin ◽  
Yue Jiang ◽  
E. James Harner ◽  
Bing-Hua Jiang ◽  
Don Adjeroh

Let [Formula: see text] be a string with symbols from an alphabet. [Formula: see text] is said to be degenerate if, at some positions, say [Formula: see text], [Formula: see text] can contain a subset of symbols from the alphabet rather than a single symbol. Given a text string [Formula: see text] and a pattern [Formula: see text], both with symbols from an alphabet [Formula: see text], the degenerate string matching problem is to find the positions in [Formula: see text] where [Formula: see text] occurs, where [Formula: see text], [Formula: see text], or both are allowed to be degenerate. Although some algorithms have been proposed, their huge computational cost poses a significant challenge to their practical use. In this work, we propose IDPM, an improved degenerate pattern matching algorithm based on an extension of the Boyer–Moore algorithm. In the preprocessing phase, the algorithm defines an alphabet-independent compatibility rule and computes the shift arrays using respective variants of the bad character and good suffix heuristics. In the search phase, IDPM improves the matching speed by using the compatibility rule. On average, the proposed IDPM algorithm has linear time complexity with respect to the text size and to the overall size of the pattern. IDPM demonstrates significant performance improvement over state-of-the-art approaches. It can be used for fast, practical degenerate pattern matching on large data sizes, with important applications in flexible and scalable searching of huge biological sequences.
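As an illustration only, the following sketch shows degenerate matching with a compatibility rule and a single bad-character-style shift table. It is not the IDPM algorithm: it uses a simpler Horspool-style shift, omits the good suffix heuristic, and assumes DNA sequences written in IUPAC codes. The names `IUPAC`, `encode`, `compatible`, and `degenerate_search` are hypothetical.

```python
# Assumption: symbols are IUPAC nucleotide codes, stored as 4-bit masks
# (A=1, C=2, G=4, T=8). A degenerate symbol is the OR of its members.
IUPAC = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "R": 1 | 4, "Y": 2 | 8, "S": 2 | 4, "W": 1 | 8,
    "K": 4 | 8, "M": 1 | 2,
    "B": 2 | 4 | 8, "D": 1 | 4 | 8, "H": 1 | 2 | 8, "V": 1 | 2 | 4,
    "N": 15,
}


def encode(seq):
    """Convert a string of IUPAC codes into a list of bit masks."""
    return [IUPAC[ch] for ch in seq.upper()]


def compatible(a, b):
    """Two symbol sets are compatible if they share at least one symbol."""
    return (a & b) != 0


def degenerate_search(text, pattern):
    """Return all start positions where pattern is compatible with text.

    Both text and pattern may contain degenerate symbols.
    """
    t, p = encode(text), encode(pattern)
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []

    # shift[mask]: distance from the last pattern position to the rightmost
    # earlier position compatible with `mask`, or m if there is none.
    shift = [m] * 16
    for mask in range(1, 16):
        for j in range(m - 2, -1, -1):
            if compatible(p[j], mask):
                shift[mask] = m - 1 - j
                break

    hits = []
    i = 0
    while i <= n - m:
        # Compare right to left, as in Boyer-Moore-style algorithms.
        j = m - 1
        while j >= 0 and compatible(p[j], t[i + j]):
            j -= 1
        if j < 0:
            hits.append(i)
        # Shift according to the text symbol aligned with the window end.
        i += shift[t[i + m - 1]]
    return hits
```

For example, `degenerate_search("ACGTNACGA", "ACGW")` reports positions 0 and 5, because `W` stands for `A` or `T`.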

2012 ◽  
Vol 2012 ◽  
pp. 1-12 ◽  
Author(s):  
Xiaoyun Wang ◽  
Xianquan Zhang

Point pattern matching is an important topic in computer vision and pattern recognition. In this paper, we propose a point pattern matching algorithm for two planar point sets under a Euclidean transformation. We view a point set as a complete graph, establish the relation between the point set and the complete graph, and solve the point pattern matching problem by finding congruent complete graphs. Experiments are conducted to show the effectiveness and robustness of the proposed algorithm.
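To make the complete-graph view concrete, the sketch below checks one property that congruent complete graphs must share: rotation and translation preserve every edge length. This is only a necessary condition for a match and is not the algorithm proposed in the paper. The function names and the tolerance `tol` are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def edge_lengths(points):
    """Sorted lengths of all edges of the complete graph on `points`."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def rigid_transform(points, angle, dx, dy):
    """Rotate planar points by `angle` (radians), then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def same_edge_lengths(a, b, tol=1e-9):
    """Necessary (not sufficient) test for congruence of two point sets."""
    if len(a) != len(b):
        return False
    return all(abs(u - v) <= tol
               for u, v in zip(edge_lengths(a), edge_lengths(b)))
```

A point set and its rotated, translated copy pass `same_edge_lengths`, while moving a single point generally makes the test fail.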


2005 ◽  
Vol 16 (06) ◽  
pp. 1155-1166
Author(s):  
SHUNSUKE INENAGA ◽  
AYUMI SHINOHARA ◽  
MASAYUKI TAKEDA

We study the fully compressed pattern matching problem (FCPM problem): Given [Formula: see text] and [Formula: see text] which are descriptions of text T and pattern P respectively, find the occurrences of P in T without decompressing [Formula: see text] or [Formula: see text]. This problem is rather challenging since patterns are also given in a compressed form. In this paper we present an FCPM algorithm for simple collage systems. Collage systems are a general framework representing various kinds of dictionary-based compressions in a uniform way, and simple collage systems are a subclass that includes LZW and LZ78 compressions. Collage systems are of the form [Formula: see text], where [Formula: see text] is a dictionary and [Formula: see text] is a sequence of variables from [Formula: see text]. Our FCPM algorithm performs in [Formula: see text] time, where [Formula: see text] and [Formula: see text]. This is faster than the previous best result of $O(m^2n^2)$ time.
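For readers unfamiliar with dictionary-based compression, here is a minimal LZ78 sketch, one of the schemes the abstract names. It shows how the text becomes a dictionary of phrases, each defined as an earlier phrase plus one character. It does not implement the FCPM algorithm or a collage system; `lz78_compress` and `lz78_decompress` are hypothetical names.

```python
def lz78_compress(text):
    """Return the LZ78 phrases of `text` as (dictionary index, character) pairs.

    Index 0 denotes the empty phrase; phrase k is the k-th pair emitted.
    """
    dictionary = {"": 0}
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input is already a known phrase; emit it as prefix + last char.
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the original text from LZ78 phrases."""
    entries = [""]
    out = []
    for index, ch in phrases:
        entry = entries[index] + ch
        entries.append(entry)
        out.append(entry)
    return "".join(out)
```

Fully compressed matching works on such phrase descriptions directly, whereas this sketch only shows how the descriptions arise.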


2013 ◽  
Vol 411-414 ◽  
pp. 1594-1597
Author(s):  
Mo Jia ◽  
Mei Chen ◽  
Hui Li

In this paper, we study how to improve the efficiency of the classic AC matching algorithm for scanning systems, and propose an algorithm named AC-SUN based on the Sunday matching algorithm. In our design, AC-SUN combines the jump strengths of the Sunday algorithm and the AC algorithm to avoid the verbatim (character-by-character) matching problem, while retaining the advantage of the AC algorithm of establishing a state tree and invalid pointers to increase the percentage of successful matches. Our experimental results also verify its superiority over the classic AC algorithm.
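The jump behaviour referred to above can be seen in the plain single-pattern Sunday algorithm, sketched below. This is not AC-SUN: it handles one pattern and builds no AC state tree. The function name `sunday_search` is hypothetical.

```python
def sunday_search(text, pattern):
    """Return all start positions of `pattern` in `text` (Sunday algorithm)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # The shift is decided by the character just past the current window:
    # m minus its last index in the pattern, or m + 1 if it does not occur.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break  # no character follows the window
        i += shift.get(text[i + m], m + 1)
    return hits
```

On `"abracadabra"` with pattern `"abra"`, the search visits only the windows starting at 0, 5, and 7 and reports matches at 0 and 7.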


2014 ◽  
pp. 85-90
Author(s):  
Vladimir A. Oleshchuk

We propose to use pattern matching on data streams from sensors in order to monitor and detect events of interest. We study a privacy-preserving pattern matching problem where patterns are specified as sequences of constraints on input elements. We propose a new privacy-preserving pattern matching algorithm over an infinite alphabet $A$ where a pattern $P$ is given as a sequence $\{p_{i_1}, p_{i_2}, \ldots, p_{i_m}\}$ of predicates $p_{i_j}$ defined on $A$. The algorithm addresses the following problem: given a pattern $P$ and an input sequence $t$, find privately all positions $i$ in $t$ where $P$ matches $t$. Privacy preservation in the context of this paper means that sensor measurements will be evaluated as predicates $p_i(e_j)$ privately, that is, sensors will not need to disclose the measurements [...]
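The matching problem itself, without any privacy protection, can be sketched as follows. The code assumes predicates are ordinary Python callables and that the whole input sequence is available in the clear, which is exactly what the paper's private evaluation is designed to avoid. The name `predicate_match` and the sample thresholds are hypothetical.

```python
def predicate_match(pattern, sequence):
    """Return all positions i where every predicate in `pattern` holds on the
    corresponding element of `sequence`, starting at i.

    NOTE: this evaluates predicates on raw values; it is NOT privacy preserving.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(sequence) - m + 1)
        if all(pred(sequence[i + j]) for j, pred in enumerate(pattern))
    ]


# Hypothetical event of interest: a reading below 20, then one of at
# least 20, then one of at least 25.
rising = [lambda x: x < 20, lambda x: x >= 20, lambda x: x >= 25]
readings = [18.5, 21.0, 26.2, 19.0, 22.4, 25.0]
positions = predicate_match(rising, readings)
```

Here `positions` is `[0, 3]`, the two places where the three constraints hold in order.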

