On the relationship between histogram indexing and block-mass indexing

Histogram indexing, also known as jumbled pattern indexing and permutation indexing, is one of the important current open problems in pattern matching. It was introduced about 6 years ago and has seen active research since. Yet, to date there is no algorithm that can preprocess a text T in time o(|T|^2/polylog|T|) and achieve histogram indexing, even over a binary alphabet, in time independent of the text length. The pattern matching version of this problem has a simple linear-time solution. The block-mass pattern matching problem is a recently introduced problem, motivated by issues in mass spectrometry. It is also an example of a pattern matching problem that has an efficient, almost linear-time solution but whose indexing version is daunting. However, for fixed finite alphabets, progress has been made. In this paper, a strong connection between the histogram indexing problem and the block-mass pattern indexing problem is shown. The reduction we show between the two problems is amazingly simple. Its value lies in recognizing the connection between these two apparently disparate problems, rather than in the complexity of the reduction. In addition, we show that for both these problems, even over unbounded alphabets, there are algorithms that preprocess a text T in time o(|T|^2/polylog|T|) and enable answering indexing queries in time polynomial in the query length. The contributions of this paper are twofold: (i) we introduce the idea of allowing a trade-off between the preprocessing time and query time of various indexing problems that have been stumbling blocks in the literature; (ii) we take the first step in introducing a class of indexing problems that, we believe, cannot be preprocessed in time o(|T|^2/polylog|T|) and enable linear-time query processing.
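The linear-time solution for the (non-indexed) pattern matching version mentioned in the abstract can be illustrated with a sliding window over symbol counts. The following is a minimal sketch of that standard technique, not code from the paper; the function name `jumbled_matches` is hypothetical, and the running time is linear only in the expected sense, since it relies on hash-based counters.

```python
from collections import Counter

def jumbled_matches(text, pattern):
    """Return every start index i such that text[i:i+len(pattern)]
    is a permutation of pattern (i.e. has the same symbol histogram)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # balance[c] = (count of c in the current window) - (count of c in pattern)
    balance = Counter(text[:m])
    balance.subtract(pattern)
    # Number of symbols whose counts currently disagree.
    nonzero = sum(1 for v in balance.values() if v != 0)

    hits = [0] if nonzero == 0 else []
    for i in range(m, n):
        # Slide the window: text[i] enters, text[i - m] leaves.
        for ch, delta in ((text[i], 1), (text[i - m], -1)):
            before = balance[ch]
            after = before + delta
            balance[ch] = after
            if before == 0 and after != 0:
                nonzero += 1
            elif before != 0 and after == 0:
                nonzero -= 1
        if nonzero == 0:
            hits.append(i - m + 1)
    return hits

print(jumbled_matches("abcbacab", "abc"))  # [0, 2, 3, 5]
```

Each window shift touches only two counters, which is why scanning is cheap. The indexing version is harder because the pattern is not known when the text is preprocessed.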

IDPM: An Improved Degenerate Pattern Matching Algorithm for Biological Sequences

International Journal of Foundations of Computer Science ◽

10.1142/s0129054117500307 ◽

2017 ◽

Vol 28 (07) ◽

pp. 889-914

Author(s):

Jie Lin ◽

Yue Jiang ◽

E. James Harner ◽

Bing-Hua Jiang ◽

Don Adjeroh

Keyword(s):

Performance Improvement ◽

Pattern Matching ◽

Linear Time ◽

Computational Cost ◽

Large Data ◽

Biological Sequences ◽

Matching Problem ◽

Practical Utilization ◽

Matching Algorithm ◽

Pattern Matching Algorithm

Let [Formula: see text] be a string, with symbols from an alphabet. [Formula: see text] is said to be degenerate if for some positions, say [Formula: see text], [Formula: see text] can contain a subset of symbols from the symbol alphabet, rather than just one symbol. Given a text string [Formula: see text] and a pattern [Formula: see text], both with symbols from an alphabet [Formula: see text], the degenerate string matching problem is to find the positions in [Formula: see text] where [Formula: see text] occurs, such that [Formula: see text], [Formula: see text], or both are allowed to be degenerate. Though some algorithms have been proposed, their huge computational cost poses a significant challenge to their practical utilization. In this work, we propose IDPM, an improved degenerate pattern matching algorithm based on an extension of the Boyer–Moore algorithm. In the preprocessing phase, the algorithm defines an alphabet-independent compatibility rule and computes the shift arrays using respective variants of the bad character and good suffix heuristics. In the search phase, IDPM improves the matching speed by using the compatibility rule. On average, the proposed IDPM algorithm has linear time complexity with respect to the text size and to the overall size of the pattern. IDPM demonstrates significant performance improvement over state-of-the-art approaches. It can be used for fast practical degenerate pattern matching with large data sizes, with important applications in flexible and scalable searching of huge biological sequences.
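To make the problem statement concrete, here is a naive quadratic-time baseline for degenerate matching. It is not IDPM and uses no Boyer–Moore shifts. It assumes the usual convention that two positions match when their symbol sets intersect; the abstract does not spell out its compatibility rule, so that convention, the bracket notation `[CG]`, and the function names are all illustrative assumptions.

```python
def to_sets(s):
    """Parse a string such as 'A[CG]T' into a list of symbol sets:
    [{'A'}, {'C', 'G'}, {'T'}]. The bracket notation is an assumption
    made for this sketch only."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "[":
            j = s.index("]", i)           # closing bracket of this position
            out.append(frozenset(s[i + 1:j]))
            i = j + 1
        else:
            out.append(frozenset(s[i]))   # ordinary, non-degenerate position
            i += 1
    return out

def degenerate_matches(text, pattern):
    """Return start positions where pattern matches text, allowing either
    string to be degenerate. Two positions are taken to match when their
    symbol sets share at least one symbol (assumed rule)."""
    t, p = to_sets(text), to_sets(pattern)
    return [
        i
        for i in range(len(t) - len(p) + 1)
        if all(t[i + j] & p[j] for j in range(len(p)))
    ]

print(degenerate_matches("AC[GT]ACT", "A[CG]T"))  # [0, 3]
```

This baseline compares every alignment position by position, which is the cost that shift-based algorithms aim to reduce.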

Permutation Pattern matching in (213, 231)-avoiding permutations

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.1329 ◽

2017 ◽

Vol 18 no. 2, Permutation... (Permutation Patterns) ◽

Author(s):

Both Neou ◽

Romeo Rizzi ◽

Stéphane Vialette

Keyword(s):

Pattern Matching ◽

Linear Approximate Pattern Matching Algorithm

10.21203/rs.3.rs-1021063/v1 ◽

2021 ◽

Author(s):

Anas Al-okaily ◽

Abdelghani Tbakhi

Keyword(s):

Pattern Matching ◽

Linear Time ◽

Search Costs ◽

Exact Matching ◽

Time And Space ◽

Matching Problem ◽

Approximate Matching ◽

Large Length ◽

Reference Stream ◽

Inexact Matching

Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually short) in a reference stream of data (usually long). The matching can be exact or approximate (inexact). Exact matching searches for the pattern without allowing mismatches (or insertions and deletions) of one or more characters in the pattern, while approximate matching allows them. For exact matching, several data structures that can be built in linear time and space are used in practice nowadays. For approximate matching, the solutions proposed so far are non-linear and currently impractical. In this paper, we designed and implemented a structure that can be built in linear time and space and solves the approximate matching problem with a search cost of O(m + (log_Σ^k n)/k! + occ), where m is the length of the pattern, n is the length of the reference, and k is the number of tolerated mismatches (and insertions and deletions).
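For contrast with the index-based approach the abstract describes, the classical non-indexed baseline for approximate matching with up to k mismatches, insertions, and deletions is a dynamic program over the text that takes O(mn) time. The sketch below shows that textbook technique only; it is not the structure proposed in the paper, and the function name `approx_match_ends` is hypothetical.

```python
def approx_match_ends(text, pattern, k):
    """Return every index i such that some substring of text ending at
    text[i] is within edit distance k of pattern (substitutions,
    insertions, and deletions each cost 1)."""
    m = len(pattern)
    # col[j] = smallest edit distance between pattern[:j] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for i, c in enumerate(text):
        new = [0] * (m + 1)   # new[0] = 0: a match may start anywhere
        for j in range(1, m + 1):
            new[j] = min(
                col[j - 1] + (pattern[j - 1] != c),  # match or substitution
                col[j] + 1,                          # extra text character
                new[j - 1] + 1,                      # missing pattern character
            )
        col = new
        if col[m] <= k:
            ends.append(i)
    return ends

print(approx_match_ends("xxabcxx", "abd", 1))  # [3, 4]
```

With k = 0 the routine reduces to exact matching and reports the last index of each occurrence.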

A Linear-Time Solution for All-SAT Problem Based on P System

Chinese Journal of Electronics ◽

10.1049/cje.2018.01.008 ◽

2018 ◽

Vol 27 (2) ◽

pp. 367-373 ◽

Cited By ~ 5

Author(s):

Ping GUO ◽

Jian ZHU ◽

Haizhu CHEN ◽

Ruilong YANG

Keyword(s):

Linear Time ◽

P System ◽

Sat Problem ◽

Time Solution

A fast expected time algorithm for the 2-D point pattern matching problem

Pattern Recognition ◽

10.1016/j.patcog.2003.12.009 ◽

2004 ◽

Vol 37 (8) ◽

pp. 1699-1711 ◽

Cited By ~ 38

Author(s):

P.B. Van Wamelen ◽

Z. Li ◽

S.S. Iyengar

Keyword(s):

Pattern Matching ◽

Time Algorithm ◽

Point Pattern ◽

Matching Problem ◽

Point Pattern Matching ◽

Expected Time

Finding the Number of Clusters in Data and Better Initial Centers for K-means Algorithm

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.06.01 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1-20

Author(s):

Ahmed Fahim

Keyword(s):

Data Clustering ◽

Linear Time ◽

Original Data ◽

Local Minima ◽

Expected Number ◽

Open Problems ◽

Number Of Clusters ◽

Benchmark Datasets ◽

Selection Of

K-means is the best-known algorithm for data clustering in data mining. Its most important advantages are its simplicity, its speed of convergence to local minima, and its linear time complexity. The most important open problems for this algorithm are the selection of initial centers and the determination of the exact number of clusters in advance. This paper proposes a solution for these two problems together, by adding a preprocessing step that obtains the expected number of clusters in the data and better initial centers. Many studies address each of these problems separately, but none solves both together. The preprocessing step requires o(n log n) time, where n is the size of the dataset. It aims to obtain an initial partitioning of the data without determining the number of clusters in advance, and then computes the means of the initial clusters. After that, we apply k-means to the original data, using the information from the preprocessing step, to get the final clusters. We use many benchmark datasets to test the proposed method. The experimental results show the efficiency of the proposed method.
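The role of the initial centers can be seen in a plain k-means (Lloyd) iteration, sketched below in pure Python. This is the standard algorithm only: the paper's preprocessing step is not described in the abstract and is not reproduced here, so the caller must supply `centers`, which is exactly the input that step is meant to provide. The function name `kmeans` and its parameters are illustrative.

```python
def kmeans(points, centers, max_iter=100):
    """Run Lloyd's k-means iteration from the given initial centers.
    points and centers are sequences of equal-length numeric tuples.
    Returns (final_centers, clusters)."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:   # converged (to a local minimum)
            break
        centers = new_centers
    return centers, clusters

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centers, groups = kmeans(pts, [(0, 0), (10, 10)])
print(final_centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

Each iteration costs time proportional to the number of points times the number of centers, which is the linear behavior the abstract refers to. The result is only a local minimum determined by the starting centers.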

Representation and Reasoning about Strategic Abilities with ω-Regular Properties

Mathematics ◽

10.3390/math9233052 ◽

2021 ◽

Vol 9 (23) ◽

pp. 3052

Author(s):

Liping Xiong ◽

Sumei Guo

Keyword(s):

Temporal Logic ◽

Linear Time ◽

Dynamic Logic ◽

Research Area ◽

Regular Expressions ◽

Multi Agent Systems ◽

Practical Applications ◽

Multi Agent ◽

Strategy Logic ◽

Active Research

Specification and verification of coalitional strategic abilities have been an active research area in multi-agent systems, artificial intelligence, and game theory. Recently, many strategic logics, e.g., Strategy Logic (SL) and alternating-time temporal logic (ATL*), have been proposed based on classical temporal logics, e.g., linear-time temporal logic (LTL) and computation tree logic (CTL*), respectively. However, these logics cannot express general ω-regular properties, the need for which is considered compelling in practical applications, especially in industry. To remedy this problem, in this paper we propose LDL-based Strategy Logic (LDL-SL), based on linear dynamic logic (LDL) as proposed by Moshe Y. Vardi. Interpreted on concurrent game structures, LDL-SL extends SL, which contains existential/universal quantification operators about regular expressions. Here we adopt a branching-time version. This logic can express general ω-regular properties and describe more programmed constraints about individual/group strategies. We then study three types of fragments (i.e., one-goal, ATL-like, star-free) of LDL-SL. Furthermore, we show that prevalent strategic logics based on LTL/CTL*, such as SL/ATL*, are exactly equivalent to the corresponding star-free strategic logics, where only star-free regular expressions are considered. Moreover, results show that the complexity of the model-checking problems for these new logics, including the one-goal and ATL-like fragments, is no higher than that of the corresponding SL or ATL*.
