Weak factor automata: the failure of failure factor oracles?

2014 ◽  
Vol 53 ◽  
Author(s):  
Loek Cleophas ◽  
Derrick G. Kourie ◽  
Bruce W. Watson

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed into a so-called failure DFA (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.
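To make the factor oracle concrete, the following is a minimal Python sketch of the commonly described online construction of a plain FO (not the failure factor oracle of the abstract, whose construction is not given here). The function names `build_factor_oracle` and `accepts`, and the use of one dictionary per state, are illustrative assumptions.

```python
def build_factor_oracle(s):
    """Build a factor oracle for s with the usual online construction.

    Returns (trans, supply): trans[q] maps a symbol to a target state,
    supply[q] is the supply link of state q (-1 for the start state).
    Names and data layout are assumptions made for this sketch.
    """
    n = len(s)
    trans = [dict() for _ in range(n + 1)]  # states 0..n
    supply = [-1] * (n + 1)
    for i, c in enumerate(s):
        new = i + 1
        trans[i][c] = new                   # internal transition i -> i+1
        k = supply[i]
        # Walk the supply links, adding external transitions on c.
        while k > -1 and c not in trans[k]:
            trans[k][c] = new
            k = supply[k]
        supply[new] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, word):
    """Return True if word can be read from state 0 (all states accept)."""
    state = 0
    for c in word:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

Every factor of the input is accepted, but the oracle is "weak" in that it may also accept strings that are not factors: for the example input `"abbbaab"`, `accepts` returns `True` for `"aba"`, which does not occur in that string.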

2017 ◽  
Vol 80 ◽  
pp. 162-170 ◽  
Author(s):  
Muhammad Tahir ◽  
Muhammad Sardaraz ◽  
Ataul Aziz Ikram

2012 ◽  
Vol 45 (2) ◽  
pp. 332-334 ◽  
Author(s):  
R. Nagarajan ◽  
S. Siva Balan ◽  
R. Sabarinathan ◽  
M. Kirti Vaishnavi ◽  
K. Sekar

Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone Cα angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.


2012 ◽  
Vol 532-533 ◽  
pp. 1414-1418 ◽  
Author(s):  
Feng Du

In this paper, a faster algorithm, BMF, is proposed that improves the time complexity of the BM algorithm. The BMF algorithm defines a new pre-calculation function that significantly increases the pattern skips. Experiments indicate that the time complexity is reduced by at most 63%. Therefore, the improved algorithm could significantly improve pattern matching performance when used in an IDS.
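The abstract does not specify BMF's pre-calculation function, so it is not reproduced here. As a point of reference for what a precomputed skip table looks like, the sketch below shows the classic Horspool simplification of Boyer-Moore, a different and well-known technique; the function name `horspool_search` is an assumption for this illustration.

```python
def horspool_search(text, pattern):
    """Return all start positions of pattern in text (Horspool variant of BM).

    This is a baseline illustration only, not the BMF algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Pre-calculation: for each symbol in pattern[:-1], the distance from
    # its last occurrence to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Skip according to the text symbol aligned with the pattern's end;
        # symbols absent from the table allow a full skip of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

Larger skips mean fewer window alignments are examined, which is the effect the abstract attributes to its new pre-calculation function.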


2015 ◽  
Vol 126 (8) ◽  
pp. e72-e73
Author(s):  
H.-P. Müller ◽  
G. Grön ◽  
S. Abrahams ◽  
P. Bede ◽  
M. Filippi ◽  
...  

2013 ◽  
Vol 336-338 ◽  
pp. 2419-2422
Author(s):  
Li Hua Zhou

The efficiency of the pattern matching algorithm used in the detection engine determines the performance of an intrusion detection system. This paper improves the data structure of the SBOM algorithm, a well-known keyword matching algorithm, to support adding or removing keywords dynamically. The results of experiments on the 1999 DARPA Intrusion Detection Evaluation Data Sets indicate that the implemented NIDS (Network Intrusion Detection System) performs comparatively well for large keyword sets.


2007 ◽  
Vol 23 (6) ◽  
pp. 680-686 ◽  
Author(s):  
J. Herisson ◽  
G. Payen ◽  
R. Gherbi

In this paper a new searching technique based on pattern matching is proposed. In bioinformatics, finding Tandem Repeats (TR) in DNA sequences is a critical issue. Many pattern matching algorithms exist, and KMP (Knuth Morris Pratt) is one that suffers from deficiencies in runtime complexity and cost as the size of the data set increases. The main aim of the paper is to produce an effective algorithm for detecting and identifying Tandem Repeats in a DNA sequence more efficiently. By introducing a 2-dimensional matrix to narrow the search scope and optimize the problem, the Tandem Repeat finding algorithm makes the detection or identification process more efficient and effective, which improves the quality of results. The theoretical analysis and experimental results indicate that the tandem repeat finding algorithm obtains equivalent results in less runtime. The algorithm outperforms KMP in determining results, and it also reduces the runtime cost, which is beneficial as DNA data grows larger.
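For readers unfamiliar with the problem, the sketch below shows what detecting a tandem repeat means, using a plain brute-force scan. It is not the 2-dimensional matrix method of the abstract, whose details are not given; the function name `find_tandem_repeats` and its parameters are assumptions for illustration.

```python
def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Brute-force scan for tandem repeats in seq.

    Returns a list of (start, unit, copies) tuples. A naive baseline,
    not the matrix-based algorithm described in the abstract.
    """
    results = []
    n = len(seq)
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + 2 * unit_len <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                results.append((i, unit, copies))
                i += copies * unit_len  # jump past the whole repeat
            else:
                i += 1
    return results
```

This sketch also reports non-primitive units (for example `AA` inside a run of `A`), and its cost grows with both the sequence length and the range of unit lengths tried, which is the kind of overhead a more careful algorithm aims to avoid.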


2017 ◽  
Vol 5 (1) ◽  
pp. 8-15
Author(s):  
Sergii Hilgurt ◽  

Multi-pattern matching is a fundamental technique found in applications such as network intrusion detection systems, anti-virus, anti-worm and other signature-based information security tools. Due to rising traffic rates, the increasing number and sophistication of attacks, and the collapse of Moore’s law, traditional software solutions can no longer keep up. Therefore, developers frequently use hardware approaches to accelerate pattern matching. Reconfigurable FPGA-based devices, which provide the flexibility of software and near-ASIC performance, have become increasingly popular for this purpose. Hence, increasing the efficiency of reconfigurable information security tools is now a scientific issue. Many different approaches to constructing hardware matching circuits on FPGAs are known. The most widely used are based on discrete comparators, hash functions and finite automata. Each approach has its own pros and cons, and none has yet become the leading one. In this paper, a method is developed that combines several different approaches to reinforce their advantages. An analytical technique is proposed to quickly estimate in advance the resource costs of each matching scheme without the need to compile an FPGA project. It allows optimization procedures to be applied to split the set of patterns near-optimally between different approaches in acceptable time.

