BORDERS AND FINITE AUTOMATA

2007 ◽  
Vol 18 (04) ◽  
pp. 859-871
Author(s):  
MARTIN ŠIMŮNEK ◽  
BOŘIVOJ MELICHAR

A border of a string is a prefix of the string that is simultaneously its suffix. It is one of the basic stringology keystones used as a part of many algorithms in pattern matching, molecular biology, computer-assisted music analysis and others. The paper offers an automata-theoretical description of Iliopoulos's ALL_BORDERS algorithm. The algorithm finds all borders of a string with don't care symbols. We show that the ALL_BORDERS algorithm is an implementation of a finite state transducer of a specific form. We describe how such a transducer can be constructed and what form the input string should take. The described transducer finds the set of lengths of all borders. Last but not least, we define approximate borders and show how to find all approximate borders of a string under the Hamming distance. Our solution to this problem is again based on transducers. This allows us to use an analogy with automata-based pattern matching methods. Finally, we discuss conditions under which the same principle can be used for other distance measures.
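To make the two definitions above concrete, here is a minimal Python sketch. It is not the ALL_BORDERS algorithm or the transducer construction from the paper, and it does not handle don't care symbols: `borders` uses the classical failure-function technique for exact borders, and `hamming_borders` is a naive quadratic check that treats a prefix of length m as an approximate border when it differs from the suffix of length m in at most k positions. Both function names, and that reading of "approximate border", are assumptions made for illustration.

```python
def borders(s: str) -> list[int]:
    """Lengths of all non-empty proper borders of s, longest first."""
    n = len(s)
    # fail[i] = length of the longest proper border of the prefix s[:i]
    fail = [0] * (n + 1)
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = fail[k]
        if s[i] == s[k]:
            k += 1
        fail[i + 1] = k
    # Every border of s is a border of its longest border, so follow the chain.
    result = []
    k = fail[n]
    while k > 0:
        result.append(k)
        k = fail[k]
    return result


def hamming_borders(s: str, k: int) -> list[int]:
    """Lengths m (0 < m < len(s)) whose prefix and suffix differ in <= k positions."""
    n = len(s)
    result = []
    for m in range(1, n):
        mismatches = sum(a != b for a, b in zip(s[:m], s[n - m:]))
        if mismatches <= k:
            result.append(m)
    return result
```

With k = 0 the second function reports the same lengths as the first, only in ascending order; for example, "abacaba" has the borders "a" and "aba".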

Author(s):  
Holger Bock Axelsen ◽  
Martin Kutrib ◽  
Andreas Malcher ◽  
Matthias Wendlandt

It is well known that reversible finite automata do not accept all regular languages, that reversible pushdown automata do not accept all deterministic context-free languages, and that reversible queue automata are less powerful than deterministic real-time queue automata. It is of significant interest from both a practical and theoretical point of view to close these gaps. We here extend these reversible models by a preprocessing unit which is basically a reversible injective and length-preserving finite state transducer. It turns out that preprocessing the input using such weak devices increases the computational power of reversible deterministic finite automata to the acceptance of all regular languages, whereas for reversible pushdown automata the accepted family of languages lies strictly in between the reversible deterministic context-free languages and the real-time deterministic context-free languages. For reversible queue automata the preprocessing of the input leads to machines that are stronger than real-time reversible queue automata, but less powerful than real-time deterministic (irreversible) queue automata. Moreover, it is shown that the computational power of all three types of machines is not changed by allowing the preprocessing finite state transducer to work irreversibly. Finally, we examine the closure properties of the family of languages accepted by such machines.


2017 ◽  
Vol 5 (1) ◽  
pp. 8-15
Author(s):  
Sergii Hilgurt ◽  

Multi-pattern matching is a fundamental technique found in applications such as network intrusion detection systems, anti-virus and anti-worm software, and other signature-based information security tools. Due to rising traffic rates, the increasing number and sophistication of attacks, and the collapse of Moore’s law, traditional software solutions can no longer keep up. Therefore, developers frequently use hardware approaches to accelerate pattern matching. Reconfigurable FPGA-based devices, which provide the flexibility of software and near-ASIC performance, have become increasingly popular for this purpose. Hence, increasing the efficiency of reconfigurable information security tools is now a scientific issue. Many different approaches to constructing hardware matching circuits on FPGAs are known. The most widely used of them are based on discrete comparators, hash functions and finite automata. Each approach has its own pros and cons, and none of them has yet become the leading one. In this paper, a method that combines several different approaches to strengthen their advantages has been developed. An analytical technique has been proposed for quickly estimating in advance the resource costs of each matching scheme without the need to compile an FPGA project. It allows optimization procedures to be applied to split the set of patterns near-optimally between the different approaches in acceptable time.


2014 ◽  
Vol 53 ◽  
Author(s):  
Loek Cleophas ◽  
Derrick G. Kourie ◽  
Bruce W. Watson

In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4 to 512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.
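For readers unfamiliar with the plain factor oracle that the abstract starts from, the following Python sketch shows the classical online construction. It builds an ordinary FO only, not the failure factor oracle (FFO) proposed in the paper. The names `build_factor_oracle`, `accepts`, `trans` and `supply` are illustrative choices, not taken from the source.

```python
def build_factor_oracle(p: str):
    """Classical online construction of the factor oracle of p.

    Returns (trans, supply): trans[q] maps a symbol to the target state,
    supply[q] is the auxiliary link used only during construction.
    """
    m = len(p)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i, c in enumerate(p):
        trans[i][c] = i + 1          # "spine" transition for symbol p[i]
        k = supply[i]
        # Add external transitions along the supply chain until c is defined.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, w: str) -> bool:
    """True if w can be read from state 0; every state is accepting."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

The oracle accepts every factor of the sequence but may also accept a few strings that are not factors, which is the price of its compactness. For the sequence "abbbaab", this sketch yields 8 states and 11 transitions, and it accepts "abba" even though "abba" does not occur in the sequence.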


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Jeffrey Micher

We present a method for building a morphological generator from the output of an existing analyzer for Inuktitut, in the absence of a two-way finite state transducer which would normally provide this functionality. We make use of a sequence to sequence neural network which “translates” underlying Inuktitut morpheme sequences into surface character sequences. The neural network uses only the previous and the following morphemes as context. We report a morpheme accuracy of approximately 86%. We are able to increase this accuracy slightly by passing deep morphemes directly to output for unknown morphemes. We do not see significant improvement when increasing training data set size, and postulate possible causes for this.


2018 ◽  
Vol 2 (1) ◽  
pp. 75-85
Author(s):  
Rouly Doharma Sihite ◽  
Aditya Wikan Mahastama

Transliteration remains a challenge in helping people read or write across writing systems. Korean transliteration has been a research topic aimed at automating the conversion between Hangul (the Korean writing system) and Latin characters. Previous work on transliterating Hangul to Latin used a statistical approach (72.2% accuracy) and Extended Markov Models (54.9% accuracy). This research focuses on transliterating Latin (romanised) Korean words into Hangul, as many learners of Korean begin with Latin script. The selected method models the probable vowel and consonant forms and the probable vowel and consonant sequences using Finite State Automata, which avoids training. These models are then coded into rules, which were applied to and tested on 100 random Korean words. The initial test yielded only a 40% success rate, because consonants have to be labeled as the initial or the final of a syllable and some consonants were missed by the modeled rules. Additional rules were then added to catch these consonants and merge them into the proper existing syllables, which increased the success rate to 92%. Further analysis of this result found that certain consonant sequences cause syllabification problems when they occur in certain positions. Further rules were inserted, yielding a final success rate of 99%, which is also the accuracy of transliterating Korean words written in Latin into Hangul characters in compound syllables.
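The initial-versus-final labeling problem mentioned above can be illustrated with a small Python sketch. The abstract does not give the paper's rule set, so everything here is an assumption made for illustration: the tiny romanisation tables (a subset of Revised Romanization), the greedy tokenizer, and the single rule "a consonant followed by a vowel is an initial, otherwise it is a final". Only the syllable composition formula is standard Unicode arithmetic. Digraphs such as "ng" are deliberately left out, because they need exactly the kind of extra disambiguation rules the abstract describes.

```python
# Hypothetical, deliberately tiny tables; values are Unicode jamo indices.
INITIALS = {"": 11, "g": 0, "n": 2, "d": 3, "r": 5, "m": 6, "b": 7, "s": 9, "j": 12, "h": 18}
VOWELS = {"a": 0, "eo": 4, "o": 8, "u": 13, "eu": 18, "i": 20}
FINALS = {"": 0, "n": 4, "l": 8, "m": 16}
SYMBOLS = (set(INITIALS) | set(VOWELS) | set(FINALS)) - {""}


def compose(initial: int, vowel: int, final: int) -> str:
    """Standard Unicode composition of one precomposed Hangul syllable."""
    return chr(0xAC00 + (initial * 21 + vowel) * 28 + final)


def tokenize(word: str) -> list[str]:
    """Greedy longest-match split into romanised jamo (two letters, then one)."""
    tokens, i = [], 0
    while i < len(word):
        for size in (2, 1):
            piece = word[i:i + size]
            if len(piece) == size and piece in SYMBOLS:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"unsupported symbol at position {i} in {word!r}")
    return tokens


def to_hangul(word: str) -> str:
    """Syllabify tokens as (initial)? vowel (final)? and compose each syllable."""
    tokens, out, i = tokenize(word), [], 0
    while i < len(tokens):
        initial = ""
        if tokens[i] not in VOWELS:
            initial = tokens[i]
            i += 1
        if i >= len(tokens) or tokens[i] not in VOWELS or initial not in INITIALS:
            raise ValueError(f"cannot syllabify {word!r}")
        vowel = tokens[i]
        i += 1
        final = ""
        # A consonant is labeled final only when no vowel follows it.
        nxt_is_vowel = i + 1 < len(tokens) and tokens[i + 1] in VOWELS
        if i < len(tokens) and tokens[i] in FINALS and not nxt_is_vowel:
            final = tokens[i]
            i += 1
        out.append(compose(INITIALS[initial], VOWELS[vowel], FINALS[final]))
    return "".join(out)
```

Under these assumptions, `to_hangul("saram")` gives 사람 and `to_hangul("namu")` gives 나무: in the second word the "m" is followed by a vowel, so it is labeled as the initial of the next syllable, not as the final of the first.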


Author(s):  
Cyril Allauzen ◽  
Michael Riley ◽  
Johan Schalkwyk ◽  
Wojciech Skut ◽  
Mehryar Mohri

2010 ◽  
Author(s):  
Lluís-F. Hurtado ◽  
Joaquin Planells ◽  
Encarna Segarra ◽  
Emilio Sanchis ◽  
David Griol
