GENERAL ALGORITHMS FOR TESTING THE AMBIGUITY OF FINITE AUTOMATA AND THE DOUBLE-TAPE AMBIGUITY OF FINITE-STATE TRANSDUCERS

2011 ◽  
Vol 22 (04) ◽  
pp. 883-904 ◽  
Author(s):  
CYRIL ALLAUZEN ◽  
MEHRYAR MOHRI ◽  
ASHISH RASTOGI

We present efficient algorithms for testing the finite, polynomial, and exponential ambiguity of finite automata with ε-transitions. We give an algorithm for testing the exponential ambiguity of an automaton A in time [Formula: see text], and finite or polynomial ambiguity in time [Formula: see text], where |A|E denotes the number of transitions of A. These complexities significantly improve over the previous best complexities given for the same problem. Furthermore, the algorithms presented are simple and based on a general algorithm for the composition or intersection of automata. Additionally, we give an algorithm to determine in time [Formula: see text] the degree of polynomial ambiguity of a polynomially ambiguous automaton A and present an application of our algorithms to an approximate computation of the entropy of a probabilistic automaton. We also study the double-tape ambiguity of finite-state transducers. We show that the general problem is undecidable and that it is NP-hard for acyclic transducers. We present a specific analysis of the double-tape ambiguity of transducers with bounded delay. In particular, we give a characterization of double-tape ambiguity for synchronized transducers with zero delay that can be tested in quadratic time and give an algorithm for testing the double-tape ambiguity of transducers with bounded delay.

2003 ◽  
Vol 14 (06) ◽  
pp. 983-994 ◽  
Author(s):  
CYRIL ALLAUZEN ◽  
MEHRYAR MOHRI

Finitely subsequential transducers are efficient finite-state transducers with a finite number of final outputs and are used in a variety of applications. Not all transducers admit equivalent finitely subsequential transducers however. We briefly describe an existing generalized determinization algorithm for finitely subsequential transducers and give the first characterization of finitely subsequentiable transducers, transducers that admit equivalent finitely subsequential transducers. Our characterization shows the existence of an efficient algorithm for testing finite subsequentiability. We have fully implemented the generalized determinization algorithm and the algorithm for testing finite subsequentiability. We report experimental results showing that these algorithms are practical in large-vocabulary speech recognition applications. The theoretical formulation of our results is the equivalence of the following three properties for finite-state transducers: determinizability in the sense of the generalized algorithm, finite subsequentiability, and the twins property.


2021 ◽  
Vol 55 ◽  
pp. 5
Author(s):  
Benedek Nagy ◽  
Zita Kovács

Finite automata and finite state transducers belong to the bases of (theoretical) computer science with many applications. On the other hand, DNA computing and related bio-inspired paradigms are relatively new fields of computing. Watson–Crick automata are in the intersection of the above fields. These finite automata have two reading heads as they read the upper and lower strands of the input DNA molecule, respectively. In 5′ → 3′ Watson–Crick automata the two reading heads move in the same biochemical direction, that is, from the 5′ end of the strand to the direction of the 3′ end. However, in the double-stranded DNA, the DNA strands are directed in opposite way to each other, therefore 5′ → 3′ Watson–Crick automata read the input from the two extremes. In sensing 5′ → 3′ automata the automata sense if the two heads are at the same position, moreover, the computing process is finished at that time. Based on this class of automata, we define WK transducers such that, at each transition, exactly one input letter is being processed, and exactly one output letter is written on a normal output tape. Some special cases are defined and analyzed, e.g., when only one of the reading heads is being used and when the transducer has only one state. We also show that the minimal transducer is uniquely defined if the transducer is deterministic and it has marked output, i.e., the output letter written in a step identifies the reading head that is used in that transition. We have also used the functions ‘processing order’ and ‘reading heads’ to analyze these transducers.


2013 ◽  
Vol 24 (06) ◽  
pp. 847-862 ◽  
Author(s):  
MEHRYAR MOHRI

This paper introduces a new disambiguation algorithm for finite automata and functional finite-state transducers. It gives a full description of this algorithm, including a detailed pseudocode and analysis, and several illustrating examples. The algorithm is often more efficient and the result dramatically smaller than the one obtained using determinization for finite automata or the construction of Schützenberger. The unambiguous automaton or transducer created by our algorithm are never larger than those generated by the construction of Schützenberger. In fact, in a variety of cases, the size of the unambiguous transducer returned by our algorithm is only linear in that of the input transducer while the transducer created by the construction of Schützenberger is exponentially larger. Our algorithm can be used effectively in many applications to make automata and transducers more efficient to use.


2018 ◽  
Vol 29 (05) ◽  
pp. 825-843 ◽  
Author(s):  
Jörg Endrullis ◽  
Juhani Karhumäki ◽  
Jan Willem Klop ◽  
Aleksi Saarela

We study finite-state transducers and their power for transforming infinite words. Infinite sequences of symbols are of paramount importance in a wide range of fields, from formal languages to pure mathematics and physics. While finite automata for recognising and transforming languages are well-understood, very little is known about the power of automata to transform infinite words. The word transformation realised by finite-state transducers gives rise to a complexity comparison of words and thereby induces equivalence classes, called (transducer) degrees, and a partial order on these degrees. The ensuing hierarchy of degrees is analogous to the recursion-theoretic degrees of unsolvability, also known as Turing degrees, where the transformational devices are Turing machines. However, as a complexity measure, Turing machines are too strong: they trivialise the classification problem by identifying all computable words. Finite-state transducers give rise to a much more fine-grained, discriminating hierarchy. In contrast to Turing degrees, hardly anything is known about transducer degrees, in spite of their naturality. We use methods from linear algebra and analysis to show that there are infinitely many atoms in the transducer degrees, that is, minimal non-trivial degrees.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Bilal Elghadyry ◽  
Faissal Ouardi ◽  
Sébastien Verel

AbstractWeighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers constitutes a fundamental and common operation between these applications. The NP-hardness of the composition computation problem presents a challenge that leads us to devise efficient algorithms on a large scale when considering more than two transducers. This paper describes a parallel computation of weighted finite transducers composition in MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using Afrati et al. model. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, intensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency for large-scale data.


Blood ◽  
2005 ◽  
Vol 105 (3) ◽  
pp. 1329-1336 ◽  
Author(s):  
Jean Soulier ◽  
Thierry Leblanc ◽  
Jérôme Larghero ◽  
Hélène Dastot ◽  
Akiko Shimamura ◽  
...  

AbstractFanconi anemia (FA) is characterized by congenital abnormalities, bone marrow failure, chromosome fragility, and cancer susceptibility. Eight FA-associated genes have been identified so far, the products of which function in the FA/BRCA pathway. A key event in the pathway is the monoubiquitination of the FANCD2 protein, which depends on a multiprotein FA core complex. In a number of patients, spontaneous genetic reversion can correct FA mutations, leading to somatic mosaicism. We analyzed the FA/BRCA pathway in 53 FA patients by FANCD2 immunoblots and chromosome breakage tests. Strikingly, FANCD2 monoubiquitination was detected in peripheral blood lymphocytes (PBLs) in 8 (15%) patients. FA reversion was further shown in these patients by comparison of primary fibro-blasts and PBLs. Reversion was associated with higher blood counts and clinical stability or improvement. Once constitutional FANCD2 patterns were determined, patients could be classified based on the level of FA/BRCA pathway disruption, as “FA core” (upstream inactivation; n = 47, 89%), FA-D2 (n = 4, 8%), and an unidentified downstream group (n = 2, 4%). FA-D2 and unidentified group patients were therefore relatively common, and they had more severe congenital phenotypes. These results show that specific analysis of the FA/BRCA pathway, combined with clinical and chromosome breakage data, allows a comprehensive characterization of FA patients.


2007 ◽  
Vol 18 (04) ◽  
pp. 859-871
Author(s):  
MARTIN ŠIMŮNEK ◽  
BOŘIVOJ MELICHAR

A border of a string is a prefix of the string that is simultaneously its suffix. It is one of the basic stringology keystones used as a part of many algorithms in pattern matching, molecular biology, computer-assisted music analysis and others. The paper offers the automata-theoretical description of Iliopoulos's ALL_BORDERS algorithm. The algorithm finds all borders of a string with don't care symbols. We show that ALL_BORDERS algorithm is an implementation of a finite state transducer of specific form. We describe how such a transducer can be constructed and what should be the input string like. The described transducer finds a set of lengths of all borders. Last but not least, we define approximate borders and show how to find all approximate borders of a string when we concern Hamming distance definition. Our solution of this problem is based on transducers again. This allows us to use analogy with automata-based pattern matching methods. Finally we discuss conditions under which the same principle can be used for other distance measures.


2018 ◽  
Vol 2 (1) ◽  
pp. 75-85
Author(s):  
Rouly Doharma Sihite ◽  
Aditya Wikan Mahastama

Transliteration is still a challenge in helping people to read or write from one to another writing systems. Korean transliteration has been a topic of research to automate the conversion between Hangul (Korean writing system) and Latin characters. Previous works have been done in transliterating Hangul to Latin, using statistical approach (72.2% accuracy) and Extended Markov Models (54.9% accuracy). This research focus on transliterating Latin (romanised) Korean words into Hangul, as many learners of Korean began using Latin first. Selected method is modeling the probable vowel and consonant forms and problable vowel and consonant sequences using Finite State Automata to avoid training. These models are then coded into rules which applied and tested to 100 random Korean words. Initial test results only 40% success rate in transliterating due to the nature that consonants have to be labeled as initial or final of a syllable, and some consonants missed the modeled rules. Additional rules are then added to catch-up and merge these consonants into existing proper syllables, which increased the success rate to 92%. This result is analysed further and it is found that certain consonants sequence caused syllabification problem if exist in a certain position. Other additional rules was inserted and yields 99% final success rate which also is the accuracy of transliterating Korean words written in Latin into Hangul characters in compund syllables.


Sign in / Sign up

Export Citation Format

Share Document