Finite State Transducers: Latest Research Papers

Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.

Composition of weighted finite transducers in MapReduce

Journal Of Big Data ◽

10.1186/s40537-020-00397-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Bilal Elghadyry ◽

Faissal Ouardi ◽

Sébastien Verel

Keyword(s):

Speech Processing ◽

Large Scale ◽

Large Scale Data ◽

Finite State Transducers ◽

Wide Range ◽

Finite State ◽

Common Operation ◽

Efficient Representation ◽

Weighted Finite State Transducers ◽

NP Hardness

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers is a fundamental operation common to these applications. The NP-hardness of the composition computation problem presents a challenge that motivates efficient large-scale algorithms when more than two transducers are composed. This paper describes a parallel computation of the composition of weighted finite transducers in the MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, intensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency for large-scale data.
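
For readers unfamiliar with the operation itself, the following is a minimal sequential sketch of pairwise composition, not the MapReduce methods proposed in the paper. It assumes epsilon-free transducers, weights in the tropical semiring (weights add along a path), and a hypothetical arc encoding `(src, in_label, out_label, weight, dst)`; the function name `compose` and the example machines are illustrative only.

```python
from collections import defaultdict, deque

def compose(arcs_a, start_a, finals_a, arcs_b, start_b, finals_b):
    """Compose two epsilon-free weighted transducers A and B.

    Arcs are (src, in_label, out_label, weight, dst) tuples (assumed encoding).
    finals_* map a final state to its final weight. Weights are tropical:
    they are added when an arc of A is matched with an arc of B.
    """
    # Index A by source state, and B by (source state, input label),
    # so that A's output label can be matched against B's input label.
    out_a = defaultdict(list)
    for src, i, o, w, dst in arcs_a:
        out_a[src].append((i, o, w, dst))
    out_b = defaultdict(list)
    for src, i, o, w, dst in arcs_b:
        out_b[(src, i)].append((o, w, dst))

    start = (start_a, start_b)
    arcs, finals = [], {}
    seen, queue = {start}, deque([start])
    # Breadth-first exploration of reachable state pairs only.
    while queue:
        qa, qb = queue.popleft()
        if qa in finals_a and qb in finals_b:
            finals[(qa, qb)] = finals_a[qa] + finals_b[qb]
        for i, mid, w1, na in out_a[qa]:
            for o, w2, nb in out_b[(qb, mid)]:
                nxt = (na, nb)
                arcs.append(((qa, qb), i, o, w1 + w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return arcs, start, finals

# Illustrative machines: A rewrites "a" to "b", B rewrites "b" to "c".
A = [(0, "a", "b", 1.0, 1)]
B = [(0, "b", "c", 2.5, 1)]
arcs, start, finals = compose(A, 0, {1: 0.0}, B, 0, {1: 0.5})
```

The inner loop joins arcs of A and B on the shared intermediate label, which is the step that any distributed variant has to partition across workers in some way.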

Quantum Logical Depth and Shallowness of Streaming Data by One-Way Quantum Finite-State Transducers (Preliminary Report)

10.1007/978-3-030-87993-8_12 ◽

2021 ◽

pp. 177-193

Author(s):

Tomoyuki Yamakami

Keyword(s):

Preliminary Report ◽

Streaming Data ◽

Finite State Transducers ◽

Finite State

On deterministic 1-limited 5′ → 3′ sensing Watson–Crick finite-state transducers

RAIRO - Theoretical Informatics and Applications ◽

10.1051/ita/2021007 ◽

2021 ◽

Vol 55 ◽

pp. 5

Author(s):

Benedek Nagy ◽

Zita Kovács

Keyword(s):

DNA Computing ◽

Finite Automata ◽

Theoretical Computer Science ◽

Theoretical Computer ◽

Double Stranded DNA ◽

Finite State Transducers ◽

Special Cases ◽

Finite State ◽

DNA Strands ◽

Processing Order

Finite automata and finite state transducers are among the foundations of (theoretical) computer science, with many applications. DNA computing and related bio-inspired paradigms, on the other hand, are relatively new fields of computing. Watson–Crick automata lie at the intersection of these fields. These finite automata have two reading heads, which read the upper and lower strands of the input DNA molecule, respectively. In 5′ → 3′ Watson–Crick automata the two reading heads move in the same biochemical direction, that is, from the 5′ end of the strand toward the 3′ end. However, in double-stranded DNA the two strands run in opposite directions, so 5′ → 3′ Watson–Crick automata read the input from the two extremes. Sensing 5′ → 3′ automata detect when the two heads are at the same position, and the computation finishes at that point. Based on this class of automata, we define WK transducers such that, at each transition, exactly one input letter is processed and exactly one output letter is written on a normal output tape. Some special cases are defined and analyzed, e.g., when only one of the reading heads is used and when the transducer has only one state. We also show that the minimal transducer is uniquely defined if the transducer is deterministic and has marked output, i.e., the output letter written in a step identifies the reading head that is used in that transition. We have also used the functions ‘processing order’ and ‘reading heads’ to analyze these transducers.
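
The two-head reading discipline described in this abstract can be illustrated with a small simulator. This is a sketch under simplifying assumptions that are not taken from the paper: the input is treated as one plain string read from both ends (the complementarity of the strands is ignored), each state fixes which head reads next (the hypothetical `head_of` table), and the names `run_wk_transducer`, `delta`, and `accepting` are illustrative.

```python
def run_wk_transducer(word, start, head_of, delta, accepting):
    """Simulate a deterministic two-head, one-letter-per-step transducer.

    head_of[state] is "L" or "R": which head reads in that state (assumption).
    delta[(state, letter)] gives (next_state, output_letter).
    Returns the output string, or None if the run blocks or ends
    in a non-accepting state.
    """
    left, right = 0, len(word)  # the unread part is word[left:right]
    state, out = start, []
    # The run stops as soon as the two heads meet (the "sensing" condition).
    while left < right:
        if head_of[state] == "L":
            letter = word[left]
            left += 1
        else:
            right -= 1
            letter = word[right]
        step = delta.get((state, letter))
        if step is None:
            return None  # no transition defined: the run blocks
        state, written = step
        out.append(written)  # exactly one output letter per input letter
    return "".join(out) if state in accepting else None

# Illustrative machine with marked output: lowercase letters are written
# by the left head, uppercase letters by the right head.
head_of = {"l": "L", "r": "R"}
delta = {
    ("l", "a"): ("r", "a"), ("l", "b"): ("r", "b"),
    ("r", "a"): ("l", "A"), ("r", "b"): ("l", "B"),
}
result = run_wk_transducer("abba", "l", head_of, delta, {"l", "r"})
# result is "aAbB": the heads alternate, starting from the left end.
```

Because the output is marked in this example, the case of each output letter reveals which head produced it, so the processing order of the input can be read off the output.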

Descriptional Complexity of Iterated Uniform Finite-State Transducers

Information and Computation ◽

10.1016/j.ic.2021.104691 ◽

2021 ◽

pp. 104691

Author(s):

Martin Kutrib ◽

Andreas Malcher ◽

Carlo Mereghetti ◽

Beatrice Palano

Keyword(s):

Descriptional Complexity ◽

Finite State Transducers ◽

Finite State

Iterated Uniform Finite-State Transducers on Unary Languages

SOFSEM 2021: Theory and Practice of Computer Science - Lecture Notes in Computer Science ◽

10.1007/978-3-030-67731-2_16 ◽

2021 ◽

pp. 218-232

Author(s):

Martin Kutrib ◽

Andreas Malcher ◽

Carlo Mereghetti ◽

Beatrice Palano

Keyword(s):

Finite State Transducers ◽

Finite State ◽

Unary Languages

Complexity and Categoricity of Injection Structures Induced by Finite State Transducers

Lecture Notes in Computer Science - Connecting with Computability ◽

10.1007/978-3-030-80049-9_10 ◽

2021 ◽

pp. 106-119

Author(s):

Richard Krogman ◽

Douglas Cenzer

Keyword(s):

Finite State Transducers ◽

Finite State

Digging input-driven pushdown automata

RAIRO - Theoretical Informatics and Applications ◽

10.1051/ita/2021006 ◽

2021 ◽

Vol 55 ◽

pp. 6

Author(s):

Martin Kutrib ◽

Andreas Malcher

Keyword(s):

Input Symbol ◽

Closure Properties ◽

Descriptional Complexity ◽

Finite State Transducers ◽

Finite State ◽

The Impact ◽

Pushdown Automata

Input-driven pushdown automata (IDPDA) are pushdown automata where the next action on the pushdown store (push, pop, nothing) is solely governed by the input symbol. Nowadays such devices are usually defined such that popping from the empty pushdown does not block the computation but continues it with an empty pushdown. Here, we consider IDPDAs that have a more balanced behavior concerning pushing and popping. Digging input-driven pushdown automata (DIDPDA) are basically IDPDAs that, when forced to pop from the empty pushdown, dig a hole of the shape of the popped symbol in the bottom of the pushdown. Popping further symbols from a pushdown that has a hole at the bottom deepens the current hole further. The hole can only be filled up by pushing symbols previously popped. We study the impact of the new behavior of DIDPDAs on their power and compare their capacities with the capacities of ordinary IDPDAs and tinput-driven pushdown automata, which are basically IDPDAs whose input may be preprocessed by length-preserving finite state transducers. It turns out that the capabilities are incomparable. We address the determinization of DIDPDAs and their descriptional complexity, closure properties, and decidability questions.

Composition of Weighted Finite Transducers in MapReduce

10.21203/rs.3.rs-101167/v1 ◽

2020 ◽

Author(s):

Bilal Elghadyry ◽

Faissal Ouardi ◽

Sébastien Verel

Keyword(s):

Speech Processing ◽

Large Scale ◽

Large Scale Data ◽

Finite State Transducers ◽

Wide Range ◽

Finite State ◽

Common Operation ◽

Efficient Representation ◽

Weighted Finite State Transducers ◽

NP Hardness

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications such as text and speech processing, computational biology, and machine learning. The composition of weighted finite-state transducers is a fundamental operation common to these applications. The NP-hardness of the composition computation problem presents a challenge that motivates efficient large-scale algorithms when more than two transducers are composed. This paper describes a parallel computation of the composition of weighted finite transducers in the MapReduce framework. To the best of our knowledge, this paper is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based respectively on input alphabet mapping, state mapping, and hybrid mapping. Finally, intensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency for large-scale data.
