Finite State Transducer: Recently Published Documents

TOTAL DOCUMENTS: 86 (five years: 12)
H-INDEX: 12 (five years: 0)

Author(s): Lukas Fleischer, Jeffrey Shallit

For a formal language L, the problem of language enumeration asks us to compute the length-lexicographically smallest word in L larger than a given input w (henceforth called the L-successor of w). We investigate this problem for regular languages from a computational complexity and a state complexity perspective. We first show that if L is recognized by a DFA with n states, then [Formula: see text] states are (in general) necessary and sufficient for an unambiguous finite-state transducer to compute L-successors. As a byproduct, we obtain that if L is recognized by a DFA with n states, then [Formula: see text] states are sufficient for a DFA to recognize the subset of L composed of its lexicographically smallest words. We give a matching lower bound that holds even if L is represented as an NFA. It was already known that L-successors can be computed in polynomial time, even if the regular language is given as part of the input (assuming a suitable representation of the language, such as a DFA). In this paper, we refine this result in several directions. We show that if the regular language is given as part of the input and encoded as a DFA, the problem lies in [Formula: see text]. If the regular language L is fixed, we prove that the enumeration problem for L reduces to deciding membership in the Myhill-Nerode equivalence classes of L under [Formula: see text]-uniform [Formula: see text] reductions. In particular, this implies that fixed star-free languages can be enumerated in [Formula: see text], that arbitrary fixed regular languages can be enumerated in [Formula: see text], and that there exist regular languages for which the problem is [Formula: see text]-complete.
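The enumeration task itself (given a DFA and a word w, find the length-lexicographically smallest word of the language strictly greater than w) can be sketched directly on the transition table: either bump a symbol of w at the latest viable position and complete greedily, or take the smallest word of the next inhabited length. The sketch below is our own minimal illustration of that case split, not the paper's unambiguous-transducer construction; the example DFA (even number of b's) is ours as well.

```python
# Toy example DFA (ours, not from the paper): words over {a, b}
# containing an even number of b's. Alphabet order matters, since the
# successor is the length-lexicographically smallest larger word.
EVEN_B = {
    "states": [0, 1],
    "alphabet": ["a", "b"],  # already sorted
    "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
    "start": 0,
    "accept": {0},
}

def successor(dfa, w):
    """Length-lex smallest word of L(dfa) strictly greater than w, or None."""
    states, alphabet = dfa["states"], dfa["alphabet"]
    delta, accept = dfa["delta"], dfa["accept"]
    n, max_len = len(w), len(w) + len(states)

    # viable[q, k]: some accepting state is reachable from q in exactly k steps.
    viable = {(q, 0): q in accept for q in states}
    for k in range(1, max_len + 1):
        for q in states:
            viable[q, k] = any(viable[delta[q, a], k - 1] for a in alphabet)

    def smallest_from(q, k):
        # Greedy: always take the least symbol that still reaches acceptance.
        out = []
        while len(out) < k:
            for a in alphabet:
                if viable[delta[q, a], k - len(out) - 1]:
                    out.append(a)
                    q = delta[q, a]
                    break
        return "".join(out)

    # States visited while reading w.
    path = [dfa["start"]]
    for a in w:
        path.append(delta[path[-1], a])

    # Case 1: a word of the same length. Bump the latest possible position
    # to the least larger symbol, then complete as cheaply as possible.
    for i in range(n - 1, -1, -1):
        for c in alphabet:
            if c > w[i] and viable[delta[path[i], c], n - i - 1]:
                return w[:i] + c + smallest_from(delta[path[i], c], n - i - 1)

    # Case 2: a strictly longer word. If any exists, one exists within
    # |w| + |Q| letters (remove cycles from a longer accepting path).
    for m in range(n + 1, max_len + 1):
        if viable[dfa["start"], m]:
            return smallest_from(dfa["start"], m)
    return None

print(successor(EVEN_B, "ab"))  # bb
print(successor(EVEN_B, "bb"))  # aaa
```

The table `viable` is also what makes the greedy completion safe: a symbol is only chosen if an accepting state remains reachable in the exact number of remaining steps.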


Author(s): Rachid Ammari, Lahbib Zenkouar

Amazigh-sys is an intelligent morphological analysis system for the Amazigh language based on Xerox's finite-state transducer (XFST). Our system can process five lexical units simultaneously. This paper begins with the development of an Amazigh lexicon (AMAlex) of attested nouns, verbs, pronouns, prepositions, and adverbs, together with the characteristics of each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry, as well as the relationships between the different lexical units. The use of finite-state technology makes our system bidirectional (analysis and generation). Amazigh-sys is the first general morphological analysis system for Amazigh based on Xerox finite-state technology that can process and recognize all lexical units, and it achieves a high recognition rate on input words. This contribution facilitates the implementation of other applications for the automatic processing of the Amazigh language.


Author(s): Kengatharaiyer Sarveswaran, Gihan Dias, Miriam Butt

Abstract: This paper presents an open-source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP processing tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language's inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest to a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. To foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend.
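The bidirectionality that ThamizhiMorph obtains from Foma (one transducer serving as both analyser and generator) can be illustrated with a deliberately tiny hand-built transducer. The fragment below is our own toy (an English-style plural, not Tamil, and not the paper's grammar): running the arcs forward generates a surface form, and inverting them turns the same machine into an analyser.

```python
EPS = ""  # epsilon: consumes/emits nothing

# Arcs: (source state, input symbol, output symbol, target state).
# Toy lexicon path (ours, not from the paper):
#   cat+N+Sg <-> cat      cat+N+Pl <-> cats
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", EPS, 4),
    (4, "+Sg", EPS, 5),
    (4, "+Pl", "s", 5),
]
START, ACCEPT = 0, {5}

def apply_fst(arcs, word):
    """All outputs of the transducer on `word` (a list of symbols).
    Depth-first search; assumes no epsilon-input cycles."""
    results, stack = set(), [(START, 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(word) and state in ACCEPT:
            results.add(out)
        for src, a, b, dst in arcs:
            if src != state:
                continue
            if a == EPS:
                stack.append((dst, i, out + b))
            elif i < len(word) and a == word[i]:
                stack.append((dst, i + 1, out + b))
    return results

def invert(arcs):
    # Swapping each arc's two tapes turns the generator into the analyser.
    return [(s, b, a, d) for s, a, b, d in arcs]

print(apply_fst(ARCS, ["c", "a", "t", "+N", "+Pl"]))  # {'cats'}
print(apply_fst(invert(ARCS), list("cats")))          # {'cat+N+Pl'}
print(apply_fst(invert(ARCS), list("cat")))           # {'cat+N+Sg'}
```

Real systems like Foma compile a whole lexicon and its alternation rules into one such machine, so analysis and generation come from the same compiled object rather than two separate grammars.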


Author(s): Rachid Ammari, Lahbib Zenkouar

Our work presents an Amazigh pronominal morphological analyzer (APMorph) based on Xerox's finite-state transducer (XFST). Our system revolves around a large lexicon named "APlex", which includes the pronouns affixed to nouns and verbs and the characteristics of each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry, as well as the relationships between the different lexical units. The implementation and evaluation of our approach are detailed in this article. The use of XFST remains a relevant choice in that the platform allows both analysis and generation. The robustness of our system makes it suitable for integration into other natural language processing (NLP) applications, especially spell checking, machine translation, and machine learning. This paper continues our previous work on the automatic processing of Amazigh nouns and verbs.


2021, Vol 1 (2)
Author(s): Robert Pugh, Francis Tyers, Marivel Huerta Mendez

In this paper, we describe an in-progress, free and open-source Finite-State Transducer morphological analyzer for an understudied Nahuatl variant. We discuss our general approach, some of the technical implementation details, the challenges that accompany building such a system for a low-resource language variant, the current status and performance of the system, and directions for future work.


Author(s): Adriano Ingunza Torres, John Miller, Arturo Oncevay, Roberto Zariquiey Biondi

Author(s): Holger Bock Axelsen, Martin Kutrib, Andreas Malcher, Matthias Wendlandt

It is well known that reversible finite automata do not accept all regular languages, that reversible pushdown automata do not accept all deterministic context-free languages, and that reversible queue automata are less powerful than deterministic real-time queue automata. Closing these gaps is of significant interest from both a practical and a theoretical point of view. Here we extend these reversible models with a preprocessing unit that is essentially a reversible, injective, and length-preserving finite-state transducer. It turns out that preprocessing the input with such a weak device increases the computational power of reversible deterministic finite automata to the acceptance of all regular languages, whereas for reversible pushdown automata the accepted family of languages lies strictly between the reversible deterministic context-free languages and the real-time deterministic context-free languages. For reversible queue automata, preprocessing the input yields machines that are stronger than real-time reversible queue automata but less powerful than real-time deterministic (irreversible) queue automata. Moreover, we show that the computational power of all three types of machines is unchanged if the preprocessing finite-state transducer is allowed to work irreversibly. Finally, we examine the closure properties of the families of languages accepted by such machines.
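The reversibility notion the paper starts from has a simple concrete test on a given DFA: every input symbol must act injectively on the state set, so that each computation step can be undone. A minimal sketch, with two example DFAs of our own choosing (the test applies to a specific automaton, not to the language as a whole):

```python
def is_reversible(states, alphabet, delta):
    """True iff every symbol induces an injective map on the states,
    i.e. each state has at most one predecessor per symbol."""
    for a in alphabet:
        targets = [delta[q, a] for q in states]
        if len(set(targets)) != len(targets):
            return False
    return True

# Reversible DFA: parity of a's -- each symbol permutes the two states.
PARITY = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

# Irreversible DFA: "contains at least one a" -- reading 'a' merges both
# states into the accepting sink, so that step cannot be undone. Gaps of
# this kind are what the paper closes by adding a reversible, injective,
# length-preserving preprocessing transducer in front of the automaton.
AT_LEAST_ONE_A = {(0, "a"): 1, (1, "a"): 1, (0, "b"): 0, (1, "b"): 1}

print(is_reversible([0, 1], ["a", "b"], PARITY))          # True
print(is_reversible([0, 1], ["a", "b"], AT_LEAST_ONE_A))  # False
```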


2019, Vol 2 (1)
Author(s): Jeffrey Micher

We present a method for building a morphological generator from the output of an existing analyzer for Inuktitut, in the absence of a two-way finite-state transducer that would normally provide this functionality. We use a sequence-to-sequence neural network that "translates" underlying Inuktitut morpheme sequences into surface character sequences. The network uses only the previous and the following morphemes as context. We report a morpheme accuracy of approximately 86%. We are able to increase this accuracy slightly by passing deep morphemes directly to the output for unknown morphemes. We do not see significant improvement when increasing the training data set size, and we postulate possible causes for this.
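The restriction to only the previous and following morphemes as context amounts to a simple data-preparation step before training. A hypothetical sketch (the function name, padding tokens, and placeholder morphemes are ours, not the paper's):

```python
def context_windows(morphemes):
    """Pair each underlying morpheme with its left and right neighbours,
    the only context the sequence-to-sequence network is given."""
    padded = ["<s>"] + list(morphemes) + ["</s>"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

# Each triple becomes one network input; the training target would be the
# surface character sequence of the middle morpheme.
print(context_windows(["m1", "m2", "m3"]))
# [('<s>', 'm1', 'm2'), ('m1', 'm2', 'm3'), ('m2', 'm3', '</s>')]
```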

