Finite State Transducer: Recently Published Documents

TOTAL DOCUMENTS: 86 (five years: 12)
H-INDEX: 12 (five years: 0)

Author(s): Lukas Fleischer, Jeffrey Shallit

For a formal language L, the problem of language enumeration asks us to compute the length-lexicographically smallest word in L larger than a given input w (henceforth called the L-successor of w). We investigate this problem for regular languages from a computational complexity and a state complexity perspective. We first show that if L is recognized by a DFA with n states, then [Formula: see text] states are (in general) necessary and sufficient for an unambiguous finite-state transducer to compute L-successors. As a byproduct, we obtain that if L is recognized by a DFA with n states, then [Formula: see text] states are sufficient for a DFA to recognize the subset of L composed of its lexicographically smallest words. We give a matching lower bound that holds even if L is represented as an NFA. It was already known that L-successors can be computed in polynomial time, even if the regular language is given as part of the input (assuming a suitable representation of the language, such as a DFA). In this paper, we refine this result in several directions. We show that if the regular language is given as part of the input and encoded as a DFA, the problem lies in [Formula: see text]. If the regular language L is fixed, we prove that the enumeration problem for L reduces to deciding membership in the Myhill-Nerode equivalence classes of L under [Formula: see text]-uniform [Formula: see text] reductions. In particular, this implies that fixed star-free languages can be enumerated in [Formula: see text], that arbitrary fixed regular languages can be enumerated in [Formula: see text], and that there exist regular languages for which the problem is [Formula: see text]-complete.
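The enumeration task itself (given a DFA and a word w, find the length-lexicographically smallest word of the language strictly greater than w) can be sketched directly on the transition table: either bump a symbol of w at the latest viable position and complete greedily, or take the smallest word of the next inhabited length. The sketch below is our own minimal illustration of that case split, not the paper's unambiguous-transducer construction; the example DFA (even number of b's) is ours as well.

```python
# Toy example DFA (ours, not from the paper): words over {a, b}
# containing an even number of b's. Alphabet order matters, since the
# successor is the length-lexicographically smallest larger word.
EVEN_B = {
    "states": [0, 1],
    "alphabet": ["a", "b"],  # already sorted
    "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
    "start": 0,
    "accept": {0},
}

def successor(dfa, w):
    """Length-lex smallest word of L(dfa) strictly greater than w, or None."""
    states, alphabet = dfa["states"], dfa["alphabet"]
    delta, accept = dfa["delta"], dfa["accept"]
    n, max_len = len(w), len(w) + len(states)

    # viable[q, k]: some accepting state is reachable from q in exactly k steps.
    viable = {(q, 0): q in accept for q in states}
    for k in range(1, max_len + 1):
        for q in states:
            viable[q, k] = any(viable[delta[q, a], k - 1] for a in alphabet)

    def smallest_from(q, k):
        # Greedy: always take the least symbol that still reaches acceptance.
        out = []
        while len(out) < k:
            for a in alphabet:
                if viable[delta[q, a], k - len(out) - 1]:
                    out.append(a)
                    q = delta[q, a]
                    break
        return "".join(out)

    # States visited while reading w.
    path = [dfa["start"]]
    for a in w:
        path.append(delta[path[-1], a])

    # Case 1: a word of the same length. Bump the latest possible position
    # to the least larger symbol, then complete as cheaply as possible.
    for i in range(n - 1, -1, -1):
        for c in alphabet:
            if c > w[i] and viable[delta[path[i], c], n - i - 1]:
                return w[:i] + c + smallest_from(delta[path[i], c], n - i - 1)

    # Case 2: a strictly longer word. If any exists, one exists within
    # |w| + |Q| letters (remove cycles from a longer accepting path).
    for m in range(n + 1, max_len + 1):
        if viable[dfa["start"], m]:
            return smallest_from(dfa["start"], m)
    return None

print(successor(EVEN_B, "ab"))  # bb
print(successor(EVEN_B, "bb"))  # aaa
```

The table `viable` is also what makes the greedy completion safe: a symbol is only chosen if an accepting state remains reachable in the exact number of remaining steps.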


Author(s): Rachid Ammari, Lahbib Zenkouar

Amazigh-sys is an intelligent morphological analysis system for the Amazigh language based on Xerox's finite-state transducer (XFST). Our system can process five lexical units simultaneously. This paper begins with the development of an Amazigh lexicon (AMAlex) of attested nouns, verbs, pronouns, prepositions, and adverbs, together with the characteristics of each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry, as well as the relationships between the different lexical units. The use of finite-state technology makes our system bidirectional (analysis and generation). Amazigh-sys is the first general morphological analysis system for Amazigh based on Xerox finite-state technology that can process and recognize all lexical units, and it achieves a high recognition rate on input words. This contribution facilitates the implementation of other applications for the automatic processing of the Amazigh language.


Author(s): Kengatharaiyer Sarveswaran, Gihan Dias, Miriam Butt

Abstract: This paper presents an open-source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP processing tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language's inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest to a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. To foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend.
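The bidirectionality that ThamizhiMorph obtains from Foma (one transducer serving as both analyser and generator) can be illustrated with a deliberately tiny hand-built transducer. The fragment below is our own toy (an English-style plural, not Tamil, and not the paper's grammar): running the arcs forward generates a surface form, and inverting them turns the same machine into an analyser.

```python
EPS = ""  # epsilon: consumes/emits nothing

# Arcs: (source state, input symbol, output symbol, target state).
# Toy lexicon path (ours, not from the paper):
#   cat+N+Sg <-> cat      cat+N+Pl <-> cats
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", EPS, 4),
    (4, "+Sg", EPS, 5),
    (4, "+Pl", "s", 5),
]
START, ACCEPT = 0, {5}

def apply_fst(arcs, word):
    """All outputs of the transducer on `word` (a list of symbols).
    Depth-first search; assumes no epsilon-input cycles."""
    results, stack = set(), [(START, 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(word) and state in ACCEPT:
            results.add(out)
        for src, a, b, dst in arcs:
            if src != state:
                continue
            if a == EPS:
                stack.append((dst, i, out + b))
            elif i < len(word) and a == word[i]:
                stack.append((dst, i + 1, out + b))
    return results

def invert(arcs):
    # Swapping each arc's two tapes turns the generator into the analyser.
    return [(s, b, a, d) for s, a, b, d in arcs]

print(apply_fst(ARCS, ["c", "a", "t", "+N", "+Pl"]))  # {'cats'}
print(apply_fst(invert(ARCS), list("cats")))          # {'cat+N+Pl'}
print(apply_fst(invert(ARCS), list("cat")))           # {'cat+N+Sg'}
```

Real systems like Foma compile a whole lexicon and its alternation rules into one such machine, so analysis and generation come from the same compiled object rather than two separate grammars.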


Author(s): Rachid Ammari, Lahbib Zenkouar

Our work presents an Amazigh pronominal morphological analyzer (APMorph) based on Xerox's finite-state transducer (XFST). Our system revolves around a large lexicon named "APlex", which includes the pronouns affixed to nouns and verbs and the characteristics of each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry, as well as the relationships between the different lexical units. The implementation and evaluation of our approach are detailed in this article. The use of XFST remains a relevant choice in that the platform allows both analysis and generation. The robustness of our system makes it suitable for integration into other natural language processing (NLP) applications, especially spell checking, machine translation, and machine learning. This paper continues our previous work on the automatic processing of Amazigh nouns and verbs.


2021, Vol 1 (2)
Author(s): Robert Pugh, Francis Tyers, Marivel Huerta Mendez

In this paper, we describe an in-progress, free and open-source Finite-State Transducer morphological analyzer for an understudied Nahuatl variant. We discuss our general approach, some of the technical implementation details, the challenges that accompany building such a system for a low-resource language variant, the current status and performance of the system, and directions for future work.


Author(s): Adriano Ingunza Torres, John Miller, Arturo Oncevay, Roberto Zariquiey Biondi

Author(s): Holger Bock Axelsen, Martin Kutrib, Andreas Malcher, Matthias Wendlandt

It is well known that reversible finite automata do not accept all regular languages, that reversible pushdown automata do not accept all deterministic context-free languages, and that reversible queue automata are less powerful than deterministic real-time queue automata. Closing these gaps is of significant interest from both a practical and a theoretical point of view. Here we extend these reversible models with a preprocessing unit that is essentially a reversible, injective, and length-preserving finite-state transducer. It turns out that preprocessing the input with such a weak device increases the computational power of reversible deterministic finite automata to the acceptance of all regular languages, whereas for reversible pushdown automata the accepted family of languages lies strictly between the reversible deterministic context-free languages and the real-time deterministic context-free languages. For reversible queue automata, preprocessing the input yields machines that are stronger than real-time reversible queue automata but less powerful than real-time deterministic (irreversible) queue automata. Moreover, we show that the computational power of all three types of machines is unchanged if the preprocessing finite-state transducer is allowed to work irreversibly. Finally, we examine the closure properties of the families of languages accepted by such machines.
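The reversibility notion the paper starts from has a simple concrete test on a given DFA: every input symbol must act injectively on the state set, so that each computation step can be undone. A minimal sketch, with two example DFAs of our own choosing (the test applies to a specific automaton, not to the language as a whole):

```python
def is_reversible(states, alphabet, delta):
    """True iff every symbol induces an injective map on the states,
    i.e. each state has at most one predecessor per symbol."""
    for a in alphabet:
        targets = [delta[q, a] for q in states]
        if len(set(targets)) != len(targets):
            return False
    return True

# Reversible DFA: parity of a's -- each symbol permutes the two states.
PARITY = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

# Irreversible DFA: "contains at least one a" -- reading 'a' merges both
# states into the accepting sink, so that step cannot be undone. Gaps of
# this kind are what the paper closes by adding a reversible, injective,
# length-preserving preprocessing transducer in front of the automaton.
AT_LEAST_ONE_A = {(0, "a"): 1, (1, "a"): 1, (0, "b"): 0, (1, "b"): 1}

print(is_reversible([0, 1], ["a", "b"], PARITY))          # True
print(is_reversible([0, 1], ["a", "b"], AT_LEAST_ONE_A))  # False
```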


2019, Vol 2 (1)
Author(s): Jeffrey Micher

We present a method for building a morphological generator from the output of an existing analyzer for Inuktitut, in the absence of a two-way finite-state transducer that would normally provide this functionality. We use a sequence-to-sequence neural network that "translates" underlying Inuktitut morpheme sequences into surface character sequences. The network uses only the previous and the following morphemes as context. We report a morpheme accuracy of approximately 86%. We are able to increase this accuracy slightly by passing deep morphemes directly to the output for unknown morphemes. We do not see significant improvement when increasing the training data set size, and we postulate possible causes for this.
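The restriction to only the previous and following morphemes as context amounts to a simple data-preparation step before training. A hypothetical sketch (the function name, padding tokens, and placeholder morphemes are ours, not the paper's):

```python
def context_windows(morphemes):
    """Pair each underlying morpheme with its left and right neighbours,
    the only context the sequence-to-sequence network is given."""
    padded = ["<s>"] + list(morphemes) + ["</s>"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]

# Each triple becomes one network input; the training target would be the
# surface character sequence of the middle morpheme.
print(context_windows(["m1", "m2", "m3"]))
# [('<s>', 'm1', 'm2'), ('m1', 'm2', 'm3'), ('m2', 'm3', '</s>')]
```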

