The Japanese lexical transducer based on stem-suffix style forms

A Lexical Transducer (LT), as defined by Karttunen, Kaplan, and Zaenen (1992), is a specialized finite-state transducer (FST) that relates citation forms of words and their morphological categories to inflected surface forms. Using LTs is advantageous because the same structure and algorithms can be used for both morphological analysis (stemming) and generation. Morphological processing (analysis and generation) is also computationally faster, and the data can be stored more compactly, than with other methods. The standard way to construct an LT consists of three steps: (1) constructing a simple finite-state source lexicon LA that defines all valid canonical citation forms of the language; (2) describing morphological alternations by means of two-level rules, compiling the rules to FSTs, and intersecting them to form a single rule transducer RT; and (3) composing LA and RT.
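
The three construction steps can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: the transducers are replaced by finite sets of string pairs, the rule component is an ordinary function rather than intersected two-level rules, and the English entries, the `^` boundary marker, and all names (`LEXICON`, `apply_rules`, `compose`, `generate`, `analyze`) are hypothetical and not taken from the paper.

```python
import re

# Step 1 (assumed toy data): source lexicon relating lexical forms
# (citation form + tags) to intermediate forms; "^" marks a morpheme boundary.
LEXICON = {
    ("fox+N+Sg", "fox"),
    ("fox+N+Pl", "fox^s"),
    ("cat+N+Sg", "cat"),
    ("cat+N+Pl", "cat^s"),
}


def apply_rules(intermediate: str) -> str:
    """Stand-in for the rule transducer RT.

    Inserts "e" between a stem ending in s, x, or z and the suffix "s",
    then deletes the boundary marker.
    """
    with_e = re.sub(r"(?<=[sxz])\^(?=s)", "^e", intermediate)
    return with_e.replace("^", "")


def compose(upper_rel, lower_rel):
    """Relational composition: pair a with c whenever (a, b) and (b, c) exist."""
    return {(a, c) for (a, b) in upper_rel for (b2, c) in lower_rel if b == b2}


# Step 2: the rule relation, restricted here to the forms the lexicon produces.
RULES = {(mid, apply_rules(mid)) for (_, mid) in LEXICON}

# Step 3: compose lexicon and rules into one lexical-to-surface relation.
LT = compose(LEXICON, RULES)


def generate(lexical: str):
    """Generation: lexical form to surface forms."""
    return sorted(s for (l, s) in LT if l == lexical)


def analyze(surface: str):
    """Analysis: surface form to lexical forms, using the same relation."""
    return sorted(l for (l, s) in LT if s == surface)


print(generate("fox+N+Pl"))  # ['foxes']
print(analyze("cats"))       # ['cat+N+Pl']
```

Because `generate` and `analyze` read the same relation `LT` in opposite directions, the sketch shows why a single structure can serve both analysis and generation.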

Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer

10.33011/computel.v1i.4277 ◽

2019 ◽

Author(s):

Lane Schwartz ◽

Emily Chen ◽

Benjamin Hunt ◽

Sylvia LR Schreiner

Keyword(s):

Morphological Analysis ◽

Training Data ◽

Language Family ◽

Enabling Technology ◽

Finite State ◽

Finite State Transducer ◽

Improve Analysis ◽

Testing Set

Morphological analysis is a critical enabling technology for polysynthetic languages. We present a neural morphological analyzer for case-inflected nouns in St. Lawrence Island Yupik, an endangered polysynthetic language in the Inuit-Yupik language family, treating morphological analysis as a recurrent neural sequence-to-sequence task. By utilizing an existing finite-state morphological analyzer to create training data, we improve analysis coverage on attested Yupik word types from approximately 75% for the existing finite-state analyzer to 100% for the neural analyzer. At the same time, we achieve a substantially higher level of accuracy on a held-out testing set, from 78.9% accuracy for the finite-state analyzer to 92.2% accuracy for our neural analyzer.

Amazigh-Sys: Intelligent system for recognition of amazigh words

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp482-489 ◽

2021 ◽

Vol 10 (2) ◽

pp. 482

Author(s):

Rachid Ammari ◽

Lahbib Zenkouar

Keyword(s):

Morphological Analysis ◽

System Analysis ◽

Intelligent System ◽

Recognition Rate ◽

Automatic Processing ◽

Finite State ◽

Finite State Transducer ◽

Amazigh Language ◽

Analysis System ◽

The Relationship

Amazigh-sys is an intelligent morphological analysis system for the Amazigh language based on Xerox's finite-state transducer (XFST). Our system can process five lexical units simultaneously. This paper begins with the development of an Amazigh lexicon (AMAlex) for attested nouns, verbs, pronouns, prepositions, and adverbs, together with the characteristics relating to each lemma. A set of rules is added to define the inflectional behavior and morphosyntactic links of each entry as well as the relationship between the different lexical units. The use of finite-state technology ensures the bidirectionality of our system (analysis and generation). Amazigh-sys is the first general morphological analysis system for Amazigh based on Xerox finite-state technology that can process and recognize all lexical units while ensuring a high recognition rate of input words. This contribution facilitates the implementation of other applications related to the automatic processing of the Amazigh language.

BORDERS AND FINITE AUTOMATA

International Journal of Foundations of Computer Science ◽

10.1142/s0129054107005029 ◽

2007 ◽

Vol 18 (04) ◽

pp. 859-871

Author(s):

MARTIN ŠIMŮNEK ◽

BOŘIVOJ MELICHAR

Keyword(s):

Pattern Matching ◽

Hamming Distance ◽

Finite Automata ◽

Music Analysis ◽

Theoretical Description ◽

Specific Form ◽

Distance Measures ◽

Computer Assisted ◽

Finite State ◽

Finite State Transducer

A border of a string is a prefix of the string that is simultaneously its suffix. It is one of the basic keystones of stringology, used as a part of many algorithms in pattern matching, molecular biology, computer-assisted music analysis, and other areas. The paper offers an automata-theoretical description of Iliopoulos's ALL_BORDERS algorithm, which finds all borders of a string with don't care symbols. We show that the ALL_BORDERS algorithm is an implementation of a finite state transducer of a specific form. We describe how such a transducer can be constructed and what the input string should look like. The described transducer finds the set of lengths of all borders. Last but not least, we define approximate borders and show how to find all approximate borders of a string under the Hamming distance. Our solution to this problem is again based on transducers, which allows us to use an analogy with automata-based pattern matching methods. Finally, we discuss conditions under which the same principle can be used for other distance measures.
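
The border definition can be made concrete with a short Python sketch. It is a brute-force check that follows the definition directly; it is not the ALL_BORDERS algorithm or the transducer construction from the paper, and it does not handle don't care symbols. Two points are assumptions made for this illustration: only proper, non-empty borders are reported, and an approximate border is taken to be a prefix whose Hamming distance to the suffix of the same length is at most `k`. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def border_lengths(s: str, k: int = 0) -> list:
    """Lengths n with 0 < n < len(s) such that the prefix and the suffix
    of length n differ in at most k positions.

    With k = 0 these are the lengths of the exact (proper, non-empty) borders.
    """
    return [n for n in range(1, len(s)) if hamming(s[:n], s[-n:]) <= k]


print(border_lengths("abacaba"))       # [1, 3]  -> borders "a" and "aba"
print(border_lengths("abacaba", k=2))  # [1, 2, 3, 5]
```

The sketch takes quadratic time in the length of the string, which is acceptable for illustration; it returns the set of border lengths, the same kind of output the abstract attributes to the transducer.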

Hidden Semi-Markov Model Based Speech Recognition System using Weighted Finite-State Transducer

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings ◽

10.1109/icassp.2006.1659950 ◽

2006 ◽

Cited By ~ 5

Author(s):

K. Oura ◽

Heiga Zen ◽

Y. Nankaku ◽

Akinobu Lee ◽

K. Tokuda

Keyword(s):

Speech Recognition ◽

Markov Model ◽

Recognition System ◽

Speech Recognition System ◽

Model Based ◽

Finite State ◽

Finite State Transducer

Bootstrapping a Neural Morphological Generator from Morphological Analyzer Output for Inuktitut

10.33011/computel.v2i.455 ◽

2019 ◽

Vol 2 (1) ◽

Author(s):

Jeffrey Micher

Keyword(s):

Neural Network ◽

Training Data ◽

Data Set ◽

Set Size ◽

The Neural Network ◽

Surface Character ◽

Finite State ◽

Character Sequences ◽

Finite State Transducer

We present a method for building a morphological generator from the output of an existing analyzer for Inuktitut, in the absence of a two-way finite-state transducer that would normally provide this functionality. We make use of a sequence-to-sequence neural network which “translates” underlying Inuktitut morpheme sequences into surface character sequences. The neural network uses only the previous and the following morphemes as context. We report a morpheme accuracy of approximately 86%. We are able to increase this accuracy slightly by passing deep morphemes directly to output for unknown morphemes. We do not see significant improvement when increasing training data set size, and postulate possible causes for this.
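
The context window and the fallback for unknown morphemes can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the neural network is replaced by an arbitrary callable `model`, the boundary markers `<s>` and `</s>`, the set `known`, and the placeholder morphemes `m1`, `m2`, `m3` are all hypothetical, and no Inuktitut data is used.

```python
from typing import Callable, List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # assumed boundary markers, not from the paper


def context_windows(morphemes: List[str]) -> List[Tuple[str, str, str]]:
    """One (previous, current, next) triple per deep morpheme."""
    padded = [BOS, *morphemes, EOS]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def generate_surface(
    morphemes: List[str],
    model: Callable[[str, str, str], str],
    known: Set[str],
) -> str:
    """Map each deep morpheme to surface characters using only its neighbours.

    Morphemes outside `known` are copied to the output unchanged, mirroring
    the pass-through of deep morphemes described in the abstract.
    """
    pieces = []
    for prev, cur, nxt in context_windows(morphemes):
        if cur in known:
            pieces.append(model(prev, cur, nxt))
        else:
            pieces.append(cur)  # fallback: pass the deep morpheme through
    return "".join(pieces)


# Stub standing in for the trained network: it just upper-cases the morpheme.
stub_model = lambda prev, cur, nxt: cur.upper()

print(context_windows(["m1", "m2"]))
# [('<s>', 'm1', 'm2'), ('m1', 'm2', '</s>')]
print(generate_surface(["m1", "m2", "m3"], stub_model, known={"m1", "m3"}))
# M1m2M3
```

Keeping the model behind a plain callable separates the windowing and fallback logic from whatever network is used, so the two can be tested independently.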
