Morphological Analysis and Synthesis of Manipuri Verbs Using Xerox Finite-State Tools

Author(s):  
Ksh. Krishna B. Singha
Author(s):  
Hong-Sen Yan ◽  
Chin-Hsing Kuo

A mechanism whose topological structure changes during operation is called a mechanism with variable topologies (MVT). This paper addresses the representation and identification of the structural and motion-state characteristics of MVTs. To represent the topological structures of MVTs, a set of methods including graph and matrix representations is proposed. To represent their motion-state characteristics, the idea of finite-state machines is applied through state tables and state graphs. Two new concepts, topological homomorphism and motion homomorphism, are proposed for identifying the structural and motion-state characteristics of MVTs. The results provide a logical foundation for the topological analysis and synthesis of mechanisms with variable topologies.


Author(s):  
Lauri Karttunen

The article introduces the basic concepts of finite-state language processing: regular languages and relations, finite-state automata, and regular expressions. Many basic steps in language processing, ranging from tokenization to phonological and morphological analysis, disambiguation, spelling correction, and shallow parsing, can be performed efficiently by means of finite-state transducers. The article discusses examples of finite-state languages and relations. Finite-state networks can represent only a subset of all possible languages and relations; that is, only some languages are finite-state languages. The article also introduces two types of complex regular expressions that have many linguistic applications: restriction and replacement. Finally, it discusses the properties of finite-state automata. Three important properties of networks are that they are epsilon-free, deterministic, and minimal. If a network encodes a regular language and is epsilon-free, deterministic, and minimal, it is guaranteed to be the best encoding for that language.
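To make the idea of analysis by transducer concrete, here is a minimal sketch of a deterministic, epsilon-free transducer that maps a surface form to a lexical analysis. The two-word lexicon, the tag names (`+N`, `+Sg`, `+Pl`), and the function name `analyze` are illustrative assumptions and are not taken from the article.

```python
# Toy deterministic transducer: (state, input symbol) -> (output string, next state).
# Every arc consumes exactly one input symbol, so the input side is epsilon-free.
# The lexicon ("cat", "cats") and the tags are hypothetical examples.
TRANSITIONS = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "s"): ("+N+Pl", 4),
}

# Output emitted when the input ends in a final state.
FINAL_OUTPUT = {3: "+N+Sg", 4: ""}


def analyze(word):
    """Return the analysis of `word`, or None if the network rejects it."""
    state, out = 0, []
    for ch in word:
        step = TRANSITIONS.get((state, ch))
        if step is None:
            return None  # no arc for this symbol: the word is not in the language
        emitted, state = step
        out.append(emitted)
    if state not in FINAL_OUTPUT:
        return None  # input ended in a non-final state
    return "".join(out) + FINAL_OUTPUT[state]


if __name__ == "__main__":
    for w in ("cat", "cats", "ca"):
        print(w, "->", analyze(w))
```

Because each (state, symbol) pair has at most one arc, a lookup takes time linear in the length of the word, which is one reason determinism is a desirable property.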


Author(s):  
Safiriyu Ijiyemi Eludiora ◽  
O R Ayemonisan

Nigeria's official languages are English, Yorùbá, Igbo and Hausa. The study reported in this paper develops a learning tool that helps learners acquire the Yorùbá language through its alphabet. The study matters for Yorùbá because the language is endangered, and learning tools are needed to help prevent its extinction. A Yorùbá word perfect system was developed to assist people in learning the language. English and Yorùbá word formation was explored using a computational morphological approach. The theoretical framework uses finite-state automata (FSA) to realise the different ways of combining consonants and vowels to form words; words of two to five letters were considered. The system was designed and implemented using UML tools and the Python programming language. It teaches users how words are formed and how many syllables each word has. Users need not know how to tone-mark a word before using the system: any word typed is analysed according to its number of syllables. The approach produces representatives of all parts of speech (POS) of the two languages, and it produces corpora for both languages.
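The consonant-vowel automaton described above might be sketched as follows. This is not the authors' system: it assumes a simplified syllable shape of an optional consonant followed by a vowel, treats `gb` as a single consonant, strips tone marks before analysis, and does not handle syllabic nasals or nasal vowels. The names `strip_tones` and `count_syllables` are hypothetical.

```python
import unicodedata

# Vowels after tone stripping; \u1eb9 and \u1ecd are e and o with dot below.
VOWELS = set("aeiou") | {"\u1eb9", "\u1ecd"}
# Combining grave, acute, and macron, used as tone marks (an assumption).
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def strip_tones(word):
    """Remove tone marks but keep the dot-below letters intact."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def count_syllables(word):
    """Two-state automaton over (C)V syllables.

    Returns the number of syllables if the word is accepted, else None.
    """
    state, syllables, prev = "START", 0, ""
    for ch in strip_tones(word):
        if ch in VOWELS:
            syllables += 1          # a vowel completes a syllable
            state = "START"
        elif state == "START" and ch.isalpha():
            state = "ONSET"         # a consonant opens a syllable
        elif state == "ONSET" and prev == "g" and ch == "b":
            pass                    # second half of the digraph "gb"
        else:
            return None             # consonant cluster or non-letter: reject
        prev = ch
    return syllables if state == "START" and syllables > 0 else None


if __name__ == "__main__":
    for w in ("il\u00e9", "agba", "st"):
        print(w, "->", count_syllables(w))
```

Stripping tone marks first mirrors the abstract's point that users can type a word without tone-marking it.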


2019 ◽  
Author(s):  
Francis M. Tyers ◽  
Jonathan N. Washington ◽  
Darya Kavitskaya ◽  
Memduh Gökırmak

This paper describes a weighted finite-state morphological transducer for Crimean Tatar able to analyse and generate in both Latin and Cyrillic orthographies. This transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise. Dealing with two orthographic systems in the same transducer is challenging as they employ different strategies to deal with the spelling of loan words and encode the full range of the language's phonemes and their interaction. We develop the core transducer using the Latin orthography and then design a separate transliteration transducer to map the surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We perform an evaluation of all components of the system, finding an accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This constitutes the state of the art for Crimean Tatar morphological modelling and, to our knowledge, is the first biscriptual single morphological transducer for any language.
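The role of weights in taming a non-deterministic orthographic mapping can be illustrated with a small sketch. The mapping table below uses placeholder symbols and invented weights; it is not the paper's transliteration grammar, and the function name `transliterate` is hypothetical. Weights are added along a path and the cheapest output is preferred, as in the tropical semiring commonly used for weighted transducers.

```python
from itertools import product

# Hypothetical ambiguous mapping: input symbol -> list of (output, weight).
# Lower weight means "more preferred", e.g. because the form was seen in a corpus.
TABLE = {
    "a": [("A", 0.0)],
    "b": [("B1", 0.5), ("B2", 1.5)],
}


def transliterate(word, table):
    """Enumerate every output with its summed weight, cheapest first."""
    options = [table[ch] for ch in word]
    results = []
    for combo in product(*options):
        output = "".join(symbol for symbol, _ in combo)
        cost = sum(weight for _, weight in combo)
        results.append((cost, output))
    return sorted(results)


if __name__ == "__main__":
    for cost, output in transliterate("ab", TABLE):
        print(f"{output}\t{cost}")
```

Enumerating all paths is only workable for toy inputs; a real weighted transducer would find the cheapest path without listing the alternatives.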


2018 ◽  
Vol 11 (3) ◽  
pp. 1-25
Author(s):  
Leonel Figueiredo de Alencar ◽  
Bruno Cuconato ◽  
Alexandre Rademaker

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis.
KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.


2021 ◽  
Vol 17 (1) ◽  
pp. 558-564
Author(s):  
Bakhtiyor Mengliyev ◽  
Shohida Shahabitdinova ◽  
Shahlo Khamroeva ◽  
Shakhnoza Gulyamova ◽  
Adiba Botirova

2003 ◽  
Vol 9 (1) ◽  
pp. 87-99 ◽  
Author(s):  
KEMAL OFLAZER

This paper presents a scheme that allows one to relax the all-or-none nature of two-level constraints in two-level morphology in a controlled manner, so that word forms with violations of some of the two-level constraints can be analyzed and ranked. The problem has been motivated by a recent phenomenon in Turkish with imported words that violate a fundamental assumption of Turkish that pronunciation and orthography have almost a one-to-one correspondence, and by a problem in Basque words with differing amounts of competence errors. We present the formulation of our proposal, and provide details of implementations for both problems using the XRCE Finite State Toolkit.
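The general idea of turning an all-or-none constraint into a ranked one can be sketched as follows. This is not the paper's formulation: the single constraint used here (front/back vowel harmony between stem and suffix), the tiny lexicon, and the names `harmony_violations` and `rank_analyses` are illustrative assumptions.

```python
# Turkish front and back vowels.
FRONT, BACK = set("eiöü"), set("aıou")


def vowel_class(ch):
    """Classify a character as a front vowel, a back vowel, or neither."""
    if ch in FRONT:
        return "front"
    if ch in BACK:
        return "back"
    return None


def harmony_violations(stem, suffix):
    """Return 1 if the suffix's first vowel disagrees in backness with the
    stem's last vowel, else 0."""
    stem_vowels = [vowel_class(c) for c in stem if vowel_class(c)]
    suffix_vowels = [vowel_class(c) for c in suffix if vowel_class(c)]
    if not stem_vowels or not suffix_vowels:
        return 0
    return 0 if stem_vowels[-1] == suffix_vowels[0] else 1


def rank_analyses(surface, stems, suffixes, max_violations=1):
    """Collect stem+suffix splits of `surface`, keeping those within the
    violation budget and ordering them by number of violations."""
    ranked = []
    for stem in stems:
        for suffix in suffixes:
            if stem + suffix == surface:
                cost = harmony_violations(stem, suffix)
                if cost <= max_violations:
                    ranked.append((cost, stem, suffix))
    return sorted(ranked)


if __name__ == "__main__":
    stems, suffixes = ["ev", "saat"], ["ler", "lar"]
    print(rank_analyses("evler", stems, suffixes))
    print(rank_analyses("saatler", stems, suffixes))
    print(rank_analyses("saatler", stems, suffixes, max_violations=0))
```

With `max_violations=0` the checker behaves like a strict all-or-none constraint and rejects `saatler`; raising the budget lets the form through with a recorded cost so that analyses can be ranked.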


2007 ◽  
Vol 26 (2) ◽  
Author(s):  
Amir Zeldes

This paper presents a morphophonology-based Item-and-Process approach to the finite-state lemmatization and morphological analysis of Polish. Unlike current text-based techniques, which search for all possible orthographic representations of Polish morphological suffixes, the multilevel, phonological-feature-based algorithm presented here extracts morphophoneme arrays from graphemic word forms, allowing the extraction of abstract suffixes, independent of their surface representation. This makes it possible to use a simple mono-lemmatic dictionary, as well as to distinguish between homographic suffixes, and to carry out various phonological and morphological investigations using suffix fields in corpora.

