Lenient morphological analysis

2003 ◽  
Vol 9 (1) ◽  
pp. 87-99 ◽  
Author(s):  
KEMAL OFLAZER

This paper presents a scheme that allows one to relax the all-or-none nature of two-level constraints in two-level morphology in a controlled manner, so that word forms with violations of some of the two-level constraints can be analyzed and ranked. The problem has been motivated by a recent phenomenon in Turkish with imported words that violate a fundamental assumption of Turkish that pronunciation and orthography have almost a one-to-one correspondence, and by a problem in Basque words with differing amounts of competence errors. We present the formulation of our proposal, and provide details of implementations for both problems using the XRCE Finite State Toolkit.
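The idea of ranking word forms by how many constraints they violate can be illustrated with a small sketch. This is not the paper's finite-state implementation, which uses the XRCE toolkit; it is a plain Python approximation in which the constraint function `harmony_violations`, the helper `rank`, and the sample words are all assumptions made for illustration. The only constraint modelled is a simplified backness harmony check over lowercase Turkish vowels.

```python
from typing import Callable, List, Tuple

# Turkish vowels grouped by backness (lowercase input is assumed).
FRONT = set("eiöü")
BACK = set("aıou")


def harmony_violations(word: str) -> int:
    """Count adjacent vowel pairs that disagree in backness (a toy constraint)."""
    vowels = [ch for ch in word if ch in FRONT or ch in BACK]
    return sum((a in FRONT) != (b in FRONT) for a, b in zip(vowels, vowels[1:]))


def rank(forms: List[str],
         constraints: List[Callable[[str], int]]) -> List[Tuple[int, str]]:
    """Accept every form, but order forms by their total number of violations."""
    scored = [(sum(check(form) for check in constraints), form) for form in forms]
    return sorted(scored)


if __name__ == "__main__":
    for violations, form in rank(["saatler", "evler"], [harmony_violations]):
        print(violations, form)
```

An all-or-none analyzer would simply reject any form with a non-zero score; the lenient variant keeps it and places it lower in the ranking.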

2007 ◽  
Vol 26 (2) ◽  
Author(s):  
Amir Zeldes

This paper presents a morphophonology-based Item-and-Process approach to the finite-state lemmatization and morphological analysis of Polish. Unlike current text-based techniques, which search for all possible orthographic representations of Polish morphological suffixes, the multilevel, phonological-feature-based algorithm presented here extracts morphophoneme arrays from graphemic word forms, allowing the extraction of abstract suffixes independent of their surface representation. This makes it possible to use a simple mono-lemmatic dictionary, to distinguish between homographic suffixes, and to carry out various phonological and morphological investigations using suffix fields in corpora.


Author(s):  
Martin Maiden

The historical morphology of the verb ‘snow’ in Francoprovençal presents a conundrum, in that it is clearly analogically influenced by the verb ‘rain’, for obvious reasons of lexical semantic similarity, but the locus of that influence is not the ‘root’ (the ostensible bearer of lexical meaning) but desinential inflexion-class members, which are in principle independent of any lexical meaning. Similar morphological changes are also identified for other Gallo-Romance verbs. It seems, in effect, that speakers can identify exponents of the lexical meaning of word-forms in linear sequences larger than the apparent ‘morphemic’ composition of those word-forms, even when such a composition may seem prima facie transparent and obvious. It is argued that these facts are inherently incompatible with ‘constructivist’, morpheme-based, models of morphology, and strongly compatible with what have been called ‘abstractivist’ (‘word-and-paradigm’) approaches, which generally take entire word-forms as the primary units of morphological analysis.


Author(s):  
Lauri Karttunen

The article introduces the basic concepts of finite-state language processing: regular languages and relations, finite-state automata, and regular expressions. Many basic steps in language processing, ranging from tokenization, to phonological and morphological analysis, disambiguation, spelling correction, and shallow parsing, can be performed efficiently by means of finite-state transducers. The article discusses examples of finite-state languages and relations. Finite-state networks can represent only a subset of all possible languages and relations; that is, only some languages are finite-state languages. Furthermore, this article introduces two types of complex regular expressions that have many linguistic applications, restriction and replacement. Finally, the article discusses the properties of finite-state automata. The three important properties of networks are: that they are epsilon free, deterministic, and minimal. If a network encodes a regular language and if it is epsilon free, deterministic, and minimal, the network is guaranteed to be the best encoding for that language.
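As a rough illustration of the network properties mentioned above, the following sketch encodes a toy automaton for the regular language (ab)* as a transition table. The language, the state numbering, and the function name `accepts` are assumptions chosen for the example, not taken from the article. The network is epsilon free because every arc consumes a symbol, and deterministic because each (state, symbol) pair has at most one target.

```python
from typing import Dict, Set, Tuple

# Toy automaton for (ab)*: state 0 is both the start state and the only final state.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "a"): 1,
    (1, "b"): 0,
}
START = 0
FINAL: Set[int] = {0}


def accepts(word: str) -> bool:
    """Return True if the automaton ends in a final state after reading word."""
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no arc for this symbol: the string is rejected
        state = TRANSITIONS[key]
    return state in FINAL


if __name__ == "__main__":
    for sample in ["", "ab", "abab", "aba", "ba"]:
        print(repr(sample), accepts(sample))
```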


Author(s):  
Safiriyu Ijiyemi Eludiora ◽  
O R Ayemonisan

Nigeria's official languages are English, Yorùbá, Igbo and Hausa. The study reported in this paper aims to develop a learning tool that helps learners acquire the Yorùbá language through its alphabet. The study is critical to the Yorùbá language because the language is endangered, and different learning tools are needed to help prevent its extinction. A Yorùbá word perfect system was developed to assist people in learning the language. English and Yorùbá word formation was explored using a computational morphological approach. The theoretical framework used finite-state automata (FSA) to realise the different ways of combining consonants and vowels to form words. Two- to five-letter words were considered. The system was designed and implemented using UML tools and the Python programming language. The system teaches users how words are formed and how many syllables each word contains. Users need not know how to tone-mark a word before using the system. Any word typed is analysed according to its number of syllables. This approach produces representatives of all parts of speech (POS) of the two languages, and it produces corpora for both languages.
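A minimal sketch of how an automaton over consonant and vowel classes might enumerate word shapes of two to five symbols is given below. It is not the system described in the paper. It works on abstract `C`/`V` classes rather than real letters, and it assumes two simplified constraints (no two consonants in a row, and no word-final consonant) and treats each `V` as one syllable; tone marks, digraphs and syllabic nasals are ignored. The function names are hypothetical.

```python
from itertools import product
from typing import List


def accepts(pattern: str) -> bool:
    """Two-state automaton: after a C, the next symbol must be a V."""
    after_consonant = False
    for cls in pattern:
        if cls == "C":
            if after_consonant:
                return False  # assumed constraint: no consonant clusters
            after_consonant = True
        elif cls == "V":
            after_consonant = False
        else:
            return False  # symbol outside the C/V alphabet
    return not after_consonant  # assumed constraint: no final consonant


def patterns(min_len: int = 2, max_len: int = 5) -> List[str]:
    """Enumerate all accepted C/V shapes whose length lies in the given range."""
    return [
        "".join(shape)
        for n in range(min_len, max_len + 1)
        for shape in product("CV", repeat=n)
        if accepts("".join(shape))
    ]


def syllable_count(pattern: str) -> int:
    """Under the stated simplification, every V is one syllable nucleus."""
    return pattern.count("V")


if __name__ == "__main__":
    for shape in patterns(2, 3):
        print(shape, syllable_count(shape))
```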


2019 ◽  
Author(s):  
Francis M. Tyers ◽  
Jonathan N. Washington ◽  
Darya Kavitskaya ◽  
Memduh Gökırmak

This paper describes a weighted finite-state morphological transducer for Crimean Tatar able to analyse and generate in both Latin and Cyrillic orthographies. This transducer was developed by a team including a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise. Dealing with two orthographic systems in the same transducer is challenging as they employ different strategies to deal with the spelling of loan words and encode the full range of the language's phonemes and their interaction. We develop the core transducer using the Latin orthography and then design a separate transliteration transducer to map the surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We perform an evaluation of all components of the system, finding an accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This comprises the state of the art for Crimean Tatar morphological modelling, and, to our knowledge, is the first biscriptual single morphological transducer for any language.


2018 ◽  
Vol 11 (3) ◽  
pp. 1-25
Author(s):  
Leonel Figueiredo de Alencar ◽  
Bruno Cuconato ◽  
Alexandre Rademaker

One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion in the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative which will directly contribute to the development of more robust natural language processing tools and applications which depend on wide-coverage morphological analysis.
Keywords: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.


2021 ◽  
Vol 17 (1) ◽  
pp. 558-564
Author(s):  
Bakhtiyor Mengliyev ◽  
Shohida Shahabitdinova ◽  
Shahlo Khamroeva ◽  
Shakhnoza Gulyamova ◽  
Adiba Botirova

2013 ◽  
Vol 55 (2) ◽  
pp. 109-122
Author(s):  
Tiziana Pontillo

As Kātyāyana emphasizes while commenting on the ekaśeṣa-rules, words apply per object. Consequently, no word should be capable of conveying more than one object. By contrast not only does paronomasia, the so-called śleṣa, break the one-to-one relation between the śabda- and artha-levels of language; there are also grammatical rules which look like deviations from the naturally expected cause-effect relation between word forms and their meanings. The ekaśeṣa-rule represents one of these exceptions, since some parts of the artha are comprehensible, even without employing the word-form denoting them, such as mātṛ in the dual noun pitarau, meaning ‘mother and father’ rather than ‘the two fathers’. Patañjali already mentions an intriguing option in the use of śabdas, when he notes that a word form can merely convey its primary denotation, such as candra denoting the ‘moon’, or can express something that is ‘like something else’, such as candra conveying the sense of a ‘face like a moon’. These exceptions are reconsidered here within the framework of the “yugapad-expression”, which is how Bhartṛhari defines one of the two language options (the other one being kramaḥ ‘sequence’), an option realised when a single word simultaneously conveys more than one meaning, but an option whose use is discouraged. Technical (ritual and grammatical) speculations on simultaneity as an exception to the bi-unique relationship between a cause and its effect date back to the 2nd to 3rd centuries BC. Nonetheless, grammarians insist on excluding these extreme applications of meaning extension; only the late kāvyālaṃkāraśāstra authors extol the virtues of the phenomenon. The paper focuses on the trajectory that might have been followed in the intervening changes.

