Manipuri Morphological Analysis

2020
Vol 9 (2)
pp. 4-10
Author(s):
Y. Bablu Singh
Th. Mamata Devi
Ch. Yashawanta Singh

Morphological analysis is the basic foundation of Natural Language Processing applications, including Syntax Parsing, Machine Translation (MT), Information Retrieval (IR) and Automatic Indexing. Morphological Analysis can provide valuable information for computer-based linguistic tasks such as Lemmatization and the study of the internal structure of words or of their feature values. Computational Morphology, the application of morphological rules in the field of Computational Linguistics, is an emerging area in AI that studies the structure of words, which are formed by combining smaller units of linguistic information called morphemes, the building blocks of words. It provides information about the semantic and syntactic roles of words in a sentence. The analyzer can analyze Manipuri word forms and produce the grammatical information associated with the lexicon. The Morphological Analyzer for Manipuri has been tested on 4500 Manipuri lexical entries in Shakti Standard Format (SSF) using Meitei Mayek Unicode as the source, and a manual check found an accuracy of 84%.
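To make the analyzer's input and output concrete, the following Python sketch splits a word into a root and a single suffix using two lookup tables and serializes each analysis as an SSF-style `af` feature string. Everything here is an assumption for illustration: the English-like placeholder entries in `ROOTS` and `SUFFIXES` are not Manipuri data, a real analyzer would work on Meitei Mayek Unicode strings and on chains of agglutinated suffixes, and the eight-field `af` order (root, lexical category, gender, number, person, case, TAM, suffix) is our reading of the SSF convention rather than something stated in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder lexicon: root -> lexical category.
# Real Manipuri data would be Meitei Mayek Unicode strings; these are not.
ROOTS: Dict[str, str] = {"walk": "v", "book": "n"}

# Hypothetical suffix table: (suffix, category it attaches to, features it adds).
SUFFIXES: List[Tuple[str, str, Dict[str, str]]] = [
    ("s", "n", {"num": "pl"}),
    ("ed", "v", {"tam": "past"}),
    ("ing", "v", {"tam": "prog"}),
]

# Assumed field order of the SSF 'af' attribute.
AF_FIELDS = ["root", "lcat", "gend", "num", "pers", "case", "tam", "suffix"]


def analyze(word: str) -> List[Dict[str, str]]:
    """Return every root+suffix analysis of `word` licensed by the tables above."""
    analyses = []
    if word in ROOTS:  # bare root, no suffix
        analyses.append({"root": word, "lcat": ROOTS[word], "suffix": ""})
    for suffix, lcat, feats in SUFFIXES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and ROOTS.get(root) == lcat:
            analyses.append({"root": root, "lcat": lcat, "suffix": suffix, **feats})
    return analyses


def to_af(analysis: Dict[str, str]) -> str:
    """Serialize one analysis as a comma-separated 'af' value; missing features stay empty."""
    return ",".join(analysis.get(field, "") for field in AF_FIELDS)


for form in ["walked", "books", "walk"]:
    for a in analyze(form):
        print(form, f"<fs af='{to_af(a)}'>")
```

Returning a list rather than a single analysis leaves room for ambiguous word forms, which a real analyzer would pass on to a later disambiguation step.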

2018
Vol 11 (3)
pp. 1-25
Author(s):
Leonel Figueiredo de Alencar
Bruno Cuconato
Alexandre Rademaker

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for the morphological analysis of Portuguese. A first version of the resource is already freely available online under an open-source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps and systematically adding new entries. Compared with the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion of the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology, which allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative that will directly contribute to the development of more robust natural language processing tools and applications that depend on wide-coverage morphological analysis.
KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.
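The diminutive expansion described in this abstract can be illustrated with a small rule-based generator. This Python sketch is a simplification built on assumptions, not MorphoBr's finite-state implementation: it handles only nouns ending in unstressed -o/-a, applies the -inho/-inha pattern with the spelling changes c to qu, g to gu and ç to c, and ignores -zinho forms, stress shifts and lexical exceptions. The `form<TAB>lemma+N+DIM+GEN+NUM` entry layout is hypothetical.

```python
from typing import List, Tuple

# Spelling adjustments for a stem-final letter before "-inh-" (simplified).
SPELLING = {"c": "qu", "g": "gu", "ç": "c"}


def diminutive(lemma: str) -> str:
    """-inho/-inha diminutive of a noun ending in unstressed -o/-a (simplified rule)."""
    if len(lemma) < 2 or lemma[-1] not in "ao":
        raise ValueError(f"rule does not apply to {lemma!r}")
    stem, vowel = lemma[:-1], lemma[-1]
    if stem[-1] in SPELLING:
        stem = stem[:-1] + SPELLING[stem[-1]]
    return stem + "inh" + vowel


def full_form_entries(nouns: List[Tuple[str, str]]) -> List[str]:
    """Expand (lemma, gender) pairs into tab-separated full-form entries.

    The 'form<TAB>lemma+N+DIM+GEN+NUM' layout is a hypothetical format chosen
    for illustration, not necessarily MorphoBr's actual tag set.
    """
    entries = []
    for lemma, gender in nouns:
        dim = diminutive(lemma)
        entries.append(f"{dim}\t{lemma}+N+DIM+{gender}+SG")
        entries.append(f"{dim}s\t{lemma}+N+DIM+{gender}+PL")
    return entries


for line in full_form_entries([("casa", "F"), ("barco", "M"), ("moça", "F")]):
    print(line)
```

In a finite-state toolkit, a rule like this would typically be written as a rewrite rule compiled into a transducer rather than as Python string manipulation.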


2021
Vol 33 (4)
pp. 117-130
Author(s):
Alexander Sergeevich Sapin

Morphological analysis of text is one of the most important stages of natural language processing (NLP). Traditional and well-studied problems of morphological analysis include normalization (lemmatization) of a given word form, recognition of its morphological characteristics, and their morphological disambiguation. Morphological analysis also involves the problem of morpheme segmentation of words (i.e., segmenting words into their constituent morphs and classifying them), which is relevant to some NLP applications. In recent years, several machine learning models have been developed that increase the accuracy of traditional morphological analysis and morpheme segmentation, but their computational performance is insufficient for many applied problems. For morpheme segmentation, high-precision models have been built only for lemmas (normalized word forms). This paper describes two new high-accuracy neural network models that perform morphemic segmentation of Russian word forms with sufficiently high performance. The first model is based on convolutional neural networks and achieves state-of-the-art quality of morphemic segmentation for Russian word forms. The second model, in addition to segmenting a word form into morphemes, first refines its morphological characteristics, thereby disambiguating them. The performance of this joint morphological model is the best among the morpheme segmentation models considered, with comparable segmentation accuracy.
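Neural morpheme segmenters of this kind are commonly framed as character-level sequence labeling: each character receives a label marking its position within a morph and the morph's type. The Python sketch below shows that framing with a BMES-style scheme (Begin, Middle, End, Single) and the morph types PREF, ROOT, SUFF and END; the label inventory and the helper names are assumptions, not necessarily the scheme used in the paper.

```python
from typing import List, Tuple

Morph = Tuple[str, str]  # (surface string, morph type)


def encode(morphs: List[Morph]) -> List[str]:
    """One label per character: B/M/E for multi-character morphs, S for single characters."""
    labels: List[str] = []
    for text, mtype in morphs:
        if len(text) == 1:
            labels.append(f"S-{mtype}")
        else:
            labels.append(f"B-{mtype}")
            labels.extend(f"M-{mtype}" for _ in text[1:-1])
            labels.append(f"E-{mtype}")
    return labels


def decode(word: str, labels: List[str]) -> List[Morph]:
    """Rebuild typed morphs from per-character labels (e.g. a model's predictions)."""
    if len(word) != len(labels):
        raise ValueError("need exactly one label per character")
    morphs: List[Morph] = []
    current, current_type = "", ""
    for ch, label in zip(word, labels):
        position, mtype = label.split("-", 1)
        if position in ("B", "S") and current:  # previous morph was never closed
            morphs.append((current, current_type))
            current = ""
        current += ch
        current_type = mtype  # the last label seen decides the morph's type
        if position in ("S", "E"):
            morphs.append((current, current_type))
            current = ""
    if current:  # unterminated morph at the end of the word
        morphs.append((current, current_type))
    return morphs


word = "подводный"
gold = [("под", "PREF"), ("вод", "ROOT"), ("н", "SUFF"), ("ый", "END")]
labels = encode(gold)
print(labels)
print(decode(word, labels))
```

A convolutional model would predict one such label per character; `decode` tolerates some inconsistent sequences (for example, a `B` label that is never closed) so that noisy predictions still yield a segmentation.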


2015
pp. 20
Author(s):
Stig-Arne Grönroos
Kristiina Jokinen
Katri Hiovain
Mikko Kurimo
Sami Virpioja

Many Uralic languages have a rich morphological structure but lack the tools of morphological analysis needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sámi language using a large unannotated corpus and a small number of human-annotated word forms selected with an active learning approach. For statistical learning, we use the semi-supervised Morfessor Baseline and FlatCat methods. After annotating 237 words with our active learning setup, we improve morph boundary recall by over 20% with no loss of precision.
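Morph boundary recall and precision, the measures reported in this abstract, compare the positions at which a model splits a word with the positions in a gold-standard segmentation. A minimal Python sketch, assuming micro-averaging over all boundaries in a word list (evaluation setups differ, for instance in how alternative gold analyses are handled); the example words are Finnish (taloissa 'in the houses', kissa 'cat'), not North Sámi.

```python
from typing import List, Sequence, Set, Tuple


def boundaries(morphs: Sequence[str]) -> Set[int]:
    """Character offsets of word-internal morph boundaries."""
    positions, offset = set(), 0
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_prf(gold: List[Sequence[str]],
                 predicted: List[Sequence[str]]) -> Tuple[float, float, float]:
    """Micro-averaged boundary precision, recall and F1 over a list of words."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if "".join(g) != "".join(p):
            raise ValueError("segmentations must cover the same word")
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)  # boundaries found by both
        fp += len(pb - gb)  # predicted but not in gold
        fn += len(gb - pb)  # in gold but missed
    # 0.0 when a ratio is undefined: an arbitrary convention for this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["talo", "i", "ssa"], ["kissa"]]
pred = [["talo", "issa"], ["kis", "sa"]]
print(boundary_prf(gold, pred))
```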


Author(s):  
Jeongkyu Lee

There has been a great deal of interest in the development of ontologies to facilitate knowledge sharing and database integration. In general, an ontology is a set of terms or vocabularies of interest in a particular information domain, together with the relationships among them (Doerr, Hunter, & Lagoze, 2003). It includes machine-interpretable definitions of the basic concepts in the domain. Ontologies are very popular in natural language processing (NLP) and in Web user interfaces (Web ontology). To bring this advantage to multimedia content analysis, several studies have proposed ontology-based schemes (Hollink & Worring, 2005; Spyropoulos, Paliouras, Karkaletsis, Kosmopoulos, Pratikakis, Perantonis, & Gatos, 2005). A modular ontology methodology is used in a generic analysis scheme to semantically interpret and annotate multimedia content. This methodology consists of a domain ontology, a core ontology, and a multimedia ontology. The domain ontology captures concepts in a particular type of domain, while the core ontology provides the key building blocks needed for the scalable assimilation of information from diverse sources. The multimedia ontology is used to model multimedia data such as audio, images, and video. In multimedia data analysis, meaningful patterns and hidden knowledge are discovered from the database, and existing tools can manage and search the discovered patterns and knowledge. However, almost all of these approaches use low-level feature values instead of high-level perceptions, which creates a large gap between machine interpretation and human understanding. For example, low-level feature values cannot represent a semantic notion such as an anomaly that must be retrieved from a video surveillance system. To address this problem, research has focused mainly on constructing and using ontologies for specific data domains in various applications.
In this chapter, we first survey the state-of-the-art in multimedia ontology, specifically video ontology, and then investigate the methods of automatic generation of video ontology.
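The surveillance example in this abstract (retrieving anomalies from video) is where concept hierarchies help: a query for a general concept should also match clips annotated with its subconcepts. The Python sketch below models a tiny, entirely hypothetical video ontology as a dictionary of is-a links and retrieves clips by concept; real systems would typically use an ontology language such as OWL together with a reasoner, which this sketch does not attempt.

```python
from typing import Dict, List, Set

# Hypothetical toy video ontology: concept -> set of direct parent concepts (is-a links).
IS_A: Dict[str, Set[str]] = {
    "Loitering": {"AnomalousEvent"},
    "Intrusion": {"AnomalousEvent"},
    "AnomalousEvent": {"Event"},
    "PersonWalking": {"NormalEvent"},
    "NormalEvent": {"Event"},
    "Event": set(),
}


def ancestors(concept: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable by following is-a links upward (transitive closure)."""
    seen: Set[str] = set()
    stack = list(is_a.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def retrieve(annotations: Dict[str, str], query: str,
             is_a: Dict[str, Set[str]]) -> List[str]:
    """Clip ids annotated with `query` or with any concept subsumed by it."""
    return sorted(
        clip for clip, concept in annotations.items()
        if concept == query or query in ancestors(concept, is_a)
    )


# Hypothetical high-level annotations, e.g. produced by an upstream video analysis step.
clips = {"clip1": "Loitering", "clip2": "PersonWalking", "clip3": "Intrusion"}
print(retrieve(clips, "AnomalousEvent", IS_A))
```

The query works on concept labels rather than low-level features, which is the gap the chapter describes; producing those labels from raw video remains the hard part.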


2016
Vol 4
pp. 47-72
Author(s):
Stig-Arne Grönroos
Katri Hiovain
Peter Smit
Ilona Rauhala
Kristiina Jokinen
...

Many Uralic languages have a rich morphological structure but lack the morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation using a large unannotated corpus and a small number of annotated word forms selected with an active learning approach. We apply the procedure to two Finno-Ugric languages: Finnish and North Sámi. The semi-supervised Morfessor FlatCat method is used for statistical learning. For Finnish, we set up a simulated scenario to test various active learning query strategies. The best performance is provided by a coverage-based strategy on word-initial and word-final substrings. For North Sámi, we collect a set of human-annotated data. With 300 words annotated using our active learning setup, we see a relative improvement in morph boundary F1-score of 19% compared with unsupervised learning and 7.8% compared with random selection.
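A coverage-based query strategy on word-initial and word-final substrings can be sketched as a greedy loop: repeatedly pick the unannotated word whose edge substrings add the most not-yet-covered substrings. This Python sketch is a simplified illustration under stated assumptions (substrings of length 1 to `max_len`, ties broken by list order, no frequency weighting) and is not the exact strategy evaluated in the paper; the word pool is a Finnish toy example.

```python
from typing import List, Set, Tuple

Feature = Tuple[str, str]  # ("initial" | "final", substring)


def edge_substrings(word: str, max_len: int = 3) -> Set[Feature]:
    """Word-initial and word-final substrings of length 1..max_len."""
    feats: Set[Feature] = set()
    for n in range(1, min(max_len, len(word)) + 1):
        feats.add(("initial", word[:n]))
        feats.add(("final", word[-n:]))
    return feats


def select_for_annotation(words: List[str], budget: int, max_len: int = 3) -> List[str]:
    """Greedily pick words that add the most uncovered edge substrings."""
    covered: Set[Feature] = set()
    remaining = list(dict.fromkeys(words))  # drop duplicates, keep order
    selected: List[str] = []
    while remaining and len(selected) < budget:
        # max() returns the first maximal element, so ties go to the earlier word
        best = max(remaining, key=lambda w: len(edge_substrings(w, max_len) - covered))
        selected.append(best)
        covered |= edge_substrings(best, max_len)
        remaining.remove(best)
    return selected


pool = ["talo", "talot", "kala", "kalat"]
print(select_for_annotation(pool, budget=3))
```

The selected words would then go to an annotator, and their segmentations would serve as the labeled portion for a semi-supervised method such as Morfessor FlatCat.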


1999
Vol 5 (1)
pp. 95-112
Author(s):
Thomas Bub
Johannes Schwinn

Verbmobil represents a new generation of speech-to-speech translation systems whose main features are the handling of spontaneously spoken language, speaker independence and adaptability, and the combination of deep and shallow approaches to the analysis and transfer problems. The project brought together researchers from the fields of signal processing, computational linguistics and artificial intelligence. Verbmobil goes beyond the state of the art in each of these areas, but its main achievement is their seamless integration. The first project phase (1993–1996) has been followed by a second phase (1997–2000), which aims at applying the results to further languages and at integrating innovative telecooperation techniques. Quite apart from the speech and language processing issues, the size and complexity of the project pose an extreme challenge in the areas of project management and software engineering:
• 50 researchers from 29 organizations at different sites in different countries are involved in the software development process;
• to reuse existing software, hardware, knowledge and experience, only a few technical restrictions could be imposed on the partners.
In this article we describe the Verbmobil prototype system from a software-engineering perspective. We discuss:
• the modularized functional architecture;
• the flexible and extensible software architecture, which reflects that functional architecture;
• the evolutionary process of system integration;
• the communication-based organizational structure of the project;
• the evaluation of the system that was operational by the end of the first project phase.
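The combination of deep and shallow approaches mentioned in this abstract can be pictured as independent processing paths whose results are compared at the end. The following Python sketch is a hypothetical toy, not a description of Verbmobil's actual modules or selection mechanism: a precise "deep" path that covers few inputs, a robust word-by-word "shallow" path, and a selector that keeps the highest-confidence candidate. The German phrases, confidence values and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    path: str          # which processing path produced it
    translation: str
    confidence: float  # path-specific score in [0, 1], assumed comparable across paths


# Hypothetical toy resources (German -> English), for illustration only.
DEEP_GRAMMAR = {"treffen wir uns am montag": "let us meet on monday"}
SHALLOW_LEXICON = {"treffen": "meet", "wir": "we", "uns": "us", "am": "on",
                   "montag": "monday", "dienstag": "tuesday"}


def deep_path(utterance: str) -> Optional[Candidate]:
    """Stand-in for full linguistic analysis: precise, but fails on uncovered input."""
    result = DEEP_GRAMMAR.get(utterance)
    return Candidate("deep", result, 0.9) if result else None


def shallow_path(utterance: str) -> Optional[Candidate]:
    """Stand-in for a robust shallow method: word-by-word lookup, lower confidence."""
    words = utterance.split()
    if not words:
        return None
    known = [w for w in words if w in SHALLOW_LEXICON]
    output = " ".join(SHALLOW_LEXICON.get(w, w) for w in words)
    return Candidate("shallow", output, 0.6 * len(known) / len(words))


def translate(utterance: str,
              paths: List[Callable[[str], Optional[Candidate]]]) -> Optional[Candidate]:
    """Run every path independently and keep the highest-confidence result."""
    candidates = [c for c in (path(utterance) for path in paths) if c is not None]
    return max(candidates, key=lambda c: c.confidence, default=None)


PATHS = [deep_path, shallow_path]
print(translate("treffen wir uns am montag", PATHS))
print(translate("treffen wir uns am dienstag", PATHS))
```

Keeping every path behind the same function signature is one way a modular architecture lets partners develop components independently; the hard part in practice, which this sketch sidesteps, is making scores from different paths comparable.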


Author(s):
Solmaz Zakhireh
Yadollah Omidi
Younes Beygi-Khosrowshahi
Ayoub Aghanejad
Jaleh Barar
...

Recently, pollen grains (PGs) have been introduced as drug carriers and as building blocks for scaffolds. This study aimed to assess the in vitro biocompatibility of composites of Pistacia vera L. hollow PGs and Fe3O4 nanoparticles (HPGs/Fe3O4NPs) using human adipose-derived mesenchymal stem cells (hAD-MSCs). To this end, iron oxide nanoparticles (Fe3O4NPs) were assembled on the surface of HPGs at different concentrations. The biocompatibility of the prepared composites was assessed through MTT assay, apoptosis-related gene expression and field emission scanning electron microscopy (FE-SEM) analysis. Compared with bare HPGs, the HPGs/Fe3O4NPs had a biphasic effect on hAD-MSCs. The composite containing 1% Fe3O4NPs showed no cytotoxicity for up to 21 days, whereas higher Fe3O4NP contents and long-term exposure had adverse effects on hAD-MSC growth. These results were confirmed by qRT-PCR and by morphological analysis with FE-SEM, suggesting that a narrow range below 1% Fe3O4NPs may be the optimum choice for medicinal applications of HPGs/Fe3O4NPs microdevices.


Author(s):  
Mans Hulden

Finite-state machines—automata and transducers—are ubiquitous in natural-language processing and computational linguistics. This chapter introduces the fundamentals of finite-state automata and transducers, both probabilistic and non-probabilistic, illustrating the technology with example applications and common usage. It also covers the construction of transducers, which correspond to regular relations, and automata, which correspond to regular languages. The technologies introduced are widely employed in natural language processing, computational phonology and morphology in particular, and this is illustrated through common practical use cases.
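As a small illustration of a non-probabilistic transducer, the Python sketch below encodes a deterministic finite-state transducer as a transition table mapping (state, input symbol) to (next state, output). It realizes an abstract plural symbol `+PL` as -es after s, x or z and as -s elsewhere; the symbol names, the state names and the reduced sibilant set are assumptions chosen for brevity. Ignoring the outputs leaves an ordinary finite-state automaton over the same states.

```python
from typing import Dict, List, Optional, Set, Tuple

# Deterministic transducer: (state, input symbol) -> (next state, output string).
Delta = Dict[Tuple[str, str], Tuple[str, str]]

LETTERS = "abcdefghijklmnopqrstuvwxyz"
SIBILANTS = set("sxz")  # simplification: ignores "sh", "ch", etc.


def build_plural_fst() -> Tuple[Delta, str, Set[str]]:
    """Map a noun followed by the symbol '+PL' to its surface plural (-s or -es)."""
    delta: Delta = {}
    for state in ("start", "other", "sibilant"):
        for ch in LETTERS:
            nxt = "sibilant" if ch in SIBILANTS else "other"
            delta[(state, ch)] = (nxt, ch)  # letters are copied unchanged
    delta[("other", "+PL")] = ("done", "s")
    delta[("sibilant", "+PL")] = ("done", "es")
    finals = {"other", "sibilant", "done"}  # bare singulars are accepted too
    return delta, "start", finals


def transduce(symbols: List[str], fst: Tuple[Delta, str, Set[str]]) -> Optional[str]:
    """Return the output string, or None if the input is outside the transducer's domain."""
    delta, state, finals = fst
    out = []
    for sym in symbols:
        if (state, sym) not in delta:
            return None
        state, emitted = delta[(state, sym)]
        out.append(emitted)
    return "".join(out) if state in finals else None


def tokenize(text: str) -> List[str]:
    """Split 'fox+PL' into ['f', 'o', 'x', '+PL']."""
    stem, sep, tag = text.partition("+")
    return list(stem) + (["+" + tag] if sep else [])


FST = build_plural_fst()
for word in ["cat+PL", "fox+PL", "bus+PL", "cat"]:
    print(word, "->", transduce(tokenize(word), FST))
```

Practical toolkits compile such machines from regular expressions and rewrite rules and allow nondeterminism, which a plain dictionary of deterministic transitions does not capture.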

