morphological parsing
Recently Published Documents


Total documents: 31 (6 in the last five years)
H-index: 9 (1 in the last five years)

2020 ◽  
Vol 15 (2) ◽  
pp. 295-329
Author(s):  
Harald Clahsen ◽  
Anna Jessen

This study examines the processing of morphologically complex words, focusing on how morphological (in addition to orthographic and semantic) factors affect bilingual word recognition. We report findings from a large experimental study with groups of bilingual (Turkish/German) speakers using the visual masked-priming technique. We found morphologically mediated effects on the response speed and the inter-individual variability within the bilingual participant group. We conclude that the grammar (qua morphological parsing) not only enhances speed of processing in bilingual language processing but also yields more uniform performance and thereby constrains variability within a group of otherwise heterogeneous individuals.


2020 ◽  
pp. 1-63
Author(s):  
Amrith Krishna ◽  
Bishal Santra ◽  
Ashim Gupta ◽  
Pavankumar Satuluri ◽  
Pawan Goyal

We propose a framework using Energy Based Models for multiple structured prediction tasks in Sanskrit. Ours is an arc-factored model, similar to the graph-based parsing approaches, and we consider the tasks of word segmentation, morphological parsing, dependency parsing, syntactic linearization, and prosodification, a prosody-level task we introduce in this work. Ours is a search-based structured prediction framework, which expects a graph as input, where relevant linguistic information is encoded in the nodes, and the edges are then used to indicate the association between these nodes. Typically, the state-of-the-art models for morphosyntactic tasks in morphologically rich languages still rely on hand-crafted features for their performance. But here, we automate the learning of the feature function. The feature function so learned, along with the search space we construct, encodes relevant linguistic information for the tasks we consider. This enables us to substantially reduce the training data requirements to as low as 10% of the data required by the neural state-of-the-art models. Our experiments in Czech and Sanskrit show the language-agnostic nature of the framework, where we train highly competitive models for both languages. Moreover, our framework enables us to incorporate language-specific constraints to prune the search space and to filter the candidates during inference. We obtain significant improvements in morphosyntactic tasks for Sanskrit by incorporating language-specific constraints into the model. In all the tasks we discuss for Sanskrit, we either achieve state-of-the-art results or ours is the only data-driven solution for those tasks.
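To make the term "arc-factored" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that the energy of a candidate structure is the sum of the energies of its arcs and that the preferred structure is the one with the lowest total energy. The names `structure_energy`, `is_tree`, `lowest_energy_tree`, and `TOY_ENERGY` are hypothetical, the energies are invented toy numbers rather than learned values, and exhaustive enumeration stands in for the search procedure used in the paper.

```python
from itertools import product

# Toy arc energies (head, dependent); node 0 is an artificial root.
# These numbers are invented for illustration, not learned from data.
TOY_ENERGY = {
    (0, 1): 5.0, (0, 2): 1.0, (0, 3): 5.0,
    (1, 2): 4.0, (1, 3): 3.0,
    (2, 1): 1.0, (2, 3): 2.0,
    (3, 1): 4.0, (3, 2): 4.0,
}

def structure_energy(arcs, arc_energy):
    """Arc-factored assumption: a structure's energy is the sum of its arc energies."""
    return sum(arc_energy[arc] for arc in arcs)

def is_tree(heads):
    """heads[i] is the head of node i + 1; every node must reach the root (0)."""
    for node in range(1, len(heads) + 1):
        seen = set()
        current = node
        while current != 0:
            if current in seen:  # cycle (including self-loops)
                return False
            seen.add(current)
            current = heads[current - 1]
    return True

def lowest_energy_tree(n, arc_energy):
    """Enumerate all head assignments for n nodes and keep the cheapest tree."""
    best_heads, best_energy = None, float("inf")
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        arcs = [(head, dep + 1) for dep, head in enumerate(heads)]
        energy = structure_energy(arcs, arc_energy)
        if energy < best_energy:
            best_heads, best_energy = heads, energy
    return best_heads, best_energy

print(lowest_energy_tree(3, TOY_ENERGY))  # ((2, 0, 2), 4.0)
```

Exhaustive enumeration is only workable for a handful of nodes. It is used here so that the decomposition of the score over arcs is easy to see.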


Author(s):  
Юлия Мазурова ◽  
Yuliya Mazurova

The project aims to document Kullui, an Indo-Aryan language. This unwritten minor language of the Himachali Pahari group is spoken in the Kullu district of Himachal Pradesh (India). The core objective of the project is to describe Kullui using linguistic field methods. The study includes collecting sociolinguistic information, describing the grammar, compiling a dictionary, and recording texts. The project, run by A.S. Krylova, Yu.V. Mazurova, E.A. Renkovskaya, and E.M. Shuvannikova (Knyazeva), is carried out under the auspices of the Institute of Linguistics of the Russian Academy of Sciences. During linguistic expeditions to Himachal Pradesh, the researchers gathered information on the sociolinguistic situation (types of multilingualism and the areas of use of the main languages in the region: Kullui, Hindi, English, etc.) and studied the language features of different generations. The research shows the dynamics of the linguistic situation. In the older generation, especially among women, there are cases of monolingualism, that is, knowledge of the local language alone. The middle generation, as a rule, speaks several languages: Hindi and one or several local languages or dialects; some also know English (its Indian variety). In addition, some people know Punjabi, Urdu, or Nepali to some extent, which is associated with the geographic location and history of Himachal Pradesh. Young people and schoolchildren show the onset of a language shift: many know Himachali languages and use them in everyday life, but the use of minor languages is gradually being restricted to communication with older family members. The key means of communication for the younger generation is Hindi; for educated youth, both Hindi and English. A detailed description of minor languages is especially relevant now, while they still retain their authenticity and are used by all generations. It is crucial to document them using modern technologies (voiced dictionaries, glossed texts with audio and video recordings). Existing as a means of oral communication, Kullui and other minor languages of Himachal Pradesh currently lack standardized writing. The research team has developed a phonological transcription based on the International Phonetic Alphabet (IPA) for recording oral speech, as well as morphological parsing for grammatical material.


Author(s):  
Jocelyn Pender

The increased availability of digital floras and the application of optical character recognition (OCR) to digitized texts have resulted in exciting opportunities for flora data mining. For example, the software package CharaParser has been developed for the semantic annotation of morphological descriptions from taxonomic treatments (Cui 2012). However, after digitization and OCR processing and before parsing of morphological treatments can begin, content types must be annotated (i.e., whether text represents names, morphology, discussion, or distribution). In addition to enabling morphological parsing, content type annotation also facilitates content search and data linkage. For example, by annotating pieces of a floral treatment, assertions of the same type from various floras can be combined into a single document (i.e., a "mash-up" floral treatment). Several products and pipelines have been developed for the semantic annotation, or mark-up, of taxonomic documents (e.g., GoldenGATE, FlorML; Sautter et al. 2012, Hamann et al. 2014). However, these products lack a combination of both ease of implementation (e.g., the ability to run as a script in a programmatic workflow) and the use of modern parsing methods, such as text mining and Natural Language Processing (NLP) approaches. Here I present a pilot project, implemented in Python, that applies text mining and NLP approaches to marking up floras. I will describe the success of the project and summarize lessons learned, especially in relation to previous flora markup projects. Annotation of existing flora documents is an essential step towards building next-generation floras (i.e., mash-ups and enhanced floras as platforms) and enables automated trait extraction. Building an easy-to-use access point to modern text mining and NLP techniques for botanical literature will allow for more flexible and responsive flora annotation, and is an important step towards realizing botanical data integration goals.
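As a rough illustration of the content type annotation step described above, the following Python sketch labels a block of text as a name, morphology, distribution, or discussion. It is not the pilot project's code. The cue-word lists, the `NAME_PATTERN` regular expression, the function `annotate`, and the sample strings are all assumptions made for this example, and a real pipeline would more plausibly use a trained text classifier than hand-picked keywords.

```python
import re

# Hypothetical cue words; a real system would learn these from annotated floras.
CUE_WORDS = {
    "morphology": {"leaves", "petals", "sepals", "stems", "mm", "cm", "glabrous"},
    "distribution": {"native", "habitat", "elevation", "forests", "meadows", "widespread"},
}

# Very rough test for a Latin binomial at the start of a short line.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+ [a-z-]+\b")

def annotate(block):
    """Return a content type label for one block of a taxonomic treatment."""
    tokens = re.findall(r"[a-z]+", block.lower())
    if len(tokens) <= 6 and NAME_PATTERN.match(block):
        return "name"
    # Count cue-word hits per label and keep the label with the most hits.
    counts = {
        label: sum(token in cues for token in tokens)
        for label, cues in CUE_WORDS.items()
    }
    label, hits = max(counts.items(), key=lambda item: item[1])
    return label if hits > 0 else "discussion"

# Invented sample blocks, written for this example only.
samples = [
    "Carex aquatilis Wahlenb.",
    "Stems 20-100 cm; leaves 3-8 mm wide, glabrous.",
    "Widespread in wet meadows and forests at low elevation.",
    "This species is often confused with its relatives.",
]
for block in samples:
    print(annotate(block), "|", block)
```

Because `annotate` is a plain function, it can run as a script step in a programmatic workflow, which is the ease-of-implementation property the abstract asks for.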


2018 ◽  
Vol 3 (1) ◽  
pp. 54
Author(s):  
Lindy B Comstock

The phenomenon of “suffix interference” has been used as evidence for a distinction between inflectional and derivational processes (e.g. Pinker & Prince, 1988; Pinker, 1999; Pinker & Ullman, 2002). Yet much of the work on affix priming exists in English, a morphologically poor language, and suffix interference appears inconsistently in cross-linguistic data. The greater reliance on morphological complexity in Russian, and its use of an infinitival suffix and aspectual affixes that may bridge the distinction between traditional definitions of inflectional and derivational word forms, call into question how generalizable the original findings on suffix interference may be for morphologically-complex languages. Investigating these questions, this paper provides unexpected findings: suffix interference is absent in Russian, inflectional suffixes reveal significantly more robust priming effects, and the infinitival suffix is best considered a special case of affix priming, failing to pattern with either inflectional or derivational suffixes. Thus, Russian appears to defy the assumption that inflections are “stripped” during morphological parsing; instead, verbal inflections prove the greatest facilitators of morphological priming. A linear mixed effects model indicates these effects cannot be explained by frequency alone.


Author(s):  
Uuganbaatar D ◽  
Guanglai Gao ◽  
Byambasuren I ◽  
Nergui B

This study primarily describes the word structure of the Modern Mongolian language and then focuses on the possibility of describing Mongolian in PC-KIMMO, a two-level processing method for morphological parsing. The rules file and lexicon presented in the paper describe the morphology of Mongolian words. A lexicon containing the root words of contemporary Mongolian is used in testing. As a result, two-level morphology is found to be fully applicable to Mongolian linguistics. In addition, a PC-KIMMO description of the traditional Mongolian script is considered possible.
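The pairing of a lexicon with rules that relate lexical forms to surface forms can be sketched in Python as follows. This is a much-simplified illustration in the spirit of a two-level description, not PC-KIMMO's rule formalism or file format, and it uses a toy English lexicon rather than the Mongolian data from the paper. The names `ROOTS`, `SUFFIXES`, `to_surface`, and `parse` are hypothetical, and the single spelling rule is applied as a rewrite instead of as a parallel two-level constraint.

```python
import re

# Toy lexicon: root -> category. Not taken from the paper.
ROOTS = {"spy": "N", "cat": "N", "walk": "V"}

# Suffix -> categories it may attach to ("+" marks a morpheme boundary).
SUFFIXES = {"": ("N", "V"), "+s": ("N", "V"), "+ed": ("V",)}

def to_surface(lexical):
    """Map a lexical form such as 'spy+s' to its surface spelling."""
    # Spelling rule: y after a consonant is realized as ie before the suffix +s.
    surface = re.sub(r"(?<=[^aeiou])y\+s$", "ies", lexical)
    # The morpheme boundary itself has no surface realization.
    return surface.replace("+", "")

def parse(surface_word):
    """Return every lexical form licensed by the lexicon that surfaces as the word."""
    analyses = []
    for root, category in ROOTS.items():
        for suffix, allowed in SUFFIXES.items():
            if category not in allowed:
                continue
            lexical = root + suffix
            if to_surface(lexical) == surface_word:
                analyses.append(lexical)
    return analyses

print(parse("spies"))   # ['spy+s']
print(parse("walked"))  # ['walk+ed']
print(parse("cated"))   # []
```

Generating every root and suffix combination and comparing it with the input keeps the sketch short. A real two-level system instead matches lexical and surface strings symbol by symbol with finite-state rules.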

