Minimally-Supervised Morphological Segmentation using Adaptor Grammars with Linguistic Priors

Author(s):  
Ramy Eskander ◽  
Cass Lowry ◽  
Sujay Khandagale ◽  
Francesca Callejas ◽  
Judith Klavans ◽  
...  
2016 ◽  
Vol 42 (1) ◽  
pp. 91-120 ◽  
Author(s):  
Teemu Ruokolainen ◽  
Oskar Kohonen ◽  
Kairit Sirts ◽  
Stig-Arne Grönroos ◽  
Mikko Kurimo ◽  
...  

This article presents a comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation. In morphological segmentation, word forms are segmented into morphs, the surface forms of morphemes. In the minimally supervised data-driven learning setting, segmentation models are learned from a small number of manually annotated word forms and a large set of unannotated word forms. In addition to providing a literature survey on published methods, we present an in-depth empirical comparison on three diverse model families, including a detailed error analysis. Based on the literature survey, we conclude that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection. As for which approach has been more successful, both the previous work and the empirical evaluation presented here strongly imply that the current state of the art is yielded by the discriminative boundary detection methodology.


2013 ◽  
Vol 1 ◽  
pp. 255-266 ◽  
Author(s):  
Kairit Sirts ◽  
Sharon Goldwater

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.
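The model selection step described in this abstract can be illustrated with a minimal sketch. It assumes each word already has candidate boundary positions grouped by the metagrammar level that proposed them, and that selection means picking the subset of levels with the best boundary F1 on the small labelled set. The level names (`Morph`, `SubMorph`), the exhaustive subset search, and the functions `boundary_f1` and `select_levels` are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Micro-averaged F1 over boundary positions, pooled across all words."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)
        fp += len(pred - ref)
        fn += len(ref - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_levels(candidates, gold):
    """Pick the subset of grammar levels whose boundaries best match the gold set.

    candidates: one dict per word, mapping a (hypothetical) level name to the
                set of boundary positions that level proposes.
    gold:       one set of gold boundary positions per word.
    """
    levels = sorted({level for word in candidates for level in word})
    best_levels, best_score = (), -1.0
    for size in range(1, len(levels) + 1):
        for subset in combinations(levels, size):
            # A word's predicted boundaries are the union over the chosen levels.
            predicted = [
                set().union(*(word.get(level, set()) for level in subset))
                for word in candidates
            ]
            score = boundary_f1(predicted, gold)
            if score > best_score:
                best_levels, best_score = subset, score
    return best_levels, best_score


# Toy labelled set: "walked" -> walk|ed, "unhappiness" -> un|happi|ness.
candidates = [
    {"Morph": {4}, "SubMorph": {2, 4}},
    {"Morph": {2, 7}, "SubMorph": {2, 5, 7}},
]
gold = [{4}, {2, 7}]
chosen, score = select_levels(candidates, gold)
```

Swapping `boundary_f1` for another scoring function is one way to read the abstract's remark that the method can be tuned to different evaluation metrics or downstream tasks.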


1969 ◽  
Vol 8 (02) ◽  
pp. 84-90 ◽  
Author(s):  
A. W. Pratt ◽  
M. Pacak

The system for the identification and subsequent transformation of terminal morphemes in medical English is a part of the information system for processing pathology data which was developed at the National Institutes of Health. The recognition and transformation of terminal morphemes is restricted to classes of adjectivals including the -ING and -ED forms, nominals and homographic adjective/noun forms. The adjective-to-noun and noun-to-noun transforms consist basically of a set of substitutions of adjectival and certain nominal suffixes by a set of suffixes which indicate the corresponding nominal form(s). The adjectival/nominal suffix has a polymorphosyntactic transformational function if it has the property of being transformed into more than one nominalizing suffix (e.g., the adjectival suffix -IC can be substituted by a set of nominalizing suffixes -Ø, -A, -E, -Y, -IS, -IA, -ICS): the adjectival suffix has a monomorphosyntactic transformational property if there is only one admissible transform (e.g., -CIC → -X). The morphological segmentation and the subsequent transformations are based on the following principles:

a. The word form is segmented according to the principle of »double consonant cut,« i.e., terminal characters following the last set of double consonants are analyzed and treated as a potential suffix. For practical purposes only such terminal suffixes of a maximum length of four have been analyzed.
b. The principle that the largest segment of a word form common to both adjective and noun or to both noun stems is retained as a word base for transformational operations, and the non-identical segment is considered to be a »suffix.«

The backward right-to-left character search is initiated by the identification of the terminal grapheme of the given word form and is extended to certain admissible sequences of immediately preceding graphemes. The nodes which represent fixed sequences of graphemes are labeled according to their recognition and/or transformation properties. The tree nodes are divided into two groups:

a. productive or activated
b. non-productive or non-activated

The productive (activated) nodes are sequences of sets of graphemes which possess certain properties, such as the indication about part-of-speech class membership, the transformation properties, or both. The non-productive (non-activated) nodes have the function of connectors, i.e., they specify the admissible path to the productive nodes. The computer program for the identification and transformation of the terminal morphemes is open-ended and is already operational. It will be extended to other sub-fields of medicine in the near future.
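The »double consonant cut« principle can be sketched in a few lines, under assumptions the abstract does not settle. Here "double consonants" is read as any two adjacent consonant letters, and `y` is treated as a vowel. Both choices, and the function name `double_consonant_cut`, are hypothetical and may differ from the original system.

```python
VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel


def double_consonant_cut(word, max_suffix_len=4):
    """Split a word after its last pair of adjacent consonants.

    Returns (base, suffix) when the characters following the last consonant
    pair form a non-empty suffix of at most max_suffix_len letters,
    otherwise None.
    """
    w = word.lower()
    # Scan right to left for the last position where two consonants meet.
    for i in range(len(w) - 1, 0, -1):
        pair_is_consonants = (
            w[i].isalpha() and w[i - 1].isalpha()
            and w[i] not in VOWELS and w[i - 1] not in VOWELS
        )
        if pair_is_consonants:
            suffix = w[i + 1:]
            if 0 < len(suffix) <= max_suffix_len:
                return w[:i + 1], suffix
            return None  # suffix is empty or longer than the stated maximum
    return None  # no adjacent consonant pair in the word


print(double_consonant_cut("gastric"))   # ('gastr', 'ic')
print(double_consonant_cut("infected"))  # ('infect', 'ed')
print(double_consonant_cut("hepatic"))   # None: no adjacent consonants
```

Under this reading, a word such as "hepatic" yields no candidate suffix, which may be one reason the abstract pairs the cut with a second principle based on the largest shared word base.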


2021 ◽  
Vol 9 (3) ◽  
pp. 311
Author(s):  
Ben R. Evans ◽  
Iris Möller ◽  
Tom Spencer

Salt marshes are important coastal environments and provide multiple benefits to society. They are considered to be declining in extent globally, including on the UK east coast. The dynamics and characteristics of interior parts of salt marsh systems are spatially variable and can fundamentally affect biotic distributions and the way in which the landscape delivers ecosystem services. It is therefore important to understand, and be able to predict, how these landscape configurations may evolve over time and where the greatest dynamism will occur. This study estimates morphodynamic changes in salt marsh areas for a regional domain over a multi-decadal timescale. We demonstrate at a landscape scale that relationships exist between the topology and morphology of a salt marsh and changes in its condition over time. We present an inherently scalable satellite-derived measure of change in marsh platform integrity that allows the monitoring of changes in marsh condition. We then demonstrate that easily derived geospatial and morphometric parameters can be used to determine the probability of marsh degradation. We draw comparisons with previous work conducted on the east coast of the USA, finding differences in marsh responses according to their position within the wider coastal system between the two regions, but relatively consistent in relation to the within-marsh situation. We describe the sub-pixel-scale marsh morphometry using a morphological segmentation algorithm applied to 25 cm-resolution maps of vegetated marsh surface. We also find strong relationships between morphometric indices and change in marsh platform integrity which allow for the inference of past dynamism but also suggest that current morphology may be predictive of future change. We thus provide insight into the factors governing marsh degradation that will assist the anticipation of adverse changes to the attributes and functions of these critical coastal environments and inform ongoing ecogeomorphic modelling developments.


2007 ◽  
Vol 25 (1) ◽  
pp. 34-39 ◽  
Author(s):  
Marco A.G. de Carvalho ◽  
Roberto de A. Lotufo ◽  
Michel Couprie

2008 ◽  
Vol 12 (1) ◽  
pp. 57-71
Author(s):  
George Hewitt

Protases ('if'-clauses) in the North West Caucasian language Abkhaz are mostly marked by either /-r/ or /-zα.r/, depending on the tense and/or type of verb (Stative or Dynamic) concerned. The article presents examples of this conditional usage and the role of protasis-type forms in both temporal and interrogative expressions as well as in complementiser-function. The complementisers in question share the semantic feature of irrealis with conditionals. A rhotic element is also found in the non-finite form of the Future I tense, in the Masdar (verbal noun), and in such converbs as the Purposives, the Resultative and the Future Absolute. The article attempts to link the semantic notions of futurity, potentiality, indefiniteness or general irrealis to the rhotic element and asks what might have been the historical development resulting in the forms attested today and thus their original morphological segmentation.

