Inflectional defaults and principal parts: An empirical investigation

Author(s):  
Dunstan Brown ◽  
Roger Evans

We describe an empirical method for exploring and contrasting the roles of default and principal part information in the differentiation of inflectional classes. We use an unsupervised machine learning method to classify Russian nouns into inflectional classes, first with full paradigm information and then with particular types of information removed. When we remove default information, which is shared across classes, we expect little effect on the classification. In contrast, when we remove principal part information, we expect a more detrimental effect on classification performance. Our data set consists of paradigm listings of the 80 most frequent Russian nouns, generated from a formal theory which allows us to distinguish default and principal part information. Our results show that removal of forms classified as principal parts has a more detrimental effect on the classification than removal of default information. However, we also find differences within the defaults and within the principal parts, and we suggest that these may in part be attributable to stress patterns.
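The experimental logic can be pictured with a minimal sketch: cluster nouns on their paradigm cells, repeat with selected cells withheld, and compare each clustering against the gold inflectional classes. The toy endings, class labels, and the choice of which cells to treat as shared defaults or as diagnostic are invented for illustration and are not the authors' data or code.

```python
# A minimal sketch (not the authors' data or code) of clustering nouns on paradigm
# cells, with and without selected cells, and scoring against gold classes.
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import adjusted_rand_score

paradigms = {                                   # endings per paradigm cell (toy data)
    "zakon":  {"sg.nom": "0", "sg.gen": "a", "sg.dat": "u", "pl.dat": "am"},  # class I
    "stol":   {"sg.nom": "0", "sg.gen": "a", "sg.dat": "u", "pl.dat": "am"},  # class I
    "karta":  {"sg.nom": "a", "sg.gen": "y", "sg.dat": "e", "pl.dat": "am"},  # class II
    "gazeta": {"sg.nom": "a", "sg.gen": "y", "sg.dat": "e", "pl.dat": "am"},  # class II
    "kost'":  {"sg.nom": "0", "sg.gen": "i", "sg.dat": "i", "pl.dat": "am"},  # class III
    "noch'":  {"sg.nom": "0", "sg.gen": "i", "sg.dat": "i", "pl.dat": "am"},  # class III
}
gold = [0, 0, 1, 1, 2, 2]                       # gold inflectional classes

def clustering_score(withheld=()):
    """Cluster nouns on their remaining cells and score against the gold classes."""
    rows = [{cell: ending for cell, ending in cells.items() if cell not in withheld}
            for cells in paradigms.values()]
    X = DictVectorizer(sparse=False).fit_transform(rows)
    labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
    return adjusted_rand_score(gold, labels)

print("full paradigms:                    ", clustering_score())
print("shared (default-like) cell removed:", clustering_score(withheld=("pl.dat",)))
print("diagnostic (principal-part-like) cells removed:",
      clustering_score(withheld=("sg.gen", "sg.dat")))
```

In this toy setting, withholding the cell that is identical across classes leaves the clustering intact, while withholding the class-differentiating cells degrades it, which is the pattern the paper tests empirically.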

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Matteo Pellegrini

This paper provides a fully word-based, abstractive analysis of predictability in Latin verb paradigms. After reviewing previous traditional and theoretically grounded accounts of Latin verb inflection, a procedure is outlined in which the uncertainty in guessing the content of paradigm cells given knowledge of one or more inflected wordforms is measured by means of the information-theoretic notions of unary and n-ary implicative entropy, respectively, in a quantitative approach that uses the type frequency of alternation patterns between wordforms as an estimate of their probability of application. Entropy computations are performed using the Qumin toolkit on data taken from the inflected lexicon LatInfLexi. Unary entropy values are used to map the verbal paradigm into zones of full interpredictability, composed of cells that can be inferred from one another with no uncertainty. N-ary entropy values are used to extract categorical and near principal-part sets, which allow the rest of the paradigm to be filled in with little or no uncertainty. Lastly, the impact of information about the derivational relatedness of lexemes on uncertainty in inflectional predictions is addressed, showing that adding a classification of verbs into derivational families yields a considerable reduction in entropy, not only for derived verbs but also for simple ones.
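As a rough illustration of the quantitative core of this approach, the sketch below estimates a unary implicative entropy H(cell B | cell A) from type frequencies of alternation patterns. It is an assumption-laden simplification, not the Qumin implementation: the Latin pairs are a tiny hypothetical sample, and grouping cell-A forms by their final characters stands in for Qumin's pattern-based classes.

```python
# A minimal sketch (a simplification, not the Qumin implementation) of unary implicative
# entropy H(cell B | cell A): lexemes are grouped by a crude abstraction of their cell-A
# form (its final characters), and within each group the entropy of the A -> B
# alternation pattern is computed from type frequencies.
import math
from collections import Counter, defaultdict

# Hypothetical (infinitive, 1sg present indicative) pairs.
lexemes = [
    ("amare", "amo"), ("laudare", "laudo"), ("portare", "porto"),  # 1st conjugation
    ("monere", "moneo"), ("habere", "habeo"),                      # 2nd conjugation
    ("legere", "lego"),                                            # 3rd conjugation
    ("audire", "audio"), ("venire", "venio"),                      # 4th conjugation
]

def pattern(a, b):
    """Describe the A -> B alternation as (material removed, material added)."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return (a[i:], b[i:])

def entropy(counts):
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Group lexemes by the last three characters of the cell-A form (a stand-in for the
# finer-grained classes Qumin builds from the applicable alternation patterns).
groups = defaultdict(Counter)
for a, b in lexemes:
    groups[a[-3:]][pattern(a, b)] += 1

# H(B | A): type-frequency-weighted average of the within-group pattern entropies.
n = len(lexemes)
h = sum(sum(c.values()) / n * entropy(c) for c in groups.values())
print(f"H(1sg.prs.ind | inf) = {h:.3f} bits")
```

Here the only residual uncertainty comes from the -ere infinitives, which mix second- and third-conjugation patterns; the weighted average of the within-group entropies is the unary implicative entropy of the target cell given the predictor cell.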


2016 ◽  
Vol 5 (4) ◽  
pp. 1
Author(s):  
Bander Al-Zahrani

The paper describes estimation of the reliability function of the weighted Weibull distribution. The maximum likelihood estimators of the unknown parameters are obtained. Nonparametric methods, namely the empirical method, the kernel density estimator, and a modified shrinkage estimator, are also provided. The Markov chain Monte Carlo method is used to compute the Bayes estimators under gamma and Jeffreys priors. The performance of the maximum likelihood, nonparametric, and Bayesian estimators is assessed on a real data set.
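As a hedged illustration of the parametric part of this workflow, the sketch below uses the standard two-parameter Weibull distribution as a stand-in (the weighted Weibull density itself is not reproduced here): fit the shape and scale by maximum likelihood and plug the estimates into the reliability function R(t) = exp(-(t/scale)^shape), alongside the empirical estimate. The data are simulated.

```python
# A minimal sketch using the two-parameter Weibull as a stand-in for the weighted
# Weibull treated in the paper: maximum likelihood fit, then plug-in reliability.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
data = weibull_min.rvs(c=1.8, scale=100.0, size=200, random_state=rng)  # simulated lifetimes

# Maximum likelihood estimates of shape and scale (location fixed at 0).
shape_hat, _, scale_hat = weibull_min.fit(data, floc=0)

def reliability(t):
    """Plug-in estimate of R(t) = P(T > t) for the fitted Weibull."""
    return np.exp(-(t / scale_hat) ** shape_hat)

# Compare the parametric MLE with the empirical (nonparametric) estimate at t = 80.
t = 80.0
print("MLE reliability R(80):      ", reliability(t))
print("Empirical reliability R(80):", np.mean(data > t))
```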


Author(s):  
Timothy C. Allison ◽  
J. Jeffrey Moore

The effectiveness of fatigue and life prediction methods depends heavily on accurate knowledge of the static and dynamic stresses acting on a structure. Although stress fields may be calculated from the finite element shape functions if a finite element model is constructed and analyzed, in many cases the cost of constructing and analyzing such a model is prohibitive, and modeling errors can severely affect the accuracy of the stress simulations. This paper presents an empirical method for predicting the transient dynamic stress response of a structure from measured load and strain data that can be collected during vibration tests. The method applies the proper orthogonal decomposition to a measured data set to filter noise and reduce the size of the identification problem, and then employs a matrix deconvolution technique to decouple and identify the reduced-coordinate impulse response functions of the structure. The method is applied to simulation data from an axial compressor blade model and produces stress predictions that agree closely with finite element results.
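The decomposition step can be pictured with a short sketch, under assumptions of the editor's own rather than the authors' implementation: measured strain histories are stacked into a snapshot matrix, the SVD supplies the proper orthogonal modes, and truncation to the dominant modes both filters noise and shrinks the identification problem handed to the deconvolution stage. The data are synthetic.

```python
# A minimal sketch of the proper orthogonal decomposition step on synthetic strain data.
import numpy as np

rng = np.random.default_rng(1)
n_sensors, n_steps = 12, 2000

# Synthetic "measured" strains: two coherent structural responses plus noise.
t = np.linspace(0.0, 2.0, n_steps)
modes_true = rng.standard_normal((n_sensors, 2))
q_true = np.vstack([np.sin(2 * np.pi * 35 * t), np.sin(2 * np.pi * 90 * t)])
strains = modes_true @ q_true + 0.05 * rng.standard_normal((n_sensors, n_steps))

# POD via SVD of the snapshot matrix.
U, s, Vt = np.linalg.svd(strains, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.99)) + 1      # keep 99% of the signal energy
Phi = U[:, :r]                                  # POD basis (spatial modes)
q = Phi.T @ strains                             # reduced (modal) coordinates

print(f"retained {r} of {n_sensors} modes")
# The impulse response identification / matrix deconvolution would then operate on
# the r reduced coordinates q rather than on all sensor channels.
```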


MATEMATIKA ◽  
2020 ◽  
Vol 36 (1) ◽  
pp. 43-49
Author(s):  
T Dwi Ary Widhianingsih ◽  
Heri Kuswanto ◽  
Dedy Dwi Prastyo

Logistic regression is one of the most commonly used classification methods. It has some advantages, particularly with respect to hypothesis testing and its objective function. However, it also has disadvantages in the case of high-dimensional data, such as multicollinearity, over-fitting, and a high computational burden. Ensemble-based classification methods have been proposed to overcome these problems. The logistic regression ensemble (LORENS) method is expected to improve the classification performance of basic logistic regression. In this paper, we apply it to the case of drug discovery, with the objective of finding candidate compounds that protect normal, non-cancerous cells, which is a problem with a high-dimensional data set. The experimental results show that it performs well, with an accuracy of 69% and an AUC of 0.7306.
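A minimal sketch of the random-feature-partition idea behind logistic regression ensembles such as LORENS appears below: the high-dimensional feature set is split into disjoint random subsets, one logistic regression is fitted per subset, and the predicted probabilities are averaged. The partition size, absence of repeated partitions, and synthetic data are assumptions for illustration, not the paper's setup.

```python
# A minimal sketch of an ensemble of logistic regressions over random feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=500, n_informative=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_subsets = 10
feature_blocks = np.array_split(rng.permutation(X.shape[1]), n_subsets)

# Fit one logistic regression per disjoint feature block and average the probabilities.
probs = np.zeros(len(y_te))
for block in feature_blocks:
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:, block], y_tr)
    probs += clf.predict_proba(X_te[:, block])[:, 1]
probs /= n_subsets

print("accuracy:", accuracy_score(y_te, probs > 0.5))
print("AUC:     ", roc_auc_score(y_te, probs))
```

Because each base learner sees only a small block of features, the per-model design matrices are well conditioned even when the full feature set is highly collinear, which is the motivation for this style of ensemble in high-dimensional settings.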


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers, both in terms of classification performance and in the number of incorrectly segmented entities.
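The merging idea can be illustrated with a small, hypothetical sketch: after a naive first-pass split, adjacent tokens are re-joined whenever the boundary is a "glue" character (hyphen, bracket, comma) with no intervening space, so that chemical names survive as single, more discriminative tokens. The glue set and example sentence are invented; ChemTok's actual rules are extracted from the training data and are considerably more elaborate.

```python
# A hypothetical sketch of merging previously split tokens back into longer tokens.
import re

def initial_split(text):
    """Naive first-pass tokenization: runs of word characters or single punctuation marks."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|\S", text)]

GLUE = set("-()[],'")   # boundary characters a merge rule may bridge (invented rule)

def merge_tokens(spans):
    """Re-join adjacent tokens when the boundary is a glue character with no space."""
    merged = [list(spans[0])]
    for tok, start, end in spans[1:]:
        prev = merged[-1]
        touching = start == prev[2]
        if touching and (tok in GLUE or prev[0][-1] in GLUE):
            prev[0] += tok
            prev[2] = end
        else:
            merged.append([tok, start, end])
    return [tok for tok, _, _ in merged]

text = "Treatment with 2-(4-chlorophenyl)ethanol reduced activity."
print([tok for tok, _, _ in initial_split(text)])   # chemical name split into pieces
print(merge_tokens(initial_split(text)))            # chemical name restored as one token
```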


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.

Design/methodology/approach: A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.

Findings: The proposed approach has two main advantages over previous methods. The first lies in the feature transform using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second rests on partitioning the samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.

Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.

Practical implications: Measuring and eliminating the feature space heterogeneity possibly existing in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems.

Originality/value: A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with feature space heterogeneity and improve classification accuracy.
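A minimal sketch of the pipeline described under Design/methodology/approach follows. It is a hedged interpretation, not the authors' implementation: orthogonal factor analysis transforms the features, clustering on the factor scores partitions the samples, and a separate classifier is fitted within each partition. The data, component counts, and base classifier are assumptions.

```python
# A minimal sketch: factor analysis -> clustering on factor scores -> per-cluster classifiers.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Feature transform via orthogonal factor analysis.
fa = FactorAnalysis(n_components=5, random_state=0).fit(X_tr)
F_tr, F_te = fa.transform(X_tr), fa.transform(X_te)

# 2) Partition the samples by clustering the factor scores.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(F_tr)
c_tr, c_te = km.labels_, km.predict(F_te)

# 3) Fit one classifier per cluster and predict within the assigned cluster.
models = {c: LogisticRegression(max_iter=1000).fit(F_tr[c_tr == c], y_tr[c_tr == c])
          for c in np.unique(c_tr)}
y_pred = np.array([models[c].predict(f.reshape(1, -1))[0] for f, c in zip(F_te, c_te)])
print("accuracy:", accuracy_score(y_te, y_pred))
```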


2019 ◽  
Vol 30 (3) ◽  
pp. 18-37
Author(s):  
Tawei Wang ◽  
Yen-Yao Wang ◽  
Ju-Chun Yen

This article investigates the transfer of information security breach information between breached firms and their peers. Using a large data set of information security incidents from 2003 to 2013, the results suggest that 1) the effect of information security breach information transfer exists between breached firms and non-breached firms that offer similar products and 2) the effect of information transfer is weaker when the information security breach is due to internal faults or is related to the loss of personally identifiable information. Additional tests demonstrate that the effect of information transfer exhibits consistent patterns across time and with different types of information security breaches. Finally, the effect does not depend on whether the firms are IT intensive. Implications, limitations, and future research are discussed.


2020 ◽  
Vol 12 (6) ◽  
pp. 1015 ◽  
Author(s):  
Kan Zeng ◽  
Yixiao Wang

Classification algorithms for automatically detecting sea surface oil spills from spaceborne Synthetic Aperture Radars (SARs) can usually be regarded as part of a three-step processing framework, which briefly comprises image segmentation, feature extraction, and target classification. A Deep Convolutional Neural Network (DCNN), named the Oil Spill Convolutional Network (OSCNet), is proposed in this paper for SAR oil spill detection; it performs the latter two steps of this framework. Based on VGG-16, OSCNet is obtained by designing the architecture and adjusting the hyperparameters with a data set of SAR dark patches. With the help of this large data set, which contains more than 20,000 SAR dark patches, and of data augmentation, OSCNet can have as many as 12 weight layers, making it a relatively deep Deep Learning (DL) network for SAR oil spill detection. Experiments on the same data set show that OSCNet significantly improves classification performance over traditional machine learning (ML): accuracy, recall, and precision increase from 92.50%, 81.40%, and 80.95% to 94.01%, 83.51%, and 85.70%, respectively. An important reason for this improvement is that the features OSCNet learns from the data set are considerably more discriminative than the hand-crafted features required by traditional ML algorithms. In addition, the experiments show that data augmentation plays an important role in avoiding over-fitting and hence improves classification performance. OSCNet is also compared with other DL classifiers for SAR oil spill detection; because of the large differences between the underlying data sets, this comparison is limited to similarities and differences at the level of principle.
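For orientation, the sketch below shows the general shape of such an approach in PyTorch: a VGG-16 backbone with its classification head adapted to the two-class dark-patch problem (oil spill vs. look-alike), plus the kind of augmentation used to curb over-fitting. It is not OSCNet's actual 12-weight-layer architecture or hyperparameter choices; the input size, dummy batch, and transform list are assumptions for illustration.

```python
# A minimal sketch of a VGG-based two-class dark-patch classifier (not OSCNet itself).
import torch
import torch.nn as nn
from torchvision import models, transforms

# VGG-16 backbone with a two-class head.
net = models.vgg16()
net.classifier[6] = nn.Linear(net.classifier[6].in_features, 2)

# Augmentation of the kind commonly used against over-fitting on SAR patches
# (it would be applied when building the training Dataset; the dummy batch below skips it).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

# One illustrative training step on a dummy batch of dark-patch images,
# replicated to three channels to match the VGG input convention.
x = torch.randn(8, 3, 224, 224)          # stand-in for a batch of dark patches
y = torch.randint(0, 2, (8,))            # 1 = oil spill, 0 = look-alike
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

optimizer.zero_grad()
loss = criterion(net(x), y)
loss.backward()
optimizer.step()
print("loss:", float(loss))
```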


2019 ◽  
Vol 23 (2) ◽  
pp. 200-213 ◽  
Author(s):  
Petra A. Nylund ◽  
Nuria Arimany-Serrat ◽  
Xavier Ferras-Hernandez ◽  
Eric Viardot ◽  
Henry Boateng ◽  
...  

Purpose: Successful innovation requires a significant financial commitment. Therefore, the purpose of this paper is to investigate the relation between internal and external financing and the degree of innovation in European firms.

Design/methodology/approach: An empirical investigation is carried out using a longitudinal data set covering 146 large, quoted European firms over ten years, resulting in 1,460 firm-years.

Findings: The authors find that only firms in the energy sector are more innovative when they are profitable. For the sectors of basic materials, manufacture and construction, services, financial and property services, and technology and telecommunications, profitability is negatively related to innovation. External financing in the form of debt reduces the focus on innovation in profitable firms.

Research limitations/implications: The authors analyze the findings through the lens of evolutionary economics. The model is not valid for firms in the consumer-goods sector, which indicates a need to adapt the model to each sector. The authors conclude that the impact of profitability on innovation varies across sectors, with debt financing as a moderating factor.

Originality/value: To the best of the authors' knowledge, this is the first study to analyze internal and external financing and the degree of innovation in European firms on a longitudinal basis.


Author(s):  
Sharon E. Nicholson

Environmental constraints have large impacts on populations, especially in semi-arid regions such as those found in Africa. Climate and weather have long affected African societies, but unfortunately the traditional climatic record for the continent is relatively short. For that reason, historical information has often been used to reconstruct the climate of the past. Sources of historical information include reports and diaries of explorers, settlers, and missionaries; government records; reports of scientific expeditions; and historical geographical and meteorological journals. Local oral tradition is also useful; it is reported in the form of historical chronicles compiled centuries later. References to famine and drought, economic conditions, floods, agriculture, weather events, and the seasonal cycle are examples of useful types of information. Some of the records also include meteorological measurements. More recently, chemical and biological information, generally derived from lake cores, has been applied to historical climate reconstruction. Early works mostly provided qualitative, discontinuous information, such as drought chronologies. However, a statistical method of climate reconstruction applied to a vast collection of historical information and meteorological data allowed for the creation of a two-century, semi-quantitative "precipitation" data set. It consists of annual indices related to rainfall since 1800 for ninety regions of the African continent. This data set has served to illustrate several 19th-century periods of anomalous rainfall conditions that affected nearly the entire continent; an example is the widespread aridity during several decades early in that century.

