Comparison of Three Statistical Classification Techniques for Maser Identification

Author(s): Ellen M. Manning, Barbara R. Holland, Simon P. Ellingsen, Shari L. Breen, Xi Chen, ...

Abstract: We applied three statistical classification techniques, namely linear discriminant analysis (LDA), logistic regression and random forests, to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods, logistic regression and LDA, performed best, whereas for the largest dataset the non-parametric random forests achieved accuracy comparable to, but not significantly better than, the parametric techniques. This suggests that, at least for the specific examples investigated here, the accuracy of the predictions is not limited by the use of parametric models. We also found that for LDA, transforming the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
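
The abstract reports that transforming the data toward a normal distribution improved LDA, but it does not say which transformation was used. A minimal Python sketch of one common choice, assumed here for illustration only, is the rank-based inverse normal transform: each value is replaced by the standard-normal quantile of its tie-averaged rank. The function names and the Blom offset c = 3/8 are hypothetical choices, not the paper's method.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3 / 8):
    """Map values to approximately standard-normal scores (Blom offset c)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (r - c) / (n - 2c + 1) keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]
```

For example, `rank_inverse_normal([0.2, 15.0, 3.1, 3.1, 800.0])` turns a strongly right-skewed set of (hypothetical) flux densities into symmetric scores while preserving their order; the transformed columns could then be passed to LDA in place of the raw values.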

2018, Vol 15 (1), pp. 98-107
Author(s): R Lestawati, Rais Rais, I T Utami

Classification is a statistical method for grouping data systematically. An object can be classified by two approaches: parametric and non-parametric methods. In this study the non-parametric CART method is compared with logistic regression, a parametric method. According to the classification accuracy tables, CART classified the status of DHF patients into severe and non-severe categories with 76.3% accuracy, whereas logistic regression achieved 76.7%. CART yielded four significant variables (hepatomegaly, epistaxis, melena and diarrhea) and divided the classification into several more accurate segments, whereas logistic regression produced only one significant variable, hepatomegaly.
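
CART grows a tree by repeatedly choosing the split that most reduces node impurity. The Python sketch below shows that single step for binary symptom indicators, using Gini impurity, the usual CART criterion for classification, together with the simple accuracy measure used to compare the two methods. The feature names and toy records are hypothetical and are not the study's data.

```python
def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def best_binary_split(rows, labels, features):
    """Return (feature, impurity_decrease) for the best split on a 0/1 feature."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for f in features:
        left = [y for r, y in zip(rows, labels) if r[f] == 1]
        right = [y for r, y in zip(rows, labels) if r[f] == 0]
        # Size-weighted child impurity; a larger decrease means purer children.
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        decrease = parent - child
        if decrease > best[1]:
            best = (f, decrease)
    return best


def accuracy(predicted, actual):
    """Proportion of cases whose predicted class matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

A full tree would apply `best_binary_split` recursively to each child node until a stopping rule is met and then prune; that recursion and pruning are omitted here.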


2016, Vol 13 (3), pp. 35-46
Author(s): A. Blanco-Oliver, A. Irimia-Dieguez, M.D. Oliver-Alfonso, M.J. Vázquez-Cueto

Following calls in the bankruptcy literature, this paper develops a parsimonious hybrid bankruptcy model by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power for detecting bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and a Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that non-financial and macroeconomic variables complement financial ratios for bankruptcy prediction.


1989, Vol 48 (2), pp. 331-339
Author(s): D. A. Elston, C. A. Glasbey, D. R. Neilson

Abstract: Lactation curves are fitted to data as a preliminary to estimating summary statistics. Two widely quoted curves are $y = a t^{b} e^{-ct}$ (Wood, 1967) and $y = a(1 - e^{-bt}) - ct$ (Cobby and Le Du, 1978), each of which has three parameters. Restriction to either of these curves imposes limitations on the fit to the data and can result in biased estimation of summary statistics. Alternatively, lactation curves can be generated by the use of a non-parametric method which requires only weak assumptions about the signs of derivatives of the curves. Because the non-parametric curves are more flexible, estimates of summary statistics are less likely to be biased than those based on parametric models. Use of the non-parametric curves is particularly advantageous around the time of peak yield, where the curves of Wood and Cobby and Le Du are known to fit data poorly.
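
To make the two three-parameter forms concrete, the Python sketch below evaluates each curve and its time of peak yield, the region where the abstract says both fit poorly. Setting the derivative to zero gives, for Wood's curve, $dy/dt = y(b/t - c) = 0$, so $t^{*} = b/c$; for the Cobby and Le Du curve, $ab e^{-bt} - c = 0$, so $t^{*} = \ln(ab/c)/b$, which exists only when $ab > c$. Function names and the parameter values used in the checks are illustrative assumptions, not fitted values from the paper.

```python
import math


def wood(t, a, b, c):
    """Wood (1967) lactation curve: y = a * t**b * exp(-c * t)."""
    return a * t ** b * math.exp(-c * t)


def wood_peak_time(b, c):
    """dy/dt = y * (b/t - c) = 0  gives  t* = b / c."""
    return b / c


def cobby_le_du(t, a, b, c):
    """Cobby and Le Du (1978) curve: y = a * (1 - exp(-b * t)) - c * t."""
    return a * (1 - math.exp(-b * t)) - c * t


def cobby_le_du_peak_time(a, b, c):
    """dy/dt = a*b*exp(-b*t) - c = 0  gives  t* = ln(a*b/c) / b."""
    if a * b <= c:
        raise ValueError("curve has no interior peak when a*b <= c")
    return math.log(a * b / c) / b
```

Because each curve has only three parameters, the peak time and the curvature around it are tied to the overall shape, which is one way to see why a parametric fit can bias peak-related summary statistics.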


Author(s): Hans H. Diebner, Nina Timmesfeld

Based on comprehensible non-parametric methods, estimates of crucial parameters that characterise the COVID-19 pandemic are presented, with a focus on the German epidemic. Where appropriate, the estimates for Germany are compared with the results for six other countries (FR, IT, US, UK, ES, CH) to gauge the breadth of applicability and to provide a relative understanding. Only the daily reported counts of newly diagnosed cases and fatalities provided by the ECDC are used; where appropriate, the results are compared with conclusions drawn from the dataset provided by the RKI. Drawing on uncertain a priori knowledge is avoided. Specifically, we estimate the time from diagnosis to death, with 13 days for Germany and about 2 days for Italy as the extremes. Based on this time lag between diagnoses and deaths, we calculate appropriately delayed asymptotic and instantaneous fatality-case ratios, which are superior to the commonly published case-fatality rate. With a delay of 13 days between cases and deaths, the median of the time series of the instantaneous fatality-case ratio for Germany is 0.024. Asymptotic values are presented for other countries, with France ranking highest at a fatality-case ratio of almost 0.2 at its peak. The basic reproduction number, R_0, for Germany is estimated to be between 2.4 and 3.4; the uncertainty stems from uncertain knowledge of the generation time. A delay autocorrelation shows resonances at about 4 days and 7 days, where the latter is at least partially attributable to the weekly periodicity of the reporting process. The calculation of the basic reproduction number is based on an evaluation of cumulative case numbers, yielding time-dependent doubling times as an intermediate step. This allows the reproduction number to be inferred for the early phase of the epidemic.
In a second approach, the instantaneous basic reproduction number is derived from incident cases (counts of new cases) and, in contrast to the first approach, allows the temporal behaviour of the reproduction number to be inferred for the later course of the epidemic. In conclusion, by avoiding complicated parametric models, we provide insights into basic features of the COVID-19 epidemic in a highly transparent and comprehensible way.
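
The delayed fatality-case ratio and the doubling-time step described above can be sketched in a few lines of Python. This is a simplified daily version under stated assumptions: inputs are plain lists of daily or cumulative counts, the default lag is the 13 days reported for Germany, and the conversion of growth rate to a reproduction number uses $R = e^{r T_g}$ with $r = \ln 2 / T_d$, which holds only for a fixed generation time $T_g$. That conversion and all function names are hypothetical choices, not necessarily the authors' exact procedure.

```python
import math


def delayed_fatality_case_ratio(new_cases, new_deaths, delay=13):
    """Deaths on day t divided by diagnoses on day t - delay (None if no cases)."""
    ratios = []
    for t in range(delay, len(new_deaths)):
        cases = new_cases[t - delay]
        ratios.append(new_deaths[t] / cases if cases > 0 else None)
    return ratios


def doubling_time(cumulative, t, window=1):
    """Doubling time in days from positive cumulative counts on days t and t + window."""
    growth = cumulative[t + window] / cumulative[t]
    if growth <= 1:
        raise ValueError("cumulative counts must grow over the window")
    return window * math.log(2) / math.log(growth)


def reproduction_number(doubling, generation_time):
    """R = exp(r * Tg) with growth rate r = ln 2 / Td; assumes a fixed generation time."""
    r = math.log(2) / doubling
    return math.exp(r * generation_time)
```

The median of the non-`None` ratios can then be taken with `statistics.median`. Because `reproduction_number` depends directly on `generation_time`, varying that input over a plausible range reproduces the kind of interval, rather than a single value, that the abstract reports for R_0.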


2018, Vol 57 (3), pp. 525-534
Author(s): Bryson C. Bates, Andrew J. Dowdy, Richard E. Chandler

Abstract: Lightning is a natural hazard that can lead to the ignition of wildfires, disruption of and damage to power and telecommunication infrastructure, injuries and fatalities among humans and livestock, and disruption to airport activities. This paper examines the ability of six statistical and machine-learning classification techniques to distinguish between nonlightning and lightning days at the coarse spatial and temporal scales of current general circulation models and reanalyses. The classification techniques considered were 1) a combination of principal component analysis and logistic regression, 2) classification and regression trees, 3) random forests, 4) linear discriminant analysis, 5) quadratic discriminant analysis, and 6) logistic regression. Lightning-flash counts at six locations across Australia for 2004–13 were used, together with atmospheric variables from the ERA-Interim dataset. Tenfold cross validation was used to evaluate classification performance. Logistic regression was found to be superior to the other classifiers considered, and its prediction skill is much better than that obtained from climatological values. The sets of atmospheric variables included in the final logistic-regression models were primarily composed of spatial mean measures of instability and lifting potential, along with atmospheric water content. The memberships of these sets varied among climatic zones.
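
Tenfold cross validation, as used in the study, splits the data into ten folds, holds each fold out once while a classifier is fitted on the other nine, and averages the held-out accuracy. The Python sketch below pairs that loop with plain batch gradient-descent logistic regression. The learning rate, epoch count, assignment of folds by index modulo k, and the toy data in the checks are assumptions for illustration; this is not the authors' code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (bias, weights).

    Assumes features are roughly standardised so a fixed learning rate is stable.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict(model, x):
    """Class 1 if the fitted probability is at least 0.5, else class 0."""
    b, w = model
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


def k_fold_accuracy(X, y, k=10):
    """Mean held-out accuracy over k folds (fold = index mod k; needs len(X) >= k)."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k
```

Assigning folds by index is only reasonable when rows are already in random order; with daily data such as lightning days, shuffled or blocked folds would usually be preferred, and the same loop can score any of the other five classifiers by swapping out `fit_logistic` and `predict`.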


Author(s): Dong Gi Seo, Younyoung Choi, Sun Huh

Purpose: The dimensionality of an examination provides empirical evidence of the internal test structure underlying the responses to a set of items. In turn, the internal structure is an important piece of evidence for the validity of an examination. Thus, the aim of this study was to investigate the performance of the DETECT program and to use it to examine the internal structure of the Korean nursing licensing examination. Methods: Non-parametric methods of dimensionality testing, such as the DETECT program, have been proposed as ways of overcoming the limitations of traditional parametric methods. The DETECT program was investigated using simulated data under several conditions and was then applied to the Korean nursing licensing examination. Results: The DETECT program performed well in determining the number of underlying dimensions under several different conditions in the simulated data. Further, the DETECT program correctly revealed the internal structure of the Korean nursing licensing examination, meaning that it detected the proper number of dimensions and appropriately clustered the items within each dimension. Conclusion: The DETECT program performed well in detecting the number of dimensions and in assigning items to each dimension. This result implies that the DETECT method can be useful for examining the internal structure of assessments, such as licensing examinations, that cover many domains and content areas.


2001, Vol 78 (3), pp. 303-316
Author(s): P. TILQUIN, W. COPPIETERS, J. M. ELSEN, F. LANTIER, C. MORENO, ...

Most QTL mapping methods assume that phenotypes follow a normal distribution, but many phenotypes of interest are not normally distributed, e.g. bacteria counts (or colony-forming units, CFU). Such data are extremely right-skewed and can contain a large proportion of zero values, which are ties from a statistical point of view. Our objective is therefore to assess the efficiency of four QTL mapping methods applied to bacteria counts: (1) least-squares (LS) analysis, (2) maximum-likelihood (ML) analysis, (3) non-parametric (NP) mapping and (4) nested ANOVA (AN). A transformation based on quantiles is used to mimic observed distributions of bacteria counts. Single positions (1 marker, 1 QTL) as well as chromosome scans (11 markers, 1 QTL) are simulated. Compared with the analysis of a normally distributed phenotype, the analysis of raw bacteria counts leads to a strong decrease in power for parametric methods, but no decrease is observed for NP. However, when a mathematical transformation (MT) is applied to bacteria counts prior to analysis, parametric methods have the same power as NP. Furthermore, parametric methods coupled with MT outperform NP when bacteria counts have a very high proportion of zeros (70.8%). Our results show that the loss of power is mainly explained by the asymmetry of the phenotypic distribution for parametric methods, and by the existence of ties for the non-parametric method. Therefore, QTL mapping for bacterial diseases, as well as for other diseases assessed by a counting process, should take the occurrence of ties in phenotypes into account before the appropriate QTL mapping method is chosen.
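
The abstract attributes the non-parametric method's loss of power to ties. One standard rank-based statistic for a single-marker test is the Kruskal-Wallis H with its tie correction; it is assumed here as a stand-in for the paper's NP mapping, whose exact statistic the abstract does not give. The Python sketch shows how a block of zero counts collapses into a single tied rank. Function names and the toy groups in the checks are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups (e.g. marker genotypes)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: every run of t equal values contributes t^3 - t.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction <= 0:
        return 0.0  # all observations tied: the ranks carry no information
    return h / correction
```

With many zero counts, all zeros share one midrank, so the ranks cannot separate genotype groups within that block; this is the information loss that the abstract links to ties.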


2021, Vol 13 (19), pp. 3872
Author(s): Jianlai Chen, Hanwen Yu, Gang Xu, Junchao Zhang, Buge Liang, ...

Existing airborne SAR autofocus methods can be classified as parametric or non-parametric. Generally, non-parametric methods, such as the widely used phase gradient autofocus (PGA) algorithm, are suitable only for scenes with many dominant point targets, whereas parametric methods are in theory suitable for all types of scenes but are generally less efficient. In practice, whether many dominant point targets are present in a scene is usually unknown, so deciding which kind of algorithm to select is not straightforward. To solve this issue, this article proposes an airborne SAR autofocus approach combined with blurry-imagery classification, which improves autofocus efficiency while ensuring autofocus precision. In this approach, blurry-imagery classification based on VGGNet, a typical network from the deep learning community, is embedded into the traditional autofocus framework as a preprocessing step that determines whether dominant point targets are present in the scene. If many dominant point targets are present, the non-parametric method is used for autofocus processing; otherwise, the parametric one is adopted. The advantage of the proposed approach is therefore that all kinds of airborne measured data can be processed automatically in batch.

