Comparative Analysis of Selected Methods for Estimating the Prediction Error of Classifier

2016 ◽  
Vol 63 (4) ◽  
pp. 449-463
Author(s):  
Sergiusz Herman

Classification is a procedure that assigns the studied companies to a specific population on the basis of their attributes. Its essential component is the classifier, whose quality is measured above all by its predictive ability, expressed as the true error rate. Because a sufficiently large and independent test set is usually unavailable, this error must be estimated from the available learning set. The aim of this article is to review and compare selected methods for estimating the prediction error of a classifier constructed with linear discriminant analysis. It was examined whether the results of the analysis depend on the sample size and on the method of selecting variables for the model. The empirical research used the example of bankruptcy prediction for joint-stock companies in Poland.
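To make the estimation problem concrete, the sketch below contrasts two common estimators of prediction error computed from a learning set alone: the resubstitution error (train and test on the same data) and the leave-one-out cross-validation error. It is only an illustration under stated assumptions: a nearest-centroid rule stands in for linear discriminant analysis, the data are invented, and all function names are hypothetical. The abstract does not say which estimators the article compares.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def resubstitution_error(X, y):
    """Error rate when the classifier is evaluated on its own learning set."""
    centroids = fit_centroids(X, y)
    wrong = sum(predict(centroids, x) != t for x, t in zip(X, y))
    return wrong / len(y)


def leave_one_out_error(X, y):
    """Error rate when each observation is predicted by a model fitted without it."""
    wrong = 0
    for i in range(len(y)):
        centroids = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += predict(centroids, X[i]) != y[i]
    return wrong / len(y)


# Invented one-feature learning set: 0 = healthy, 1 = bankrupt (hypothetical labels).
X = [[1.0], [2.0], [3.0], [5.0], [6.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(resubstitution_error(X, y))  # 0.0
print(leave_one_out_error(X, y))   # 0.125
```

On this toy set the resubstitution estimate is lower than the leave-one-out estimate, which illustrates why an error measured on the learning set itself tends to be optimistic.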

2007 ◽  
Vol 3 ◽  
pp. 117693510700300 ◽  
Author(s):  
Sreelatha Meleth ◽  
Chakrapani Chatla ◽  
Venkat R. Katkoori ◽  
Billie Anderson ◽  
James M. Hardin ◽  
...  

Background: Although a majority of studies in cancer biomarker discovery claim to use proportional hazards regression (PHREG) to study the ability of a biomarker to predict survival, few studies use the predicted probabilities obtained from the model to test the quality of the model. In this paper, we compared the quality of predictions by a PHREG model to that of a linear discriminant analysis (LDA) in both training and test set settings. Methods: The PHREG and LDA models were built on a dataset of 491 colorectal cancer (CRC) patients comprising demographic and clinicopathologic variables and phenotypic expression of p53 and Bcl-2. Two variable selection methods, stepwise discriminant analysis and backward selection, were used to identify the final models. The endpoint of prediction in these models was five-year post-surgery survival. We also used a linear regression model to examine the effect of bin size in the training set on the accuracy of prediction in the test set. Results: The two variable selection techniques resulted in different models when stage was included in the list of variables available for selection. However, the proportion of survivors and non-survivors correctly identified was identical in both of these models. When stage was excluded from the variable list, the error rate for the LDA model was 42%, compared with 34% for the PHREG model. Conclusions: This study suggests that a PHREG model can perform as well as or better than a traditional classifier such as LDA in classifying patients into prognostic classes. It also suggests that, in the absence of tumor stage as a variable, Bcl-2 expression is a strong prognostic molecular marker of CRC.


Forecasting ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 505-525
Author(s):  
Angeliki Papana ◽  
Anastasia Spyridou

Financial bankruptcy prediction is an essential issue in emerging economies taking into consideration the economic upheaval that can be caused by business failures. The research on bankruptcy prediction is of the utmost importance as it aims to build statistical models that can distinguish healthy firms from financially distressed ones. This paper explores the applicability of the four most used approaches to predict financial bankruptcy using data concerning the case of Greece. A comparison of linear discriminant analysis, logit, decision trees and neural networks is performed. The results show that discriminant analysis is slightly superior to the other methods.


2020 ◽  
Vol 8 (9) ◽  
pp. 358-367
Author(s):  
O. Akangoziri ◽  
C. N. Okoli

This study compared multiple logistic regression, linear discriminant analysis and quadratic discriminant analysis in estimating infant birth outcome and the misclassification error rate of birth outcomes, using factors of infant mortality in Anambra State, Nigeria. The birth outcomes of interest were Neonatal death, Still birth and Alive. Secondary data were obtained from the records department of General Hospital Onitsha for 2007-2016. The data comprise Status of infant birth, Mother's parity, Age of mother, Weight of baby, Mother's Education Status, Number of Bookings before gestation and Gestation Age. The data analysis was performed in R. The multiple logistic regression showed that Mother's Education Status (MES) and Booking contributed significantly to the logistic model, while Parity, Sex, Age of Mother (AOM), Year, GA and Birth Weight (BW) were found to be insignificant for birth outcomes. The misclassification error rate for birth outcome under this approach was 0.5992 (59.92%). The prior probabilities of the groups for the linear and quadratic discriminant analyses were 0.228503, 0.40168 and 0.36981 for Alive, Neonatal death and Still birth, respectively. Mother's Education Status and Bookings Status had the greatest impact on the first and second linear functions, respectively. The misclassification error rate for birth outcome was 0.5931 (59.31%) using linear discriminant analysis and 0.5956 (59.56%) using quadratic discriminant analysis. Based on these findings, the linear discriminant approach is the best alternative for estimating the misclassification error rate of infant birth outcome, followed by quadratic discriminant analysis, with multiple logistic regression last.
The findings confirmed that linear discriminant analysis performed best, with a misclassification error rate of 59.31%.
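As a reading aid, the following sketch shows how a misclassification error rate of the kind quoted above is typically computed from a confusion matrix. The counts are invented for illustration and are not the study's data. The last lines use only the prior probabilities reported in the abstract to compute, as a reference point, the error of a rule that always predicts the most probable class. The function name is hypothetical.

```python
def misclassification_rate(confusion):
    """Share of off-diagonal counts in a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Invented counts for three outcome classes (NOT taken from the study).
confusion = [
    [10, 25, 15],
    [8, 60, 32],
    [7, 45, 48],
]
rate = misclassification_rate(confusion)
print(round(rate, 4))  # 0.528

# Prior probabilities as reported in the abstract.
priors = {"Alive": 0.228503, "Neonatal death": 0.40168, "Still birth": 0.36981}

# Reference error of always predicting the class with the largest prior.
majority_error = 1 - max(priors.values())
print(round(majority_error, 5))  # 0.59832
```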


2020 ◽  
Vol 4 ◽  
pp. 239784732097125
Author(s):  
Chirag N Patel ◽  
Sivakumar Prasanth Kumar ◽  
Rakesh M Rawal ◽  
Manishkumar B Thaker ◽  
Himanshu A Pandya

Background: Bioinformatics and statistical analysis have been employed to develop a classification model to distinguish toxic and non-toxic molecules. Aims: The primary objective of this study is to enumerate the cut-off values of various physico-chemical (ligand-centric) and target interaction (receptor-centric) descriptors which form the basis for classifying cardiotoxic and non-toxic molecules. We also sought correlations among molecular docking, absorption, distribution, metabolism, excretion, and toxicology (ADMET) parameters, Lipinski rules, physico-chemical parameters, etc. of human cardiotoxicity drugs. Methods: Linear discriminant analysis (LDA) was applied to a training and test set of 91 compounds, using 2D and 3D descriptors as discriminating variables representing various molecular modeling parameters, to identify which function of descriptor type is responsible for cardiotoxicity. Internal validation was performed using leave-one-out cross-validation, which yielded good results and supported the stability of the discriminant function (DF). Results: The values of the statistical parameters Fisher Discriminant Analysis (FDA) and Wilks' λ for the DF showed reliable statistical significance, and the success rate of prediction for both the training and the test set exceeded 93% accuracy, with 87.50% sensitivity and 94.74% specificity. Conclusion: The predictive model was built with a hybrid approach that used organ-specific targets for docking and ADMET properties of FDA (Food and Drug Administration) approved and withdrawn drugs. Classifiers were developed by linear discriminant analysis, and the cut-off was enumerated by receiver operating characteristic (ROC) curve analysis to achieve reliable specificity and sensitivity.
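The sketch below illustrates one common way to pick a cut-off from ROC-style analysis: scan candidate thresholds and keep the one that maximizes Youden's J (sensitivity + specificity - 1). This criterion is an assumption made for illustration, since the abstract does not state which rule the authors used. The descriptor scores and labels are invented, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify as positive (1 = toxic) when score >= cutoff, then rate the result."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Invented descriptor scores and toxicity labels (1 = toxic, 0 = non-toxic).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

cutoff, j, sens, spec = best_cutoff(scores, labels)
print(cutoff, sens, spec)  # 0.4 1.0 0.75
```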


Ekonomika ◽  
2014 ◽  
Vol 93 (2) ◽  
pp. 131-146 ◽  
Author(s):  
Nicoleta Bărbuţă-Mişu ◽  
Elena-Silvia Codreanu

Abstract. In this study, the bankruptcy risk of the companies acting in the Romanian building sector was evaluated. The main purpose of this paper is to present, using the scoring method, the classification of enterprises according to their financial performance into successful and bankrupt companies. To achieve this goal, we used two well-known models: Conan & Holder, and Altman. Based on financial data for the period 2008–2012, we performed a comparative analysis of bankruptcy risk and noted that the same company could be classified differently by these two models. The results may constitute a landmark for Romanian companies in substantiating decisions and in analyzing financial failure from at least two perspectives.
Key words: scoring method, failure risk, bankruptcy prediction, financial ratios, discriminant analysis
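For readers unfamiliar with scoring models, here is a minimal sketch of the commonly quoted original Altman Z-score for publicly traded manufacturing firms, with its usual zone thresholds of 1.81 and 2.99. The abstract does not say which Altman variant the authors applied, so treat the coefficients as an assumption. The input ratios are invented, the function names are hypothetical, and the Conan & Holder model is not shown.

```python
def altman_z(x1, x2, x3, x4, x5):
    """Commonly quoted original Altman Z-score.

    x1: working capital / total assets
    x2: retained earnings / total assets
    x3: earnings before interest and taxes / total assets
    x4: market value of equity / total liabilities
    x5: sales / total assets
    """
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Map a score to the usual three zones (thresholds 1.81 and 2.99)."""
    if z < 1.81:
        return "distress"
    if z <= 2.99:
        return "grey"
    return "safe"


# Invented financial ratios for two hypothetical companies.
z_a = altman_z(0.2, 0.3, 0.1, 1.0, 1.5)
z_b = altman_z(-0.1, 0.0, -0.05, 0.3, 0.8)

print(round(z_a, 3), altman_zone(z_a))  # 3.09 safe
print(round(z_b, 3), altman_zone(z_b))  # 0.695 distress
```

A second scoring model with different ratios and weights can place the same company in a different zone, which is the kind of disagreement the abstract reports.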


Author(s):  
Pekka Siirtola ◽  
Juha Röning

Abstract. This study introduces an ensemble-based personalized human activity recognition method relying on incremental learning, a method for continuous learning that can not only learn from streaming data but also adapt to different contexts and to changes in context. This adaptation is based on a novel weighting approach which gives greater weight to those base models of the ensemble that are most suitable to the current context. In this article, contexts are different body positions for inertial sensors. The experiments are performed in two scenarios: (S1) adapting the model to a known context, and (S2) adapting the model to a previously unknown context. In both scenarios, the models also had to adapt to the data of a previously unknown person, as the initial user-independent dataset did not include any data from the studied user. In the experiments, the proposed ensemble-based approach is compared to a non-weighted personalization method relying on an ensemble-based classifier and to a static user-independent model. Both ensemble models are tested with three different base classifiers (linear discriminant analysis, quadratic discriminant analysis, and classification and regression tree). The results show that the proposed ensemble method performs much better than the non-weighted ensemble model for personalization in both scenarios, no matter which base classifier is used. Moreover, the proposed method outperforms user-independent models. In scenario 1, the error rate (based on balanced accuracy) was 13.3% using the user-independent model, 13.8% using the non-weighted personalization method, and 6.4% using the proposed method. The difference is even bigger in scenario 2, where the error rate is 36.6% using the user-independent model, 36.9% using the non-weighted personalization method, and 14.1% using the proposed method. In addition, F1 scores show that the proposed method performs much better than the rival methods in both scenarios.
Moreover, as a side result, it was noted that the presented method can also be used to recognize body position of the sensor.
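The general idea of weighting base models by their suitability to the current context can be sketched as a weighted majority vote. The weighting rule used here, each base model's accuracy on a recent window of labeled samples, is an assumption chosen for illustration and is not claimed to be the authors' scheme. The function names, activity labels and histories are hypothetical.

```python
from collections import defaultdict


def recent_accuracy_weights(history):
    """One weight per base model: share of its recent predictions that were correct."""
    return [sum(hits) / len(hits) for hits in history]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across base models."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three hypothetical base models predict the current activity.
predictions = ["walk", "run", "run"]

# Invented record of whether each model's last four predictions were correct.
history = [
    [True, True, True, True],
    [True, False, False, False],
    [False, True, False, False],
]

weights = recent_accuracy_weights(history)
print(weights)                                    # [1.0, 0.25, 0.25]
print(weighted_vote(predictions, weights))        # walk
print(weighted_vote(predictions, [1.0, 1.0, 1.0]))  # run (non-weighted vote)
```

In this toy case the weighted vote follows the one model that has been reliable in the current context, while the non-weighted vote follows the majority.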

