Comparison of Predicted Probabilities of Proportional Hazards Regression and Linear Discriminant Analysis Methods Using a Colorectal Cancer Molecular Biomarker Database

2007 ◽  
Vol 3 ◽  
pp. 117693510700300 ◽  
Author(s):  
Sreelatha Meleth ◽  
Chakrapani Chatla ◽  
Venkat R. Katkoori ◽  
Billie Anderson ◽  
James M. Hardin ◽  
...  

Background: Although a majority of studies in cancer biomarker discovery claim to use proportional hazards regression (PHREG) to study the ability of a biomarker to predict survival, few studies use the predicted probabilities obtained from the model to test the quality of the model. In this paper, we compared the quality of predictions by a PHREG model to that of a linear discriminant analysis (LDA) in both training and test set settings. Methods: The PHREG and LDA models were built on a dataset of 491 colorectal cancer (CRC) patients comprising demographic and clinicopathologic variables, and phenotypic expression of p53 and Bcl-2. Two variable selection methods, stepwise discriminant analysis and backward selection, were used to identify the final models. The endpoint of prediction in these models was five-year post-surgery survival. We also used a linear regression model to examine the effect of bin size in the training set on the accuracy of prediction in the test set. Results: The two variable selection techniques resulted in different models when stage was included in the list of variables available for selection. However, the proportion of survivors and non-survivors correctly identified was identical in both of these models. When stage was excluded from the variable list, the error rate for the LDA model was 42% as compared to an error rate of 34% for the PHREG model. Conclusions: This study suggests that a PHREG model can perform as well as or better than a traditional classifier such as LDA to classify patients into prognostic classes. Also, this study suggests that in the absence of the tumor stage as a variable, Bcl-2 expression is a strong prognostic molecular marker of CRC.
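As a rough illustration of how predicted probabilities from a proportional hazards model can be turned into a five-year classification and an error rate, the sketch below uses the standard relation S(t | x) = S0(t)^exp(β·x). The baseline survival, the coefficients, the two binary covariates, the 0.5 cut-off and the four patients are all hypothetical values chosen for illustration, not estimates from the study, and censoring is ignored for simplicity.

```python
import math

def cox_survival_probability(baseline_survival, coefficients, covariates):
    """Predicted survival at a fixed time under proportional hazards:
    S(t | x) = S0(t) ** exp(beta . x)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_survival ** math.exp(linear_predictor)

def error_rate(predicted_labels, observed_labels):
    """Fraction of patients whose predicted class differs from the observed one."""
    wrong = sum(p != o for p, o in zip(predicted_labels, observed_labels))
    return wrong / len(observed_labels)

# Hypothetical inputs (assumptions, not values from the paper).
BASELINE_5YR_SURVIVAL = 0.60   # assumed S0(5 years)
BETA = [0.8, -0.6]             # assumed log hazard ratios for two binary covariates
patients = [[0, 0], [1, 0], [0, 1], [1, 1]]
observed = [1, 0, 1, 0]        # 1 = alive at five years, 0 = not

probabilities = [
    cox_survival_probability(BASELINE_5YR_SURVIVAL, BETA, x) for x in patients
]
# Classify as a predicted survivor when the predicted probability is at least 0.5.
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
rate = error_rate(predicted, observed)

for x, p, label in zip(patients, probabilities, predicted):
    print(x, round(p, 3), label)
print("error rate:", rate)
```

The same `error_rate` function could be applied to the class labels produced by an LDA model, which puts the two approaches on a common scale.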

2020 ◽  
Vol 15 ◽  
Author(s):  
Mohanad Mohammed ◽  
Henry Mwambi ◽  
Bernard Omolo

Background: Colorectal cancer (CRC) is the third most common cancer among women and men in the USA, and recent studies have shown an increasing incidence in less developed regions, including Sub-Saharan Africa (SSA). We developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for the mutation status and survival of CRC patients. Methods: Publicly available microarray and RNASeq data from 54 matched formalin-fixed paraffin-embedded (FFPE) samples, generated on the Affymetrix GeneChip and RNASeq platforms, were used to identify differentially expressed genes between mutant and wild-type samples. We applied the support vector machine, artificial neural network, random forest, k-nearest neighbor, naïve Bayes, negative binomial linear discriminant analysis (NBLDA), and Poisson linear discriminant analysis algorithms for classification. A Cox proportional hazards model was used for survival analysis. Results: Compared to the genelist from each of the individual platforms, the hybrid genelist had the highest accuracy, sensitivity, specificity, and AUC for mutation status across all the classifiers, and was prognostic for survival in patients with CRC. The NBLDA method was the best performer on the RNASeq data, while the SVM method was the most suitable classifier for CRC across the two data types. Nine genes were found to be predictive of survival. Conclusion: This signature could be useful in clinical practice, especially for colorectal cancer diagnosis and therapy. Future studies should determine the effectiveness of integration in cancer survival analysis and its application to unbalanced data, where the classes are of different sizes, as well as to data with multiple classes.
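The four metrics used to compare the classifiers (accuracy, sensitivity, specificity and AUC) can be computed from true labels and classifier scores as in the minimal sketch below. The labels, scores and the 0.5 threshold are made-up illustrative values, not data from the study, and the AUC is computed with the pairwise (Mann-Whitney) definition rather than any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = mutant, 0 = wild-type."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores (assumptions for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
area = auc(y_true, scores)

print(accuracy, sensitivity, specificity, area)
```

Unlike the other three metrics, the AUC does not depend on the classification threshold, which is one reason it is often reported alongside them.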


2020 ◽  
Vol 8 (9) ◽  
pp. 358-367
Author(s):  
O. Akangoziri ◽  
C. N. Okoli

This study compared multiple logistic regression, linear discriminant analysis and quadratic discriminant analysis for estimating infant birth outcome, and the misclassification error rate of birth outcomes, from factors of infant mortality in Anambra State, Nigeria. The birth outcomes of interest were Neonatal death, Still birth and Alive. Secondary data were obtained from the records department of General Hospital Onitsha for 2007-2016. The data comprise Status of infant birth, Mother's parity, Age of mother, Weight of baby, Mother's Education Status, Number of Bookings before gestation and Gestation Age. The data analysis was performed using R. The multiple logistic regression showed that Mother's Education Status (MES) and Booking contributed significantly to the logistic model, while Parity, Sex, Age of Mother (AOM), Year, GA and Birth Weight (BW) were not significant for birth outcome. The misclassification error rate for birth outcome under this approach was 0.5992 (59.92%). The prior probabilities of the groups for the linear and quadratic discriminant analyses were 0.228503, 0.40168 and 0.36981 for Alive, Neonatal death and Still birth respectively. Mother's Education Status and Bookings Status had the greatest impact on the first and second linear discriminant functions respectively. The misclassification error rate for birth outcome was 0.5931 (59.31%) using linear discriminant analysis and 0.5956 (59.56%) using quadratic discriminant analysis. Based on these findings, linear discriminant analysis had the lowest misclassification error rate for infant birth outcome, followed by quadratic discriminant analysis, with multiple logistic regression performing worst.
The findings confirmed that linear discriminant analysis performed best, with a misclassification error rate of 59.31%.
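For readers unfamiliar with the two quantities reported above, the sketch below shows how group prior probabilities and a misclassification error rate are typically obtained from a three-class confusion matrix. The counts are invented for illustration and do not reproduce the study's data or its reported figures; estimating priors as observed class proportions is an assumption here, although it is a common default.

```python
CLASSES = ["Alive", "Neonatal death", "Still birth"]

# Hypothetical confusion matrix: rows = observed class, columns = predicted class.
confusion = [
    [12, 5, 3],
    [4, 20, 11],
    [6, 9, 30],
]

total = sum(sum(row) for row in confusion)

# Prior probability of each group, taken as its observed proportion.
priors = [sum(row) / total for row in confusion]

# Correct classifications lie on the diagonal of the matrix.
correct = sum(confusion[i][i] for i in range(len(CLASSES)))
misclassification_rate = 1 - correct / total

for name, prior in zip(CLASSES, priors):
    print(name, prior)
print("misclassification error rate:", misclassification_rate)
```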


2016 ◽  
Vol 63 (4) ◽  
pp. 449-463
Author(s):  
Sergiusz Herman

Classification is a procedure that assigns the studied companies to a specific population on the basis of their attributes. Its essential component is the classifier, whose quality is measured above all by its predictive ability, expressed as the true error rate. Because a sufficiently large, independent test set is lacking, this error must be estimated from the available learning set. The aim of this article is to review and compare selected methods for estimating the prediction error of a classifier constructed with linear discriminant analysis. It was examined whether the results of the analysis depend on the sample size and on the method of selecting variables for the model. The empirical study used the example of bankruptcy prediction for joint-stock companies in Poland.
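Two common ways of estimating the error from the learning set alone are resubstitution (evaluate on the same data used for fitting) and leave-one-out cross-validation. The sketch below contrasts them on a toy one-feature problem; it is an assumption-laden illustration, not the article's procedure or data. The classifier is a nearest-class-mean rule, which is what linear discriminant analysis reduces to with a single feature, equal priors and a common variance, and the eight observations are made up.

```python
from statistics import mean

def fit(xs, ys):
    """Estimate the two class means (the only parameters of this simplified rule)."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return m0, m1

def predict(model, x):
    """Assign x to the class whose mean is closer."""
    m0, m1 = model
    return 0 if abs(x - m0) <= abs(x - m1) else 1

def resubstitution_error(xs, ys):
    """Fit on all observations and evaluate on the same observations."""
    model = fit(xs, ys)
    wrong = sum(predict(model, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

def leave_one_out_error(xs, ys):
    """Refit without each observation in turn, then classify the held-out one."""
    wrong = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        wrong += predict(model, xs[i]) != ys[i]
    return wrong / len(xs)

# Hypothetical one-feature learning set: 0 = solvent, 1 = bankrupt.
xs = [1.0, 2.0, 3.0, 4.8, 5.2, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

resub = resubstitution_error(xs, ys)
loo = leave_one_out_error(xs, ys)
print("resubstitution:", resub, "leave-one-out:", loo)
```

On this toy set the resubstitution estimate is lower than the leave-one-out estimate, reflecting the well-known optimism of evaluating a classifier on its own training data.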


2014 ◽  
Vol 6 (22) ◽  
pp. 9037-9044 ◽  
Author(s):  
Meilan Ouyang ◽  
Zhimin Zhang ◽  
Chen Chen ◽  
Xinbo Liu ◽  
Yizeng Liang

A new method performs classification and variable selection simultaneously to analyze complicated metabolomics datasets.

