Comparison of LDA and SPRT on Clinical Dataset Classifications

2011 ◽  
Vol 4 ◽  
pp. BII.S6935 ◽  
Author(s):  
Chih Lee ◽  
Brittany Nkounkou ◽  
Chun-Hsi Huang

In this work, we investigate the well-known classification algorithm linear discriminant analysis (LDA) as well as its close relative, the sequential probability ratio test (SPRT). SPRT affords many theoretical advantages over LDA. It allows specification of desired classification error rates α and β and is expected to be faster in predicting the class label of a new instance. However, SPRT is not as widely used as LDA in the pattern recognition and machine learning community. For this reason, we investigate LDA, SPRT and a modified SPRT (MSPRT) empirically using clinical datasets from Parkinson's disease, colon cancer, and breast cancer. We make the same normality assumption as LDA and propose variants of the two SPRT algorithms based on the order in which the components of an instance are sampled. Leave-one-out cross-validation is used to assess and compare the performance of the methods. The results indicate that two variants, SPRT-ordered and MSPRT-ordered, are superior to LDA in terms of prediction accuracy. Moreover, on average SPRT-ordered and MSPRT-ordered examine fewer components than LDA before arriving at a decision. These advantages imply that SPRT-ordered and MSPRT-ordered are preferable to LDA when the normality assumption can be justified for a dataset.
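The sequential idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes independent Gaussian components with a shared per-component variance, uses Wald's classical thresholds for α and β, and falls back to the sign of the accumulated log-likelihood ratio when the components run out. The function names and the `order` parameter are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def sprt_classify(x, means0, means1, variances, alpha=0.05, beta=0.05, order=None):
    """Classify x as class 0 or 1, examining one component at a time.

    Returns (label, number_of_components_examined).
    Assumption: components are independent and Gaussian with the same
    variance in both classes (a simplification, not the paper's model).
    """
    upper = math.log((1 - beta) / alpha)   # accept class 1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept class 0 at or below this
    if order is None:
        order = range(len(x))

    llr = 0.0  # accumulated log-likelihood ratio, class 1 versus class 0
    used = 0
    for j in order:
        llr += gaussian_log_pdf(x[j], means1[j], variances[j])
        llr -= gaussian_log_pdf(x[j], means0[j], variances[j])
        used += 1
        if llr >= upper:
            return 1, used
        if llr <= lower:
            return 0, used

    # No threshold crossed: fall back to the sign of the ratio (ties go to 0).
    return (1 if llr > 0 else 0), used


means0, means1, variances = [0.0] * 3, [4.0] * 3, [1.0] * 3
print(sprt_classify([4.0, 4.0, 4.0], means0, means1, variances))
print(sprt_classify([2.5, 1.5, 2.25], means0, means1, variances))
print(sprt_classify([2.5, 1.5, 2.25], means0, means1, variances, order=[2, 0, 1]))
```

In the last two calls the same instance is classified with two different sampling orders; the order in which components are examined changes how many of them are needed before a threshold is crossed, which is the kind of effect the ordered variants aim to exploit.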

2019 ◽  
Vol 60 (6) ◽  
pp. 818-824 ◽  
Author(s):  
Takuya Mizutani ◽  
Taiki Magome ◽  
Hiroshi Igaki ◽  
Akihiro Haga ◽  
Kanabu Nawa ◽  
...  

ABSTRACT The purpose of this study was to predict the survival time of patients with malignant glioma after radiotherapy with high accuracy by considering additional clinical factors, and to optimize the prescription dose and treatment duration for individual patients by using a machine learning model. A total of 35 patients with malignant glioma were included in this study. The candidate features included 12 clinical features and 192 dose–volume histogram (DVH) features. The appropriate input features and parameters of the support vector machine (SVM) were selected using a genetic algorithm based on Akaike’s information criterion for three feature sets: clinical features only, DVH features only, and both combined. The prediction accuracy of the SVM models was evaluated through a leave-one-out cross-validation test with residual error, which was defined as the absolute difference between the actual and predicted survival times after radiotherapy. Moreover, the influences of various values of prescription dose and treatment duration on the predicted survival time were evaluated. The prediction accuracy was significantly improved with the combined use of clinical and DVH features compared with the use of either feature set alone (P < 0.01, Wilcoxon signed rank test). The mean ± standard deviation of the residual error in leave-one-out cross-validation using the combined clinical and DVH features, only clinical features, and only DVH features was 104.7 ± 96.5, 144.2 ± 126.1 and 204.5 ± 186.0 days, respectively. The prediction accuracy could be improved with the combination of clinical and DVH features, and our results show the potential to optimize the treatment strategy for individual patients based on a machine learning model.
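The evaluation described here, leave-one-out cross-validation scored by the absolute difference between actual and predicted values, can be sketched as follows. This is an illustration only: a one-variable least-squares line stands in for the SVM used in the study, and the function names and toy data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Stand-in model (assumption): the study itself used an SVM.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def loocv_residuals(xs, ys, fit):
    """Hold out each sample once; return |actual - predicted| per sample."""
    residuals = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)  # model never sees sample i
        residuals.append(abs(ys[i] - predict(xs[i])))
    return residuals


# Toy data, not taken from the study.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
residuals = loocv_residuals(xs, ys, fit_line)
print(mean(residuals))
```

Summarizing the per-sample residuals by their mean and standard deviation gives a figure of the same kind as the values reported in the abstract.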


2018 ◽  
Vol 31 (1) ◽  
pp. 39-64
Author(s):  
Tetsuya Maeshiro

Abstract This paper proposes the use of quantitative indicators to evaluate the comedic success of Japanese “Manzai” performances without using semantic processing or time sequence information. The validity of the proposed indicators was verified by predicting the rankings of the final rounds and decision matches of ten M1 Grand Prix, a national-level humor contest in Japan, using leave-one-out cross validation. The results demonstrate that the proposed indicators are able to predict the ranking of Manzai championships as the mean prediction precision was 0.58 (rank correlation) for final rounds, and 0.70 (champion prediction accuracy) for the decision matches.


2018 ◽  
Vol 50 (1) ◽  
pp. 43-59 ◽  
Author(s):  
Alberto Martínez-Salvador ◽  
Carmelo Conesa-García

Abstract Many models have been developed to predict the sediment transport in watercourses. This paper attempts to test the effectiveness of log-linear models (LLM) to estimate the suspended (S-LLM), dissolved (D-LLM), and total suspended (T-LLM) load into a Mediterranean semiarid karst stream (the Argos River basin, in southeast Spain). An assessment of the supposed validity of each model and a leave-one-out cross-validation were carried out to determine their degree of statistical robustness. The T-LLM model showed higher prediction accuracy (R2 = 0.98, RMSE = 0.15, and PE = ±5.4–6.6%) than the D-LLM model (R2 = 0.97, RMSE = 0.16, and PE = ±5.5–6.8%) or the S-LLM model (R2 = 0.77, RMSE = 0.71, and PE = ±101–493%). In addition, different model variants, according to two flow patterns (FP1 = base flow and FP2 = rising water level), were developed. The FP2-SLLM model provided a very good fit (R2 = 0.94, RMSE = 0.34, and PE = ±25.3–61.5%), substantially improving the results of the S-LLM model.


2015 ◽  
Vol 13 (04) ◽  
pp. 1550014 ◽  
Author(s):  
Bo Liao ◽  
Sumei Ding ◽  
Haowen Chen ◽  
Zejun Li ◽  
Lijun Cai

Identifying the microRNA–disease relationship is vital for investigating the pathogenesis of various diseases. However, experimental verification of disease-related microRNAs remains a considerable challenge to many researchers, particularly because numerous new microRNAs are discovered every year. As such, development of computational methods for disease-related microRNA prediction has recently gained considerable attention. In this paper, we first construct a miRNA functional network and a disease similarity network by integrating different information sources. Then, we introduce a new diffusion-based method (NDBM) to explore global network similarity for miRNA–disease association inference. Even though known miRNA–disease associations in the database are rare, NDBM still achieves an area under the ROC curve (AUC) of 85.62% in leave-one-out cross-validation, significantly improving on the prediction accuracy of previous methods. Moreover, our method is applicable to diseases with no known related miRNAs as well as new miRNAs with unknown target diseases. Some associations that are strongly predicted by our method are confirmed by public databases. This superior performance suggests that NDBM could be an effective and important tool for biomedical research.


2021 ◽  
Vol 7 (2) ◽  
pp. 067-082
Author(s):  
Yousef M. T. El Gimati

Decision trees (DTs) typically apply splitting criteria using one variable at a time. In this way, the final decision partition has boundaries that are parallel to the axes. An observation is misclassified when it falls in a region which does not have the same class membership. The misclassification rate in a classification tree is defined as the proportion of observations classified to the wrong class, while in a regression tree it is defined as the mean squared error. In this paper, we present two of the important methods for estimating the misclassification (error) rate in decision trees, since all classification procedures, including decision trees, can produce errors. A DT model is constructed using a training dataset and tested on an independent test dataset. There are several procedures for estimating the error rate of decision tree-structured classifiers, such as K-fold cross-validation and bootstrap estimates. This comparison aimed to characterize the performance of the two methods in terms of test error rates based on real datasets. The results indicate that 10-fold cross-validation and bootstrap yield a tree fairly close to the best available measured by tree size.


2020 ◽  
Author(s):  
Manabu Sakamoto

ABSTRACT Bite force is an ecologically important biomechanical performance measure that is informative in inferring the ecology of extinct taxa. However, biomechanical modelling to estimate bite force is associated with some level of uncertainty. Here, I assess the accuracy of bite force estimates in extinct taxa using a Bayesian phylogenetic prediction model. I first fitted a phylogenetic regression model on a training set comprising extant data. The model predicts bite force from body mass and skull width while accounting for differences owing to biting position. The posterior predictive model has a 93% prediction accuracy as evaluated through leave-one-out cross-validation. I then predicted bite force in 37 species of extinct mammals and archosaurs from the posterior distribution of predictive models. Biomechanically estimated bite forces fall within the posterior predictive distributions for all except four species of extinct taxa, and are thus as accurate as those predicted from body size and skull width, given the variation inherent in extant taxa and the amount of time available for variance to accrue. Biomechanical modelling remains a valuable means to estimate bite force in extinct taxa and should be reliably informative of functional performances and serve to provide insights into past ecologies.


2019 ◽  
Vol 76 (7) ◽  
pp. 2349-2361
Author(s):  
Benjamin Misiuk ◽  
Trevor Bell ◽  
Alec Aitken ◽  
Craig J Brown ◽  
Evan N Edinger

Abstract Species distribution models are commonly used in the marine environment as management tools. The high cost of collecting marine data for modelling means that such data are limited, especially in remote locations. Underwater image datasets from multiple surveys were leveraged to model the presence–absence and abundance of Arctic soft-shell clam (Mya spp.) to support the management of a local small-scale fishery in Qikiqtarjuaq, Nunavut, Canada. These models were combined to predict Mya abundance, conditional on presence, throughout the study area. Results suggested that water depth was the primary environmental factor limiting Mya habitat suitability, yet seabed topography and substrate characteristics influence their abundance within suitable habitat. Ten-fold cross-validation and spatial leave-one-out cross-validation (LOO CV) were used to assess the accuracy of combined predictions and to test whether this was inflated by the spatial autocorrelation of transect sample data. Results demonstrated that four different measures of predictive accuracy were substantially inflated due to spatial autocorrelation, and the spatial LOO CV results were therefore adopted as the best estimates of performance.
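One common way to set up spatial leave-one-out cross-validation is to exclude from the training set every sample that lies within a buffer distance of the held-out sample, so that nearby, autocorrelated observations cannot inflate the score. The abstract does not state how the splits were built, so the sketch below is an assumption: the buffer rule, the function name `spatial_loo_splits`, and the coordinates are all hypothetical.

```python
import math


def spatial_loo_splits(coords, buffer_radius):
    """Yield (test_index, train_indices) pairs for buffered spatial LOO CV.

    Assumption: training samples closer than or equal to buffer_radius
    to the held-out sample are dropped from that fold.
    """
    for i, (xi, yi) in enumerate(coords):
        train = [
            j
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xj - xi, yj - yi) > buffer_radius
        ]
        yield i, train


# Hypothetical sample positions along a transect (arbitrary units).
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
for test_index, train_indices in spatial_loo_splits(coords, buffer_radius=2.0):
    print(test_index, train_indices)
```

Setting `buffer_radius` to zero recovers ordinary leave-one-out splits (for distinct positions), which makes it easy to compare the two estimates on the same data.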

