scholarly journals Machine Learning Algorithms with Different Gene Expression Datasets

Two classification techniques have been compared using microarray dataset. For example SVM, Logistic Regression strategies have been utilized in the process. The usefulness of these techniques has been determined with precision, accuracy, recall and F1-Scores. These techniques Here Prostate tumors, Lung Cancer, were analyzed. For each situation these strategies were applied to two distinctive microarray datasets with two classes. Finally performance analysis was done

2019 ◽  
pp. 1-19
Author(s):  
Gabriel A. Brooks ◽  
Savannah L. Bergquist ◽  
Mary Beth Landrum ◽  
Sherri Rose ◽  
Nancy L. Keating

PURPOSE Cancer stage is a key determinant of outcomes; however, stage is not available in claims-based data sources used for real-world evaluations. We compare multiple methods for classifying lung cancer stage from claims data. METHODS Our study used the linked SEER-Medicare data. The patient samples included fee-for-service Medicare beneficiaries diagnosed with lung cancer from 2010 to 2011 (development cohort) and 2012 to 2013 (validation cohort) who received chemotherapy. Classification algorithms considered Medicare Part A and B claims for care in the 3 months before and after chemotherapy initiation. We developed a clinical algorithm to predict stage IV ( v I to III) cancer on the basis of treatment patterns (surgery, radiotherapy, chemotherapy). We also considered an ensemble of claims-based machine learning algorithms. Classification methods were trained in the development cohort, and performance was measured in both cohorts. The SEER data were the gold standard for cancer stage. RESULTS Development and validation cohorts included 14,760 and 14,620 patients with lung cancer, respectively. Validation analyses assessed clinical, random forest, and simple logistic regression algorithms. The best performing classifier within the development cohort was the random forests, but this performance was not replicated in validation analysis. Logistic regression had stable performance across cohorts. Compared with the clinical algorithm, the 14-variable logistic regression algorithm demonstrated higher accuracy in both the development (77% v 71%) and validation cohorts (77% v 73%), with improved specificity for stage IV disease. CONCLUSION Machine learning algorithms have potential to improve lung cancer stage classification but may be prone to overfitting. Use of ensembles, cross-validation, and external validation can aid generalizability. Degradation of accuracy between development and validation cohorts suggests the need for caution in implementing machine learning in research or care delivery.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.


2019 ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may be helpful to improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms including the commonly used generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared. We also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. Target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: We found Gradient Boosting to perform the best (AUC=0.774) and K-Nearest Neighbors performing the least (AUC=0.702). The performance of GLM/logistic regression (AUC=0.76) was above average among the tested algorithms. Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors in a Net Reclassification Improvement analysis, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider to combine multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing suboptimal performing algorithms.


Sign in / Sign up

Export Citation Format

Share Document