An Ensemble Model of Machine Learning for Primary Tumor Prognosis and Prediction

Advances in machine learning have improved credit scoring. Ensemble learning, one of the most widely recognized machine learning methods, has demonstrated significant improvements in predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with imbalanced data. Then, a new selective sampling mechanism is proposed to adaptively select the better-performing base classifiers. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model from the shortlisted base classifiers. In the experiments, four datasets and four evaluation indicators are used to evaluate the proposed model, and the experimental results demonstrate its superiority over the benchmark models.
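
The abstract does not spell out how the multiple K-means-based undersampling works, so the following is only a minimal sketch of the general idea: cluster the majority class with K-means and draw from every cluster in proportion to its size, so that the reduced majority set still covers all regions of the data. The function names, the two-dimensional toy data, and the proportional quota rule are assumptions made for illustration, not the authors' method.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def kmeans_undersample(majority, n_keep, k=3, seed=0):
    """Keep about n_keep majority samples, drawn from every cluster in proportion to its size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        quota = round(n_keep * len(members) / len(majority))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept

# Hypothetical toy data: 90 majority-class and 10 minority-class applicants.
data_rng = random.Random(1)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
minority = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
balanced_majority = kmeans_undersample(majority, n_keep=len(minority))
```

Because the per-cluster quotas are rounded, the reduced set can differ from the requested size by one sample.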

An Optimized Stacking Ensemble Model for Phishing Websites Detection

Electronics ◽

10.3390/electronics10111285 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1285

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Zeyad Ghaleb Al-Mekhlafi ◽

Badiea Abdulkarem Mohammed ◽

Tawfik Al-Hadhrami ◽

...

Keyword(s):

Machine Learning ◽

Random Forests ◽

Ensemble Method ◽

Detection Methods ◽

Detection Accuracy ◽

Ensemble Model ◽

Security Attacks ◽

Data Set ◽

Machine Learning Methods ◽

Ensemble Machine Learning

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1); the Phishing Dataset for Machine Learning from Mendeley (Dataset 2); and the Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
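
To make the GA tuning step concrete, here is a minimal sketch of a genetic algorithm searching a two-parameter space. The abstract does not give the GA configuration, so everything here is an assumption: the selection, crossover, and mutation rules, the population size, and especially `toy_score`, which is a hypothetical stand-in for the cross-validated accuracy of a classifier.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Maximise fitness(params) over box-bounded real parameters with a tiny GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # selection: keep the fitter half
        children = [ranked[0]]                 # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            i = rng.randrange(len(child))                      # mutate one gene
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        population = children
    return max(population, key=fitness)

# Hypothetical objective: peaks at learning_rate = 0.1 and depth = 6.
def toy_score(params):
    learning_rate, depth = params
    return -((learning_rate - 0.1) ** 2) - ((depth - 6.0) / 10.0) ** 2

search_bounds = [(0.01, 1.0), (1.0, 12.0)]
best_params = genetic_search(toy_score, search_bounds)
```

In a real pipeline the fitness function would train and cross-validate a classifier for each candidate, which is far more expensive than this toy objective. The tuned classifiers would then be ranked by score and the top three passed to the stacking stage.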

Supervised Machine Learning based Ensemble Model for Accurate Prediction of Type 2 Diabetes

2019 SoutheastCon ◽

10.1109/southeastcon42311.2019.9020358 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ramya Akula ◽

Ni Nguyen ◽

Ivan Garibay

Keyword(s):

Machine Learning ◽

Type 2 Diabetes ◽

Accurate Prediction ◽

Supervised Machine Learning ◽

Ensemble Model

SURG-02. SURVIVAL PREDICTION AFTER NEUROSURGICAL RESECTION OF BRAIN METASTASES: A MACHINE LEARNING APPROACH

Neuro-Oncology ◽

10.1093/neuonc/noaa215.849 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii203-ii203

Author(s):

Alexander Hulsbergen ◽

Yu Tung Lo ◽

Vasileios Kavouridis ◽

John Phillips ◽

Timothy Smith ◽

...

Keyword(s):

Machine Learning ◽

Brain Metastases ◽

External Validation ◽

Superior Performance ◽

Prognostic Models ◽

Receiver Operating Curve ◽

Gradient Boosting ◽

Survival Prediction ◽

Ensemble Model ◽

Adaptive Boosting

Abstract INTRODUCTION Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model in BMs. RESULTS The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). CONCLUSIONS We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
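
As a reminder of what the AUC figures above measure, the sketch below computes the area under the ROC curve directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the four-patient example are illustrative assumptions and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical hold-out set: 1 = died within 6 months, score = predicted risk.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a sketch but slower than the rank-based formulation used in statistical libraries.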

Using a Guided Machine Learning Ensemble Model to Predict Discharge Disposition following Meningioma Resection

Journal of Neurological Surgery Part B Skull Base ◽

10.1055/s-0037-1604393 ◽

2017 ◽

Vol 79 (02) ◽

pp. 123-130 ◽

Cited By ~ 4

Author(s):

Whitney Muhlestein ◽

Dallin Akagi ◽

Justiss Kallos ◽

Peter Morone ◽

Kyle Weaver ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Patient Outcomes ◽

Predictive Power ◽

Univariate Analysis ◽

Area Under The Curve ◽

Motor Deficit ◽

Ensemble Model ◽

Algorithm Selection ◽

Discharge Disposition

Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
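
The "rank, then combine the top performers" step can be sketched as follows. The abstract does not say how the top models were combined, so simple averaging of predicted probabilities is an assumption here, as are the model names and all numbers.

```python
def build_ensemble(validation_auc, predictions, top_k=3):
    """Rank models by validation score and average the predicted probabilities of the top_k."""
    ranked = sorted(validation_auc, key=validation_auc.get, reverse=True)
    chosen = ranked[:top_k]
    n_cases = len(next(iter(predictions.values())))
    averaged = [sum(predictions[m][i] for m in chosen) / len(chosen) for i in range(n_cases)]
    return chosen, averaged

# Hypothetical validation scores and per-patient predicted probabilities.
validation_auc = {"gbm": 0.77, "rf": 0.75, "logreg": 0.71, "knn": 0.66}
predictions = {
    "gbm": [0.75, 0.25],
    "rf": [0.5, 0.125],
    "logreg": [0.9, 0.1],
    "knn": [0.4, 0.6],
}
chosen_models, ensemble_probs = build_ensemble(validation_auc, predictions, top_k=2)
# chosen_models == ["gbm", "rf"]; ensemble_probs == [0.625, 0.1875]
```

The final ensemble should be scored on data that played no part in the ranking, which is the role of the never-before-seen validation set mentioned in the abstract.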

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%.
Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
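
The net reclassification improvement (NRI) used above can be illustrated with a short sketch. The abstract does not state which NRI variant or risk cut-off was used, so this version assumes the two-category form with a single threshold of 0.5. The function name and the eight-patient example are hypothetical.

```python
def net_reclassification_improvement(y_true, old_prob, new_prob, threshold=0.5):
    """Two-category NRI of a new model over an old one at a single risk threshold."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = sum(y_true)
    n_nonevent = len(y_true) - n_event
    for y, p_old, p_new in zip(y_true, old_prob, new_prob):
        old_high, new_high = p_old >= threshold, p_new >= threshold
        if new_high and not old_high:        # reclassified upward
            if y == 1:
                up_event += 1
            else:
                up_nonevent += 1
        elif old_high and not new_high:      # reclassified downward
            if y == 1:
                down_event += 1
            else:
                down_nonevent += 1
    # Events should move up, non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents

# Hypothetical data: 1 = hospitalized within 12 months.
y = [1, 1, 1, 1, 0, 0, 0, 0]
old = [0.6, 0.4, 0.4, 0.7, 0.6, 0.3, 0.2, 0.6]
new = [0.7, 0.6, 0.3, 0.8, 0.4, 0.3, 0.2, 0.7]
nri = net_reclassification_improvement(y, old, new)   # 1/4 for events + 1/4 for non-events = 0.5
```

A positive NRI means the new model moves more cases in the correct direction than in the wrong one.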

Flood Susceptibility Modeling in a Subtropical Humid Low-Relief Alluvial Plain Environment: Application of Novel Ensemble Machine Learning Approach

Frontiers in Earth Science ◽

10.3389/feart.2021.659296 ◽

2021 ◽

Vol 9 ◽

Author(s):

Manish Pandey ◽

Aman Arora ◽

Alireza Arabameri ◽

Romulus Costache ◽

Naveen Kumar ◽

...

Keyword(s):

Machine Learning ◽

Regression Tree ◽

Classification And Regression Tree ◽

Ground Subsidence ◽

Ensemble Model ◽

Ganga Plain ◽

Humid Climate ◽

Area Index ◽

Flood Susceptibility ◽

Middle Ganga Plain

This study developed a new ensemble model and tested a second ensemble model for flood susceptibility mapping in the Middle Ganga Plain (MGP). The results of the two models were compared quantitatively to assess their performance in zoning flood-susceptible areas in the low-altitude, humid subtropical fluvial floodplain environment of the MGP. This part of the MGP, which lies in the central Ganga River Basin (GRB), is experiencing worse floods under the changing climate, causing increased loss of life and property. With its monsoonal subtropical humid climate, ground subsidence induced by active tectonics, growing population, and shifting land use/land cover trends and patterns, the MGP is the best natural laboratory for testing every genre of susceptibility prediction model, with a constant number of input parameters, in order to identify the best-performing model for this type of topoclimatic setting. This will help achieve the goal of model universality, i.e., finding the best-performing susceptibility prediction model for this type of topoclimatic setting with a similar number and type of input variables. Based on a highly accurate flood inventory and 12 flood predictors (FPs), selected using field experience of the study area and a literature survey, two machine learning (ML) ensemble models, CART-FR and CART-EBF, developed by bagging frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART), were applied for flood susceptibility zonation mapping. Flood and non-flood points, randomly generated from the flood inventory, were split in a 70:30 ratio for training and validation of the ensembles.
Based on a threshold-independent evaluation statistic, the area under the receiver operating characteristic (AUROC) curve, 14 threshold-dependent evaluation metrics, and the seed cell area index (SCAI), which together assess different aspects of the ensembles, the study suggests that CART-EBF (AUCSR = 0.843; AUCPR = 0.819) performed better than CART-FR (AUCSR = 0.828; AUCPR = 0.802). The variability in the performance of these novel ensembles, and their comparison with the results of other published models, underscores the need to test these as well as other genres of susceptibility models in other topoclimatic environments. Results of this study are important for natural hazard managers and can be used to compute damages through risk analysis.
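
For readers unfamiliar with the frequency ratio (FR) component, the sketch below computes it in its usual form: for each class of a predictor, the share of flood points falling in that class divided by the share of the study area the class occupies. The class names and counts are hypothetical, and the abstract does not describe how FR values are fed into the bagged CART models, so that step is not shown.

```python
def frequency_ratio(flood_counts, pixel_counts):
    """FR per class = (share of flood points in the class) / (share of area pixels in the class)."""
    total_floods = sum(flood_counts.values())
    total_pixels = sum(pixel_counts.values())
    return {
        cls: (flood_counts.get(cls, 0) / total_floods) / (pixel_counts[cls] / total_pixels)
        for cls in pixel_counts
    }

# Hypothetical elevation classes for a single flood predictor.
flood_counts = {"low": 32, "mid": 24, "high": 8}        # flood inventory points per class
pixel_counts = {"low": 256, "mid": 512, "high": 256}    # study-area pixels per class
fr = frequency_ratio(flood_counts, pixel_counts)
```

In this toy example the low-elevation class holds half of the flood points on a quarter of the area, giving an FR of 2. Values above 1 indicate that a class is over-represented among flood locations.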

Predicting lung adenocarcinoma disease progression using methylation-correlated blocks and ensemble machine learning classifiers

PeerJ ◽

10.7717/peerj.10884 ◽

2021 ◽

Vol 9 ◽

pp. e10884

Author(s):

Xin Yu ◽

Qian Yang ◽

Dong Wang ◽

Zhaoyang Li ◽

Nianhang Chen ◽

...

Keyword(s):

Machine Learning ◽

Lung Adenocarcinoma ◽

Cox Regression ◽

Characteristic Curve ◽

The Cancer Genome Atlas ◽

Support Vector ◽

Survival Prediction ◽

Ensemble Model ◽

Training Set ◽

Cpg Sites

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.
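
Stacking, as used here to combine the three base models, means training a meta-learner on the base models' outputs. The abstract does not describe the EnMCB meta-learner, so the sketch below assumes the simplest possible one: a linear combination fitted by least squares with batch gradient descent. The base predictions and target risk scores are hypothetical.

```python
def predict_stacked(weights, bias, row):
    """Meta-learner output for one sample: weighted sum of base-model predictions plus bias."""
    return sum(w * x for w, x in zip(weights, row)) + bias

def mean_squared_error(weights, bias, base_preds, target):
    return sum((predict_stacked(weights, bias, r) - t) ** 2
               for r, t in zip(base_preds, target)) / len(target)

def fit_linear_stacker(base_preds, target, lr=0.05, epochs=2000):
    """Fit the meta-learner weights by batch gradient descent on squared error."""
    n, m = len(base_preds), len(base_preds[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(epochs):
        errors = [predict_stacked(weights, bias, r) - t for r, t in zip(base_preds, target)]
        grad_w = [2 / n * sum(e * r[j] for e, r in zip(errors, base_preds)) for j in range(m)]
        grad_b = 2 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Hypothetical risk scores from three base models (columns) for six patients (rows).
base_preds = [[0.2, 0.3, 0.1], [0.8, 0.7, 0.9], [0.5, 0.4, 0.6],
              [0.1, 0.2, 0.2], [0.9, 0.8, 0.7], [0.4, 0.6, 0.5]]
target = [0.2, 0.8, 0.5, 0.15, 0.85, 0.45]
stack_weights, stack_bias = fit_linear_stacker(base_preds, target)
```

In practice the base predictions given to the meta-learner should be out-of-fold predictions, so that it does not simply reward whichever base model overfits the training set most.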

Performance Assessment of Ensemble Learning Model for Prediction of Cardiac Disease Among Smokers Based on HRV Features

International Journal of Biomedical and Clinical Engineering ◽

10.4018/ijbce.2021010102 ◽

2021 ◽

Vol 10 (1) ◽

pp. 19-34

Author(s):

S. R. Rathod ◽

C. Y. Patil

Keyword(s):

Machine Learning ◽

Heart Rate ◽

Cardiac Disease ◽

Kappa Statistics ◽

Cardiac Diseases ◽

Ensemble Model ◽

Single Model ◽

Machine Learning Methods ◽

Ensemble Machine Learning ◽

Boosting Technique

Smoking impacts the pattern of heart rate variability (HRV); HRV therefore acts as a predictor of cardiac diseases (CD). In this study, ensemble machine learning methods are used to predict CD non-invasively among smokers. A single model is created based on an ensemble voting classifier with a combined boosting technique to improve the accuracy of the predictive model. The final ensemble model shows an accuracy of 95.20%, precision of 97.27%, sensitivity of 92.35%, specificity of 98.07%, F1 score of 0.95, AUC of 0.961, MCE of 0.0479, kappa statistics value of 0.9041, and MSE of 0.2189. The accuracy obtained with the proposed method is the highest value achieved so far for the prediction of CD among smokers using HRV data.
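
Most of the metrics listed above follow directly from a binary confusion matrix. The sketch below shows the standard formulas for accuracy, precision, sensitivity, specificity, F1 score, and Cohen's kappa. The counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the row/column totals would give by chance.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts: 45 true positives, 5 false positives, 40 true negatives, 10 false negatives.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
```

With these counts the accuracy is 0.85 and kappa is 0.70, which illustrates why kappa is reported alongside accuracy: it discounts the agreement a classifier would reach by chance.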

An Ensemble Model for Predicting Chronic Diseases Using Machine Learning Algorithms

Smart Computing Techniques and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-16-1502-3_34 ◽

2021 ◽

pp. 337-345

Author(s):

B. Manjulatha ◽

Suresh Pabboju

Keyword(s):

Machine Learning ◽

Chronic Diseases ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Ensemble Model
