Predicting 90-Day and 1-Year Mortality in Spinal Metastatic Disease: Development and Internal Validation

BACKGROUND: The increasing prevalence of metastatic disease has been accompanied by increasing rates of surgical intervention. Current tools have poor to fair predictive performance for intermediate (90-d) and long-term (1-yr) mortality. OBJECTIVE: To develop predictive algorithms for mortality in spinal metastatic disease at these time points and to provide patient-specific explanations of the predictions generated by these algorithms. METHODS: A retrospective review was conducted at 2 large academic medical centers to identify patients undergoing initial operative management for spinal metastatic disease between January 2000 and December 2016. Five models (penalized logistic regression, random forest, stochastic gradient boosting, neural network, and support vector machine) were developed to predict 90-d and 1-yr mortality. RESULTS: Overall, 732 patients were identified; 90-d and 1-yr mortality occurred in 181 (25.1%) and 385 (54.3%) patients, respectively. The stochastic gradient boosting algorithm had the best performance for both 90-d and 1-yr mortality. On global variable importance assessment, albumin, primary tumor histology, and performance status were the 3 most important predictors of 90-d mortality. The final models were incorporated into an open-access web application (https://sorg-apps.shinyapps.io/spinemetssurvival/) that provides predictions as well as patient-specific explanations of the results generated by the algorithms. CONCLUSION: Preoperative estimation of 90-d and 1-yr mortality was achieved by assessing more flexible modeling techniques such as machine learning. Integrating these models into applications and providing patient-centered explanations of predictions represent opportunities for incorporation into healthcare systems as decision tools in the future.
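Stochastic gradient boosting, the technique shared by the studies listed on this page, fits each new weak learner to the current residuals using a random subsample of the training data drawn without replacement. The following is a minimal illustrative sketch only, assuming a single numeric feature, squared-error loss, and one-split regression stumps; it is not the implementation used in any of the listed papers, and the names `fit_stump`, `fit_sgb`, `predict_sgb` and the parameters `n_rounds`, `learning_rate`, and `subsample` are hypothetical.

```python
import random


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to (x, residual) pairs by least squares."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All sampled x values are identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_sgb(xs, ys, n_rounds=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Boost stumps on residuals; each round sees only a random subsample."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)  # initial constant model
    preds = [base] * len(ys)
    stumps = []
    sample_size = min(len(xs), max(2, int(subsample * len(xs))))
    for _ in range(n_rounds):
        # The "stochastic" step: sample rows without replacement.
        idx = rng.sample(range(len(xs)), sample_size)
        residuals = [ys[i] - preds[i] for i in idx]
        stump = fit_stump([xs[i] for i in idx], residuals)
        stumps.append(stump)
        # Shrink the new learner's contribution and update all predictions.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_sgb(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

In practice, library implementations add deeper trees, other loss functions, and multi-feature splits; the sketch keeps only the subsample-then-fit-residuals loop that defines the method.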


Mapping of the Canopy Openings in Mixed Beech–Fir Forest at Sentinel-2 Subpixel Level Using UAV and Machine Learning Approach

Remote Sensing ◽ 10.3390/rs12233925 ◽ 2020 ◽ Vol 12 (23) ◽ pp. 3925

Author(s): Ivan Pilaš ◽ Mateo Gašparović ◽ Alan Novkinić ◽ Damir Klobučar

Keyword(s): Machine Learning ◽ Forest Canopy ◽ Vegetation Index ◽ Predictive Performance ◽ Spatial Extent ◽ Gradient Boosting ◽ Support Vector ◽ Stochastic Gradient Boosting ◽ Extreme Gradient Boosting ◽ Sentinel 2

The presented study demonstrates a bi-sensor approach suitable for rapid, precise, and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach uses Unmanned Aerial Vehicle (UAV) red, green, and blue (RGB) images of smaller areas to create a highly precise forest canopy mask. Sentinel-2 (S-2) was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to improving predictive performance were examined: (I) the highest R² for a single satellite index was 0.57, (II) the highest R² using multiple features obtained from a single-date S-2 image was 0.624, and (III) the highest R² on the multitemporal set of S-2 images was 0.697. Satellite indices such as the Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and CatBoost, which provided the best performance on the training set, exhibited weaker generalization capabilities. Therefore, the simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.


Predict Health Insurance Cost by using Machine Learning and DNN Regression Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.c8364.0110321 ◽ 2021 ◽ Vol 10 (2) ◽ pp. 137-143

Author(s): Mohamed Hanafy ◽ Omar M. A. Mahmoud

Keyword(s): Machine Learning ◽ Insurance Industry ◽ Additive Model ◽ Policy Formulation ◽ Stochastic Gradient ◽ Gradient Boosting ◽ Support Vector ◽ K Nearest Neighbors ◽ Stochastic Gradient Boosting ◽ Insurance Cost

Insurance is a policy that eliminates or reduces the cost of losses incurred from various risks. Various factors influence the cost of insurance, and these considerations contribute to insurance policy formulation. Machine learning (ML) can make the wording of insurance policies more efficient in the insurance industry. This study demonstrates how different regression models can forecast insurance costs. We compare the results of several models: Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. The Stochastic Gradient Boosting model performed best, with an MAE of 0.17448, an RMSE of 0.38018, and an R-squared value of 85.8295.
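The three metrics reported in this abstract (MAE, RMSE, and R-squared) have standard definitions that can be sketched as follows. This is an illustrative example only, assuming plain Python lists of observed and predicted values; the function names are hypothetical and the toy data are not from the study. Note that this sketch returns R-squared as a fraction between 0 and 1 for a reasonable model, whereas the abstract reports its value on a different scale.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error: penalizes large errors more than MAE does."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy illustration (made-up values, not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = (mae(observed, predicted),
          rmse(observed, predicted),
          r_squared(observed, predicted))
```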


Classifier Selection for the Prediction of Dominant Transmission Mode of Coronavirus Within Localities

International Journal of E-Health and Medical Communications ◽ 10.4018/ijehmc.20211101.oa1 ◽ 2021 ◽ Vol 12 (6) ◽ pp. 1-12

Author(s): Donald Douglas Atsa'am ◽ Ruth Wario

Keyword(s): Predictive Accuracy ◽ Multinomial Logistic Regression ◽ Geographic Area ◽ Stochastic Gradient ◽ Transmission Mode ◽ Gradient Boosting ◽ Support Vector ◽ Linear Discriminant ◽ Classifier Selection ◽ Stochastic Gradient Boosting

The coronavirus disease 2019 (COVID-19) pandemic is an ongoing concern that requires research in all disciplines to curb its spread. Nine classification algorithms were evaluated to identify the most appropriate one for predicting the prevalent COVID-19 transmission mode in a geographic area: multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were employed for classification. Predictive accuracy was determined using 10-fold cross-validation with three repeats. Friedman's test showed that the performance of the algorithms differed significantly. Stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should be valuable to health informaticians, health analysts, and others deciding which machine learning tool to adopt in efforts to detect the dominant transmission mode of the virus within localities.
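The evaluation scheme named in this abstract, 10-fold cross-validation with three repeats, can be sketched as an index generator. This is a minimal illustration under stated assumptions: the function name `repeated_kfold` and its parameters are hypothetical, the folds are not stratified by class, and the abstract does not say which software the authors used.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat.

    Each repeat reshuffles the sample indices, so the three repeats
    partition the data into ten folds in three different ways.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index, starting at `fold`, is held out.
            test = indices[fold::n_folds]
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            yield train, test


# 10 folds x 3 repeats gives 30 train/test splits; a classifier's
# accuracy would be averaged over all of them.
splits = list(repeated_kfold(n_samples=50))
```

Averaging over repeats reduces the dependence of the accuracy estimate on any single random partition of the data.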


Classifier Selection for the Prediction of Dominant Transmission Mode of Coronavirus within Localities

International Journal of E-Health and Medical Communications ◽ 10.4018/ijehmc.20211101oa02 ◽ 2021 ◽ Vol 12 (6) ◽ pp. 0-0

Keyword(s): Predictive Accuracy ◽ Multinomial Logistic Regression ◽ Geographic Area ◽ Stochastic Gradient ◽ Transmission Mode ◽ Gradient Boosting ◽ Support Vector ◽ Linear Discriminant ◽ Classifier Selection ◽ Stochastic Gradient Boosting


Class point approach for software effort estimation using stochastic gradient boosting technique

ACM SIGSOFT Software Engineering Notes ◽ 10.1145/2597716.2597725 ◽ 2014 ◽ Vol 39 (3) ◽ pp. 1-6 ◽ Cited By ~ 7

Author(s): Shashank Mouli Satapathy ◽ Barada Prasanna Acharya ◽ Santanu Kumar Rath

Keyword(s): Stochastic Gradient ◽ Gradient Boosting ◽ Effort Estimation ◽ Software Effort Estimation ◽ Stochastic Gradient Boosting ◽ Boosting Technique ◽ Class Point


Enhanced prediction of vulnerable Web components using Stochastic Gradient Boosting Trees

International Journal of Web Information Systems ◽ 10.1108/ijwis-05-2018-0041 ◽ 2019 ◽ Vol 15 (2) ◽ pp. 201-214 ◽ Cited By ~ 1

Author(s): Mahmoud Elish

Keyword(s): Web Applications ◽ Prediction Models ◽ Stochastic Gradient ◽ Gradient Boosting ◽ Data Sets ◽ Content Type ◽ Stochastic Gradient Boosting ◽ Security Inspection ◽ Novel Model ◽ Efficient Software

Purpose: Effective and efficient software security inspection is crucial, as the existence of vulnerabilities represents severe risks to software users. The purpose of this paper is to empirically evaluate the potential application of Stochastic Gradient Boosting Trees (SGBT) as a novel model for enhanced prediction of vulnerable Web components compared to common, popular, and recent machine learning models. Design/methodology/approach: An empirical study was conducted in which SGBT and 16 other prediction models were trained, optimized, and cross-validated using vulnerability data sets from multiple versions of two open-source Web applications written in PHP. The prediction performance of these models was evaluated and compared based on accuracy, precision, recall, and F-measure. Findings: The results indicate that the SGBT models offer improved prediction over the other 16 models and thus are more effective and reliable in predicting vulnerable Web components. Originality/value: This paper proposes a novel application of SGBT for enhanced prediction of vulnerable Web components and shows its effectiveness.
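The four evaluation measures named in this abstract (accuracy, precision, recall, and F-measure) can all be derived from the counts of a binary confusion matrix. The sketch below is illustrative only, assuming "vulnerable" is the positive class; the function name `classification_metrics` and the example counts are hypothetical and do not come from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F-measure from confusion counts.

    tp: vulnerable components correctly flagged
    fp: clean components wrongly flagged
    fn: vulnerable components missed
    tn: clean components correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up counts for illustration: 12 vulnerable components among 100.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
```

With rare positive classes such as vulnerable components, accuracy alone can look high even when many vulnerabilities are missed, which is one reason to report precision, recall, and F-measure alongside it.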


Stochastic Gradient Boosting Model for Twitter Spam Detection

Computer Systems Science and Engineering ◽ 10.32604/csse.2022.020836 ◽ 2022 ◽ Vol 41 (2) ◽ pp. 849-859

Author(s): K. Kiruthika Devi ◽ G. A. Sathish Kumar

Keyword(s): Stochastic Gradient ◽ Gradient Boosting ◽ Spam Detection ◽ Stochastic Gradient Boosting


Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽ 10.32800/abc.2021.44.0289 ◽ 2021 ◽ pp. 289-301

Author(s): B. Martín ◽ J. González–Arias ◽ J. A. Vicente–Vírseda

Keyword(s): Machine Learning ◽ Random Forest ◽ Animal Species ◽ Temporal Patterns ◽ Additive Models ◽ Gradient Boosting ◽ Support Vector ◽ Stochastic Gradient Boosting ◽ Extreme Gradient Boosting ◽ Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregation). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains that have previously been used to characterize the oceanographic habitats of seabirds were derived from open–source datasets, quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that combining machine learning techniques with the massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.


Feasibility of Stochastic Gradient Boosting Approach for Evaluating Seismic Liquefaction Potential Based on SPT and CPT Case Histories

Journal of Performance of Constructed Facilities ◽ 10.1061/(asce)cf.1943-5509.0001292 ◽ 2019 ◽ Vol 33 (3) ◽ pp. 04019024 ◽ Cited By ~ 44

Author(s): Jian Zhou ◽ Enming Li ◽ Mingzheng Wang ◽ Xin Chen ◽ Xiuzhi Shi ◽ ...

Keyword(s): Stochastic Gradient ◽ Gradient Boosting ◽ Case Histories ◽ Liquefaction Potential ◽ Seismic Liquefaction ◽ Stochastic Gradient Boosting


A Comparative Study of Different Machine Learning Algorithms in Predicting the Content of Ilmenite in Titanium Placer

Applied Sciences ◽ 10.3390/app10020635 ◽ 2020 ◽ Vol 10 (2) ◽ pp. 635 ◽ Cited By ~ 5

Author(s): Yingli LV ◽ Qui-Thao Le ◽ Hoang-Bac Bui ◽ Xuan-Nam Bui ◽ Hoang Nguyen ◽ ...

Keyword(s): Soft Computing ◽ Mean Squared Error ◽ Machine Learning Algorithms ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Support Vector ◽ Stochastic Gradient Boosting ◽ Soft Computing Techniques ◽ Residuals Analysis ◽ Computing Models

In this study, the ilmenite content in beach placer sand was estimated using seven soft computing techniques, namely random forest (RF), artificial neural network (ANN), k-nearest neighbors (kNN), cubist, support vector machine (SVM), stochastic gradient boosting (SGB), and classification and regression tree (CART). A total of 405 beach placer borehole samples were collected from the Southern Suoi Nhum deposit, Binh Thuan province, Vietnam, to test the feasibility of these soft computing techniques in estimating ilmenite content. Heavy mineral analysis indicated that the valuable minerals in the placer sand are zircon, ilmenite, leucoxene, rutile, anatase, and monazite. Five materials, namely rutile, anatase, leucoxene, zircon, and monazite, were used as the input variables to estimate ilmenite content with the above-mentioned soft computing models. Of the whole dataset, 325 samples were used to build the soft computing models; the remaining 80 samples were used for model verification. Root-mean-squared error (RMSE), the determination coefficient (R²), a simple ranking method, and a residuals analysis technique were used as the statistical criteria for assessing model performance. The numerical experiments revealed that soft computing techniques are capable of estimating the content of ilmenite with high accuracy. The residuals analysis also indicated that the SGB model was the most suitable for determining the ilmenite content in the context of this research.
