Classifier Selection for the Prediction of Dominant Transmission Mode of Coronavirus Within Localities

The coronavirus disease-2019 (COVID-19) pandemic is an ongoing concern that requires research in all disciplines to tame its spread. Nine classification algorithms were evaluated to identify the most appropriate for predicting the prevalent COVID-19 transmission mode in a geographic area: multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were employed for classification. Predictive accuracy was determined using 10-fold cross-validation with three repeats. Friedman’s test was conducted, and the outcome showed that the performance of the algorithms differed significantly. Stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should be valuable to health informaticians, health analysts and others deciding which machine learning tool to adopt in efforts to detect the dominant transmission mode of the virus within localities.
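
The comparison step described in this abstract (collect an accuracy score per algorithm on each fold or dataset, then apply Friedman's test) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names are hypothetical, the accuracy values are invented for illustration, and the statistic is computed without a tie correction.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][a] = accuracy of algorithm a on block b (a fold or a dataset).

    Returns the Friedman chi-square statistic (no tie correction).
    """
    n = len(scores)      # number of blocks
    k = len(scores[0])   # number of algorithms
    rank_sums = [0.0] * k
    for row in scores:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Invented accuracies: 4 blocks (rows) x 3 algorithms (columns).
scores = [
    [0.70, 0.75, 0.81],
    [0.68, 0.74, 0.80],
    [0.71, 0.73, 0.79],
    [0.69, 0.76, 0.82],
]
statistic = friedman_statistic(scores)
print(statistic)
```

The statistic is compared with a chi-square distribution with k - 1 degrees of freedom. With three algorithms the 5% critical value is 5.991, so a larger statistic would indicate that the algorithms do not all perform alike.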

Classifier Selection for the Prediction of Dominant Transmission Mode of Coronavirus within Localities

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101oa02 ◽

2021 ◽

Vol 12 (6) ◽

pp. 0-0

Keyword(s):

Predictive Accuracy ◽

Multinomial Logistic Regression ◽

Geographic Area ◽

Stochastic Gradient ◽

Transmission Mode ◽

Gradient Boosting ◽

Support Vector ◽

Linear Discriminant ◽

Classifier Selection ◽

Stochastic Gradient Boosting

Predicting 90-Day and 1-Year Mortality in Spinal Metastatic Disease: Development and Internal Validation

Neurosurgery ◽

10.1093/neuros/nyz070 ◽

2019 ◽

Vol 85 (4) ◽

pp. E671-E681 ◽

Cited By ~ 22

Author(s):

Aditya V Karhade ◽

Quirina C B S Thio ◽

Paul T Ogink ◽

Christopher M Bono ◽

Marco L Ferrone ◽

...

Keyword(s):

Metastatic Disease ◽

Web Application ◽

Performance Status ◽

Predictive Performance ◽

Operative Management ◽

Stochastic Gradient ◽

Patient Specific ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting

BACKGROUND: Increasing prevalence of metastatic disease has been accompanied by increasing rates of surgical intervention. Current tools have poor to fair predictive performance for intermediate (90-d) and long-term (1-yr) mortality. OBJECTIVE: To develop predictive algorithms for spinal metastatic disease at these time points and to provide patient-specific explanations of the predictions generated by these algorithms. METHODS: Retrospective review was conducted at 2 large academic medical centers to identify patients undergoing initial operative management for spinal metastatic disease between January 2000 and December 2016. Five models (penalized logistic regression, random forest, stochastic gradient boosting, neural network, and support vector machine) were developed to predict 90-d and 1-yr mortality. RESULTS: Overall, 732 patients were identified with 90-d and 1-yr mortality rates of 181 (25.1%) and 385 (54.3%), respectively. The stochastic gradient boosting algorithm had the best performance for 90-d mortality and 1-yr mortality. On global variable importance assessment, albumin, primary tumor histology, and performance status were the 3 most important predictors of 90-d mortality. The final models were incorporated into an open access web application able to provide predictions as well as patient-specific explanations of the results generated by the algorithms. The application can be found at https://sorg-apps.shinyapps.io/spinemetssurvival/. CONCLUSION: Preoperative estimation of 90-d and 1-yr mortality was achieved with assessment of more flexible modeling techniques such as machine learning. Integration of these models into applications and patient-centered explanations of predictions represent opportunities for incorporation into healthcare systems as decision tools in the future.

Predict Health Insurance Cost by using Machine Learning and DNN Regression Models

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8364.0110321 ◽

2021 ◽

Vol 10 (2) ◽

pp. 137-143

Author(s):

Mohamed Hanafy ◽

Omar M. A. Mahmoud

Keyword(s):

Machine Learning ◽

Insurance Industry ◽

Additive Model ◽

Policy Formulation ◽

Stochastic Gradient ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbors ◽

Stochastic Gradient Boosting ◽

Insurance Cost

Insurance is a policy that eliminates or decreases the loss costs incurred through various risks. Various factors influence the cost of insurance, and these considerations contribute to insurance policy formulation. Machine learning (ML) in the insurance industry can make the wording of insurance policies more efficient. This study demonstrates how different regression models can forecast insurance costs. We compare the results of models including Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. The Stochastic Gradient Boosting model gave the best results, with an MAE of 0.17448, an RMSE of 0.38018, and an R-squared value of 85.8295.
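
The three error measures named in this abstract (MAE, RMSE and R-squared) follow standard definitions, which the following minimal sketch computes with the Python standard library. The function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Invented observed and predicted costs, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae, rmse, r2 = regression_metrics(observed, predicted)
print(mae, rmse, r2)
```

R-squared computed this way is a fraction at most 1; a figure such as 85.8295 is presumably the same quantity expressed as a percentage.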

Comparison of machine learning methods in predicting binary and multi-class occupational accident severity

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202099 ◽

2021 ◽

pp. 1-17

Author(s):

Füsun Recal ◽

Tufan Demirel

Keyword(s):

Machine Learning ◽

Multinomial Logistic Regression ◽

Occupational Accidents ◽

Gradient Boosting ◽

Support Vector ◽

Accident Prediction ◽

Stochastic Gradient Boosting ◽

Future Events ◽

Taking Action ◽

Significant Factors

Although Machine Learning (ML) is widely used in many fields to examine hidden patterns in complex databases and learn from them to predict future events, its use for predicting the outcome of occupational accidents is relatively sparse. This study used a diverse set of ML algorithms, namely Multinomial Logistic Regression (MLR), Support Vector Machines (SVM), Single C5.0 Tree (C5), Stochastic Gradient Boosting (SGB), and Neural Network (NN), to classify the severity of occupational accidents into binary (Fatal/NonFatal) and multi-class (Fatal/Major/Minor) outcomes. Comparison of the performance of the models showed Balanced Accuracy to be best for the SVM and SGB methods in the 2-class setting and for SGB in the 3-class setting. The algorithms performed better at predicting fatal accidents than major and minor accidents. The results revealed that ML unveils factors contributing to severity, so that corrective actions can be better targeted. Furthermore, taking action on even some of the most significant factors in a complex accident database with many attributes can prevent the majority of severe accidents. Interpretation of the most significant factors identified for accident prediction suggests the following corrective measures: taking fall prevention actions, prioritizing workplace inspections based on the number of employees, and supplementing safety actions according to workers’ age and experience.
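
Balanced accuracy, the comparison measure mentioned in this abstract, is sketched below under the assumption that it is taken as the mean of per-class recall, a common multi-class definition; the paper may use a different variant. The function name and the labels are hypothetical and invented for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Invented labels with an over-represented "Minor" class.
actual = ["Fatal"] * 2 + ["Major"] * 2 + ["Minor"] * 6
guessed = ["Fatal", "Fatal", "Major", "Minor"] + ["Minor"] * 5 + ["Major"]

plain = sum(1 for t, p in zip(actual, guessed) if t == p) / len(actual)
balanced = balanced_accuracy(actual, guessed)
print(plain, balanced)
```

Because each class contributes equally, the measure is not dominated by the most frequent class, which matters when fatal accidents are much rarer than minor ones.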

Class point approach for software effort estimation using stochastic gradient boosting technique

ACM SIGSOFT Software Engineering Notes ◽

10.1145/2597716.2597725 ◽

2014 ◽

Vol 39 (3) ◽

pp. 1-6 ◽

Cited By ~ 7

Author(s):

Shashank Mouli Satapathy ◽

Barada Prasanna Acharya ◽

Santanu Kumar Rath

Keyword(s):

Stochastic Gradient ◽

Gradient Boosting ◽

Effort Estimation ◽

Software Effort Estimation ◽

Stochastic Gradient Boosting ◽

Boosting Technique ◽

Class Point

Enhanced prediction of vulnerable Web components using Stochastic Gradient Boosting Trees

International Journal of Web Information Systems ◽

10.1108/ijwis-05-2018-0041 ◽

2019 ◽

Vol 15 (2) ◽

pp. 201-214 ◽

Cited By ~ 1

Author(s):

Mahmoud Elish

Keyword(s):

Web Applications ◽

Prediction Models ◽

Stochastic Gradient ◽

Gradient Boosting ◽

Data Sets ◽

Content Type ◽

Stochastic Gradient Boosting ◽

Security Inspection ◽

Novel Model ◽

Efficient Software

Purpose: Effective and efficient software security inspection is crucial as the existence of vulnerabilities represents severe risks to software users. The purpose of this paper is to empirically evaluate the potential application of Stochastic Gradient Boosting Trees (SGBT) as a novel model for enhanced prediction of vulnerable Web components compared to common, popular and recent machine learning models. Design/methodology/approach: An empirical study was conducted in which the SGBT and 16 other prediction models were trained, optimized and cross-validated using vulnerability data sets from multiple versions of two open-source Web applications written in PHP. The prediction performance of these models was evaluated and compared based on accuracy, precision, recall and F-measure. Findings: The results indicate that the SGBT models offer improved prediction over the other 16 models and thus are more effective and reliable in predicting vulnerable Web components. Originality/value: This paper proposed a novel application of SGBT for enhanced prediction of vulnerable Web components and showed its effectiveness.
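
The evaluation measures listed in this abstract (precision, recall and F-measure) can be computed from binary predictions as in the minimal sketch below. The F-measure is assumed here to be the balanced F1 score; the function name and labels are hypothetical, with 1 standing for a vulnerable component.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators by returning 0.0 in those cases.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented labels: 1 = vulnerable component, 0 = not vulnerable.
actual_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_labels = [1, 1, 0, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(actual_labels, predicted_labels)
print(precision, recall, f1)
```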

Establishing a Credit Risk Evaluation System for SMEs Using the Soft Voting Fusion Model

Risks ◽

10.3390/risks9110202 ◽

2021 ◽

Vol 9 (11) ◽

pp. 202

Author(s):

Ge Gao ◽

Hongxin Wang ◽

Pengbin Gao

Keyword(s):

Credit Risk ◽

Evaluation System ◽

Predictive Accuracy ◽

Assessment System ◽

Gradient Boosting ◽

Support Vector ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting ◽

The Government

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.
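
Soft voting, the fusion technique named in this abstract, averages the class probabilities produced by the individual models and picks the class with the highest average. The sketch below is a generic illustration, not the authors' implementation: the function name, the optional weights and the probabilities are assumptions invented for the example.

```python
def soft_vote(prob_lists, weights=None):
    """prob_lists[m][c] = probability of class c according to model m.

    Returns (weighted average probabilities, index of the winning class).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weights by default
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: avg[c])
    return avg, label


# Invented [no default, default] probabilities from five models for one firm.
model_probs = [
    [0.55, 0.45],
    [0.55, 0.45],
    [0.52, 0.48],
    [0.10, 0.90],
    [0.20, 0.80],
]
avg, label = soft_vote(model_probs)
hard_votes_for_default = sum(1 for p in model_probs if p[1] > p[0])
print(avg, label, hard_votes_for_default)
```

In this example three of the five models narrowly favour "no default", so a majority (hard) vote would choose that class, while the averaged probabilities favour "default" because the two dissenting models are far more confident.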

Stochastic Gradient Boosting Model for Twitter Spam Detection

Computer Systems Science and Engineering ◽

10.32604/csse.2022.020836 ◽

2022 ◽

Vol 41 (2) ◽

pp. 849-859

Author(s):

K. Kiruthika Devi ◽

G. A. Sathish Kumar

Keyword(s):

Stochastic Gradient ◽

Gradient Boosting ◽

Spam Detection ◽

Stochastic Gradient Boosting

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.

Mapping of the Canopy Openings in Mixed Beech–Fir Forest at Sentinel-2 Subpixel Level Using UAV and Machine Learning Approach

Remote Sensing ◽

10.3390/rs12233925 ◽

2020 ◽

Vol 12 (23) ◽

pp. 3925

Author(s):

Ivan Pilaš ◽

Mateo Gašparović ◽

Alan Novkinić ◽

Damir Klobučar

Keyword(s):

Machine Learning ◽

Forest Canopy ◽

Vegetation Index ◽

Predictive Performance ◽

Spatial Extent ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Sentinel 2

The presented study demonstrates a bi-sensor approach suitable for rapid, precise and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach makes use of Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images of smaller areas for highly precise forest canopy mask creation. Sentinel-2 was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to improving the predictive performance were examined: (I) the highest R² of a single satellite index was 0.57, (II) the highest R² using multiple features obtained from a single-date S-2 image was 0.624, and (III) the highest R² on the multitemporal set of S-2 images was 0.697. Satellite indices such as Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and Catboost, provided the best performance on the training set but exhibited weaker generalization capabilities. Therefore, a simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.
