scholarly journals Classifier Selection for the Prediction of Dominant Transmission Mode of Coronavirus Within Localities

Author(s):  
Donald Douglas Atsa'am ◽  
Ruth Wario

The coronavirus disease-2019 (COVID-19) pandemic is an ongoing concern that requires research in all disciplines to tame its spread. Nine classification algorithms were selected for evaluating the most appropriate in predicting the prevalent COVID-19 transmission mode in a geographic area. These include; multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were employed for classification. Predictive accuracy was determined using 10-fold cross validation with three repeats. The Friedman’s test was conducted and the outcome showed the performance of each algorithm is significantly different. The stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should be valuable to health informaticians, health analysts and others regarding which machine learning tool to adopt in the efforts to detect dominant transmission mode of the virus within localities.

The coronavirus disease-2019 (COVID-19) pandemic is an ongoing concern that requires research in all disciplines to tame its spread. Nine classification algorithms were selected for evaluating the most appropriate in predicting the prevalent COVID-19 transmission mode in a geographic area. These include; multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were employed for classification. Predictive accuracy was determined using 10-fold cross validation with three repeats. The Friedman’s test was conducted and the outcome showed the performance of each algorithm is significantly different. The stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should be valuable to health informaticians, health analysts and others regarding which machine learning tool to adopt in the efforts to detect dominant transmission mode of the virus within localities.


Neurosurgery ◽  
2019 ◽  
Vol 85 (4) ◽  
pp. E671-E681 ◽  
Author(s):  
Aditya V Karhade ◽  
Quirina C B S Thio ◽  
Paul T Ogink ◽  
Christopher M Bono ◽  
Marco L Ferrone ◽  
...  

Abstract BACKGROUND Increasing prevalence of metastatic disease has been accompanied by increasing rates of surgical intervention. Current tools have poor to fair predictive performance for intermediate (90-d) and long-term (1-yr) mortality. OBJECTIVE To develop predictive algorithms for spinal metastatic disease at these time points and to provide patient-specific explanations of the predictions generated by these algorithms. METHODS Retrospective review was conducted at 2 large academic medical centers to identify patients undergoing initial operative management for spinal metastatic disease between January 2000 and December 2016. Five models (penalized logistic regression, random forest, stochastic gradient boosting, neural network, and support vector machine) were developed to predict 90-d and 1-yr mortality. RESULTS Overall, 732 patients were identified with 90-d and 1-yr mortality rates of 181 (25.1%) and 385 (54.3%), respectively. The stochastic gradient boosting algorithm had the best performance for 90-d mortality and 1-yr mortality. On global variable importance assessment, albumin, primary tumor histology, and performance status were the 3 most important predictors of 90-d mortality. The final models were incorporated into an open access web application able to provide predictions as well as patient-specific explanations of the results generated by the algorithms. The application can be found at https://sorg-apps.shinyapps.io/spinemetssurvival/ CONCLUSION Preoperative estimation of 90-d and 1-yr mortality was achieved with assessment of more flexible modeling techniques such as machine learning. Integration of these models into applications and patient-centered explanations of predictions represent opportunities for incorporation into healthcare systems as decision tools in the future.


Author(s):  
Mohamed hanafy ◽  
Omar M. A. Mahmoud

Insurance is a policy that eliminates or decreases loss costs occurred by various risks. Various factors influence the cost of insurance. These considerations contribute to the insurance policy formulation. Machine learning (ML) for the insurance industry sector can make the wording of insurance policies more efficient. This study demonstrates how different models of regression can forecast insurance costs. And we will compare the results of models, for example, Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. This paper offers the best approach to the Stochastic Gradient Boosting model with an MAE value of 0.17448, RMSE value of 0.38018and R -squared value of 85.8295.


2021 ◽  
pp. 1-17
Author(s):  
Füsun Recal ◽  
Tufan Demirel

Although Machine Learning (ML) is widely used to examine hidden patterns in complex databases and learn from them to predict future events in many fields, utilization of it for predicting the outcome of occupational accidents is relatively sparse. This study utilized diversified ML algorithms; Multinomial Logistic Regression (MLR), Support Vector Machines (SVM), Single C5.0 Tree (C5), Stochastic Gradient Boosting (SGB), and Neural Network (NN) in classifying the severity of occupational accidents in binary (Fatal/NonFatal) and multi-class (Fatal/Major/Minor) outcomes. Comparison of the performance of models showed Balanced Accuracy to be the best for SVM and SGB methods in 2-Class and SGB in 3-Class. Algorithms performed better at predicting fatal accidents compared to major and minor accidents. Results obtained revealed that, ML unveils factors contributing to severity to better address the corrective actions. Furthermore, taking action related to even some of the most significant factors in complex accidents database with many attributes can prevent majority of severe accidents. Interpretation of most significant factors identified for accident prediction suggest the following corrective measures: taking fall prevention actions, prioritizing workplace inspections based on the number of employees, and supplementing safety actions according to worker’s age and experience.


2019 ◽  
Vol 15 (2) ◽  
pp. 201-214 ◽  
Author(s):  
Mahmoud Elish

Purpose Effective and efficient software security inspection is crucial as the existence of vulnerabilities represents severe risks to software users. The purpose of this paper is to empirically evaluate the potential application of Stochastic Gradient Boosting Trees (SGBT) as a novel model for enhanced prediction of vulnerable Web components compared to common, popular and recent machine learning models. Design/methodology/approach An empirical study was conducted where the SGBT and 16 other prediction models have been trained, optimized and cross validated using vulnerability data sets from multiple versions of two open-source Web applications written in PHP. The prediction performance of these models have been evaluated and compared based on accuracy, precision, recall and F-measure. Findings The results indicate that the SGBT models offer improved prediction over the other 16 models and thus are more effective and reliable in predicting vulnerable Web components. Originality/value This paper proposed a novel application of SGBT for enhanced prediction of vulnerable Web components and showed its effectiveness.


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.


2021 ◽  
pp. 289-301
Author(s):  
B. Martín ◽  
J. González–Arias ◽  
J. A. Vicente–Vírseda

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest –enhanced form of bootstrap. We also performed extreme gradient boosting –an enhanced form of radiant boosting– to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.


2020 ◽  
Vol 12 (23) ◽  
pp. 3925
Author(s):  
Ivan Pilaš ◽  
Mateo Gašparović ◽  
Alan Novkinić ◽  
Damir Klobučar

The presented study demonstrates a bi-sensor approach suitable for rapid and precise up-to-date mapping of forest canopy gaps for the larger spatial extent. The approach makes use of Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images on smaller areas for highly precise forest canopy mask creation. Sentinel-2 was used as a scaling platform for transferring information from the UAV to a wider spatial extent. Various approaches to an improvement in the predictive performance were examined: (I) the highest R2 of the single satellite index was 0.57, (II) the highest R2 using multiple features obtained from the single-date, S-2 image was 0.624, and (III) the highest R2 on the multitemporal set of S-2 images was 0.697. Satellite indices such as Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms such as the Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and Catboost that provided the best performance on the training set exhibited weaker generalization capabilities. Therefore, a simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.


Sign in / Sign up

Export Citation Format

Share Document