Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

Abstract Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the burden on the healthcare system. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk. Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of the methods on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier’s score, calibration intercept and calibration slope. Results: XGBoost was highly discriminative, with an AUC of 0.9669, and outperformed the conventional tree-based methods (classification tree and RF) in predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier’s scores. Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
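
Two of the accuracy measures named in this abstract, the AUC and the Brier score, can be computed directly from predicted risks and observed outcomes. The following is a minimal sketch in plain Python, not the authors' code; the function names and the toy data are assumptions made for illustration, and the calibration intercept and slope are left out because they require fitting a logistic model.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random positive case is ranked above a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(outcomes, risks))
print("Brier score:", brier_score(outcomes, risks))
```

A higher AUC indicates better discrimination, whereas a lower Brier score indicates predicted risks that sit closer to the observed outcomes, which is why the abstract reports the two together.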

Machine Learning to Forecast Medical Attentions of Pneumonia Cases in Colombian Cities: An implementation with Air Quality, Meteorological and Admission Data

10.21203/rs.3.rs-53367/v1 ◽

2020 ◽

Author(s):

Juan David Gutiérrez

Keyword(s):

Public Health ◽

Machine Learning ◽

Air Pollution ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Health Authorities ◽

Admission Data ◽

Extreme Gradient Boosting ◽

Public Health Authorities

Abstract Background: Previous studies have shown a relationship of air pollution-aerosols and meteorological variables with the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and help public health authorities implement emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia by using air pollution-aerosols, meteorological and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities was provided by public health authorities between January 2009 and December 2019. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning method with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasts produced by the best-performing models. Results: According to the R² metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali, whereas for Barranquilla the best performance was obtained with Bayesian additive regression trees, and for Cartagena extreme gradient boosting performed best. The most important variables for the forecasting were related to the admission data. Conclusions: The results obtained from this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases, and therefore it can be a useful decision-making tool for public health authorities.

Predicting algal biochar yield using eXtreme Gradient Boosting (XGB) algorithm of machine learning methods

Algal Research ◽

10.1016/j.algal.2020.102006 ◽

2020 ◽

Vol 50 ◽

pp. 102006 ◽

Cited By ~ 3

Author(s):

Abhijeet Pathy ◽

Saswat Meher ◽

Balasubramanian P

Keyword(s):

Machine Learning ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Algal Biochar

Using machine learning methods to determine Myers-Briggs Type Index (MBTI) types of people

Bulletin of the National Engineering Academy of the Republic of Kazakhstan ◽

10.47533/2020.1606-146x.58 ◽

2021 ◽

Vol 1 (79) ◽

pp. 32-39

Author(s):

A. Myngzhassar ◽

A. B. Kuldzhabekov ◽

S. Daribayev ◽

A. N. Temirbekov ◽

...

Keyword(s):

Machine Learning ◽

Social Networks ◽

Text Messages ◽

Gradient Boosting ◽

Learning Methods ◽

Psychological Types ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Computer Linguistics ◽

Myers Briggs

This article addresses machine learning problems in computational linguistics, in particular the identification of people’s psychological types from their text messages on social networks. Its purpose is to study the Naive Bayes and Extreme Gradient Boosting (XGBoost) machine learning methods for building a classifier for the Kazakh language that determines a person’s Myers-Briggs Type Indicator (MBTI) type from text samples of their social network posts. The course of the experiments with these machine learning methods is presented, and the results obtained are compared.

Debris Flow Susceptibility Mapping Using Machine-Learning Techniques in Shigatse Area, China

Remote Sensing ◽

10.3390/rs11232801 ◽

2019 ◽

Vol 11 (23) ◽

pp. 2801 ◽

Cited By ~ 11

Author(s):

Yonghong Zhang ◽

Taotao Ge ◽

Wei Tian ◽

Yuei-An Liou

Keyword(s):

Neural Network ◽

Machine Learning ◽

Debris Flow ◽

Debris Flows ◽

Gradient Boosting ◽

Learning Methods ◽

Machine Learning Methods ◽

Triggering Factors ◽

Extreme Gradient Boosting ◽

Debris Flow Susceptibility

Debris flows have always been a serious problem in mountain areas. Research on the assessment of debris flow susceptibility (DFS) is useful for preventing and mitigating debris flow risks. The main purpose of this work is to study the DFS in the Shigatse area of Tibet using machine learning methods, after assessing the main triggering factors of debris flows. Remote sensing and geographic information systems (GIS) are used to obtain datasets of topography, vegetation, human activities and soil factors for local debris flows. The imbalance among debris flow susceptibility levels in the datasets is addressed with the Borderline-SMOTE method. Five machine learning methods, i.e., back propagation neural network (BPNN), one-dimensional convolutional neural network (1D-CNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost), were used to analyze and fit the relationship between debris flow triggering factors and occurrence, and to evaluate the weight of each triggering factor. ANOVA and Tukey HSD tests revealed that the XGBoost model had the best mean accuracy (0.924) in ten-fold cross-validation, significantly better than that of the BPNN (0.871), DT (0.816), and RF (0.901). However, the performance of XGBoost did not differ significantly from that of the 1D-CNN (0.914). This is also the first comparison experiment between the XGBoost and 1D-CNN methods in a DFS study. The DFS maps were verified with five evaluation metrics: Precision, Recall, F1 score, Accuracy and area under the curve (AUC). Experiments show that XGBoost has the best score, and that the factors with a greater impact on debris flows are aspect, annual average rainfall, profile curvature, and elevation.
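
Four of the evaluation metrics listed in this abstract (Precision, Recall, F1 score and Accuracy) follow from the counts of a confusion matrix. The sketch below is a simplified illustration and not the authors' pipeline: it assumes a binary label (1 = debris flow, 0 = none), whereas the study works with several susceptibility levels, and the function name and toy labels are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels, not data from the study.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]
print(classification_metrics(observed, predicted))
```

On imbalanced data, which is the situation the abstract addresses with Borderline-SMOTE, accuracy alone can look good while recall for the rare class stays low, so reporting the metrics side by side is the safer choice.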

Comparative Assessment of Machine Learning Methods for Urban Vegetation Mapping Using Multitemporal Sentinel-1 Imagery

Remote Sensing ◽

10.3390/rs12121952 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1952 ◽

Cited By ~ 5

Author(s):

Mateo Gašparović ◽

Dino Dobrinić

Keyword(s):

Machine Learning ◽

Urban Areas ◽

Vegetation Mapping ◽

Gradient Boosting ◽

Support Vector ◽

Urban Vegetation ◽

Learning Methods ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

SAR Data

Mapping of green vegetation in urban areas using remote sensing techniques can be used as a tool for integrated spatial planning to deal with urban challenges. In this context, multitemporal (MT) synthetic aperture radar (SAR) data have been investigated less than optical satellite data. This research compared various machine learning methods using single-date and MT Sentinel-1 (S1) imagery. The research focused on vegetation mapping in urban areas across Europe. Urban vegetation was classified using six classifiers: random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme learning machine (ELM). Whereas SVM showed the best performance in the single-date image analysis, the MLP classifier yielded the highest overall accuracy in the MT classification scenario. Mean overall accuracy (OA) values for all machine learning methods increased from 57% to 77% with speckle filtering. Using MT SAR data, i.e., three and five S1 images, produced an additional increase in OA of 8.59% and 13.66%, respectively. Additionally, when three and five S1 images were used for classification, the F1 measure for the forest and low vegetation land-cover classes exceeded 90%. This research confirmed the potential of MT C-band SAR imagery for urban vegetation mapping.

Comprehensive nutrient analysis in agricultural organic amendments through non-destructive assays using machine learning

PLoS ONE ◽

10.1371/journal.pone.0242821 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0242821

Author(s):

Erick K. Towett ◽

Lee B. Drake ◽

Gifty E. Acquah ◽

Stephan M. Haefele ◽

Steve P. McGrath ◽

...

Keyword(s):

Machine Learning ◽

Organic Amendments ◽

Cost Effective ◽

Total Carbon ◽

Gradient Boosting ◽

Learning Methods ◽

Nutrient Analysis ◽

Machine Learning Methods ◽

MIR Spectroscopy ◽

Extreme Gradient Boosting

Portable X-ray fluorescence (pXRF) and Diffuse Reflectance Fourier Transformed Mid-Infrared (DRIFT-MIR) spectroscopy are rapid and cost-effective analytical tools for material characterization. Here, we provide an assessment of these methods for the analysis of total carbon, nitrogen and total elemental composition in organic amendments. We developed machine learning methods to rapidly quantify the concentrations of macro- and micronutrient elements present in the samples and propose a novel system for the quality assessment of organic amendments. Two types of machine learning methods, forest regression and extreme gradient boosting, were used with data from both pXRF and DRIFT-MIR spectroscopy. Cross-validation trials were run to evaluate the generalizability of the models produced on each instrument. Both methods demonstrated similar broad capabilities in estimating nutrients using machine learning, with pXRF being suitable for nutrients and contaminants. The results make portable spectrometry in combination with machine learning a scalable solution for comprehensive nutrient analysis of organic amendments.

Prediction of Hanwoo Cattle Phenotypes from Genotypes Using Machine Learning Methods

Animals ◽

10.3390/ani11072066 ◽

2021 ◽

Vol 11 (7) ◽

pp. 2066

Author(s):

Swati Srivastava ◽

Bryan Irvine Lopez ◽

Himansu Kumar ◽

Myoungjin Jang ◽

Han-Ha Chai ◽

...

Keyword(s):

Machine Learning ◽

Support Vector ◽

Learning Methods ◽

Eye Muscle ◽

Important Species ◽

Machine Learning Methods ◽

Extreme Gradient Boosting ◽

Boosting Method ◽

Predictive Correlation ◽

Hanwoo Cattle

Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat turned that purpose into full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve the genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely random forest (RF), extreme gradient boosting (XGB), and support vector machine (SVM), when predicting the carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle that were slaughtered at around 30 months of age were used. XGB had the highest predictive correlation for CWT and MS, followed by genomic best linear unbiased prediction (GBLUP), SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any other machine learning method over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.
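
The abstract ranks methods by predictive correlation but bases its recommendation on the mean squared error (MSE) of prediction. The two criteria can disagree, because correlation ignores scale and bias while MSE penalizes them. The following sketch illustrates this with invented numbers; the function names and data are assumptions for illustration and have no connection to the Hanwoo dataset.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error of prediction."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical phenotypes and two hypothetical sets of predictions.
observed = [1.0, 2.0, 3.0, 4.0]
pred_a = [2.0, 4.0, 6.0, 8.0]  # ranks animals perfectly but is on the wrong scale
pred_b = [1.0, 2.5, 2.5, 4.0]  # ranks slightly worse but is numerically closer

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, "correlation:", round(pearson(observed, pred), 4),
          "MSE:", round(mse(observed, pred), 4))
```

Here the predictions labelled A win on correlation and lose on MSE, which is the kind of disagreement between criteria that the abstract reports for XGB and GBLUP.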

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Abstract Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors were considered in preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), was performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires; however, RF performed best, followed by GBM. Environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can serve as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Short- and Medium-range Prediction of Relativistic Electron Flux in the Earth’s Outer Radiation Belt by Machine Learning Methods

Meteorologiya i Gidrologiya ◽

10.52002/0130-2906-2021-3-47-57 ◽

2021 ◽

Vol 3 ◽

pp. 47-57

Author(s):

I. N. Myagkova ◽

V. R. Shirokii ◽

Yu. S. Shugai ◽

O. G. Barinov ◽

...

Keyword(s):

Machine Learning ◽

Radiation Belt ◽

Gradient Boosting ◽

Relativistic Electrons ◽

Learning Methods ◽

Outer Radiation Belt ◽

Machine Learning Methods ◽

The Earth ◽

Skill Scores ◽

Medium Range

Ways are studied to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the Earth’s outer radiation belt 1 to 24 hours ahead and 1 to 4 days ahead, respectively. The prediction uses an approximation approach based on various machine learning methods, namely artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with lead times of 1 to 24 hours showed that ANNs gave the best results. For medium-range forecasting, the accuracy of predicting the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when predicted values of the solar wind velocity near the Earth, obtained from UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory), are included in the list of input parameters.
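
Predicting a time series a fixed number of steps ahead with a regression-type learner is commonly set up by pairing a window of past values with the value at the desired lead time. The sketch below shows one such setup under stated assumptions: the abstract does not describe the authors' input construction, so the helper name `make_windows`, the window length and the toy series are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `n_lags` consecutive values with the value that
    lies `horizon` steps after the last value of that window."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        window_end = start + n_lags  # exclusive end index of the window
        inputs.append(list(series[start:window_end]))
        targets.append(series[window_end + horizon - 1])
    return inputs, targets


# Hypothetical hourly values standing in for a flux series.
flux = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_windows(flux, n_lags=3, horizon=2)
print(X)  # windows of three past values
print(y)  # the value two steps after each window
```

Any of the learners named in the abstract could then be fitted on such input and target pairs, with one model or one target column per lead time.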

Predicting the Tool Wear of a Drilling Process Using Novel Machine Learning XGBoost-SDA

Materials ◽

10.3390/ma13214952 ◽

2020 ◽

Vol 13 (21) ◽

pp. 4952

Author(s):

Mahdi S. Alajmi ◽

Abdullah M. Almeshal

Keyword(s):

Machine Learning ◽

Cast Iron ◽

Tool Wear ◽

Flank Wear ◽

Accurate Prediction ◽

Superior Performance ◽

Gradient Boosting ◽

Support Vector ◽

Drilling Process ◽

Extreme Gradient Boosting

Tool wear negatively impacts the quality of workpieces produced by the drilling process. Accurate prediction of tool wear enables the operator to maintain the machine at the required level of performance. This research presents a novel hybrid machine learning approach for predicting the tool wear in a drilling process. The proposed approach is based on optimizing the extreme gradient boosting algorithm’s hyperparameters by a spiral dynamic optimization algorithm (XGBoost-SDA). Simulations were carried out on copper and cast-iron datasets with a high degree of accuracy. Further comparative analyses were performed with support vector machines (SVM) and multilayer perceptron artificial neural networks (MLP-ANN), in which XGBoost-SDA showed superior performance. Simulations revealed that XGBoost-SDA results in the accurate prediction of flank wear in the drilling process with mean absolute error (MAE) = 4.67%, MAE = 5.32%, and coefficient of determination R² = 0.9973 for the copper workpiece. Similarly, for the cast iron workpiece, XGBoost-SDA resulted in surface roughness predictions with MAE = 5.25%, root mean square error (RMSE) = 6.49%, and R² = 0.975, which closely agree with the measured values. Performance comparisons between SVM, MLP-ANN, and XGBoost-SDA show that XGBoost-SDA is an effective method that can ensure high predictive accuracy for flank wear values in a drilling process.
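
The error measures quoted in this abstract (MAE, RMSE and R²) have standard definitions that are easy to state in code. The sketch below uses those textbook definitions on invented numbers; it is not the authors' evaluation script, and because the abstract does not say how the errors were converted to percentages, the values here are left in the units of the data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error, root mean square error and coefficient of determination."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical measured and predicted wear values, not data from the study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(regression_metrics(measured, predicted))
```

RMSE is never smaller than MAE on the same data and grows faster when a few predictions are far off, so the gap between the two gives a rough indication of how unevenly the errors are distributed.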
