Predicting COVID-19 mortality risk in Toronto, Canada: a comparison of tree-based and regression-based machine learning methods

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Cindy Feng ◽  
George Kephart ◽  
Elizabeth Juarez-Colunga

Abstract Background: Coronavirus disease (COVID-19) presents an unprecedented threat to global health. Accurately predicting the mortality risk among infected individuals is crucial for prioritizing medical care and mitigating the healthcare system’s burden. The present study aimed to assess the accuracy of machine learning methods in predicting COVID-19 mortality risk. Methods: We compared the performance of classification tree, random forest (RF), extreme gradient boosting (XGBoost), logistic regression, generalized additive model (GAM) and linear discriminant analysis (LDA) in predicting the mortality risk among 49,216 COVID-19 positive cases in Toronto, Canada, reported from March 1 to December 10, 2020. We used repeated split-sample validation and k-steps-ahead forecasting validation. Predictive models were estimated using training samples, and the predictive accuracy of the methods on the testing samples was assessed using the area under the receiver operating characteristic curve (AUC), Brier’s score, calibration intercept and calibration slope. Results: We found that XGBoost is highly discriminative, with an AUC of 0.9669, and outperforms conventional tree-based methods, i.e., classification tree and RF, for predicting COVID-19 mortality risk. Regression-based methods (logistic, GAM and LASSO) performed comparably to XGBoost, with slightly lower AUCs and higher Brier’s scores. Conclusions: XGBoost offers superior performance over conventional tree-based methods and a minor improvement over regression-based methods for predicting COVID-19 mortality risk in the study population.
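Two of the accuracy measures named in the abstract above, AUC and Brier's score, can be illustrated with a minimal pure-Python sketch. The function names and the toy data are assumptions made for illustration; they are not taken from the study, which does not describe its implementation here.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive case is scored above a random
    negative case, counting ties as one half (pairwise definition of AUC)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy outcomes (1 = died, 0 = survived) and predicted risks, invented for illustration.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print(auc(y_true, y_prob))          # discrimination: higher is better
print(brier_score(y_true, y_prob))  # overall accuracy: lower is better
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a sketch but would likely be replaced by a rank-based computation on a cohort of tens of thousands of cases.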

2020 ◽  
Author(s):  
Juan David Gutiérrez

Abstract Background: Previous studies have shown a relationship between air pollution-aerosols, meteorological variables and the occurrence of pneumonia. Forecasting the number of attentions of pneumonia cases may be useful to optimize the allocation of healthcare resources and to support public health authorities in implementing emergency plans to face an increase in patients. The purpose of this study is to implement four machine-learning methods to forecast the number of attentions of pneumonia cases in the five largest cities of Colombia by using air pollution-aerosols, meteorological and admission data. Methods: The number of attentions of pneumonia cases in the five most populated Colombian cities between January 2009 and December 2019 was provided by public health authorities. Air pollution-aerosols and meteorological data were obtained from remote sensors. Four machine-learning methods were implemented for each city. We selected the machine-learning method with the best performance in each city and implemented two techniques to identify the most relevant variables in the forecasts produced by the best-performing models. Results: According to the R² metric, random forest was the machine-learning method with the best performance for Bogotá, Medellín and Cali, whereas for Barranquilla the best performance was obtained from the Bayesian adaptive regression trees, and for Cartagena extreme gradient boosting had the best performance. The most important variables for the forecasting were related to the admission data. Conclusions: The results obtained from this study suggest that machine learning can be used to efficiently forecast the number of attentions of pneumonia cases, and therefore it can be a useful decision-making tool for public health authorities.


Author(s):  
A. Myngzhassar ◽  
A. B. Kuldzhabekov ◽  
S. Daribayev ◽  
A. N. Temirbekov ◽  
...  

The article addresses machine learning problems in computational linguistics, in particular the identification of people’s psychological types from their text messages on social networks. Its purpose is to study the Naive Bayes and Extreme Gradient Boosting (XGBoost) machine learning methods for building a classifier for the Kazakh language that determines a person’s Myers-Briggs Type Indicator (MBTI) type from text samples of their social network posts. The experiments with these machine learning methods are described, and the results obtained are compared.


2019 ◽  
Vol 11 (23) ◽  
pp. 2801 ◽  
Author(s):  
Yonghong Zhang ◽  
Taotao Ge ◽  
Wei Tian ◽  
Yuei-An Liou

Debris flows have always been a serious problem in mountain areas. Research on the assessment of debris flow susceptibility (DFS) is useful for preventing and mitigating debris flow risks. The main purpose of this work is to study the DFS in the Shigatse area of Tibet by using machine learning methods, after assessing the main triggering factors of debris flows. Remote sensing and geographic information system (GIS) are used to obtain datasets of topography, vegetation, human activities and soil factors for local debris flows. The imbalance among debris flow susceptibility levels in the datasets is addressed by the Borderline-SMOTE method. Five machine learning methods, i.e., back propagation neural network (BPNN), one-dimensional convolutional neural network (1D-CNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost), have been used to analyze and fit the relationship between debris flow triggering factors and occurrence, and to evaluate the weight of each triggering factor. The ANOVA and Tukey HSD tests revealed that the XGBoost model exhibited the best mean accuracy (0.924) on ten-fold cross-validation and that its performance was significantly better than that of the BPNN (0.871), DT (0.816), and RF (0.901). However, the performance of the XGBoost did not significantly differ from that of the 1D-CNN (0.914). This is also the first experiment to compare the XGBoost and 1D-CNN methods in DFS research. The DFS maps have been verified by five evaluation measures: Precision, Recall, F1 score, Accuracy and area under the curve (AUC). Experiments show that the XGBoost has the best score, and the factors that have a greater impact on debris flows are aspect, annual average rainfall, profile curvature, and elevation.
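Four of the evaluation measures listed in the abstract above (Precision, Recall, F1 score and Accuracy) follow directly from confusion-matrix counts. The sketch below assumes a binary setting (debris flow vs. no debris flow) for simplicity; the function name and the toy labels are hypothetical and are not the study's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels, from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels invented for illustration (1 = debris flow occurred).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

print(binary_metrics(y_true, y_pred))
```

With several susceptibility levels, as in the study, these measures would be computed per class and then averaged; the averaging scheme is not stated in the abstract.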


2020 ◽  
Vol 12 (12) ◽  
pp. 1952 ◽  
Author(s):  
Mateo Gašparović ◽  
Dino Dobrinić

Mapping of green vegetation in urban areas using remote sensing techniques can be used as a tool for integrated spatial planning to deal with urban challenges. In this context, multitemporal (MT) synthetic aperture radar (SAR) data have not been investigated as thoroughly as optical satellite data. This research compared various machine learning methods using single-date and MT Sentinel-1 (S1) imagery. The research focused on vegetation mapping in urban areas across Europe. Urban vegetation was classified using six classifiers: random forests (RF), support vector machine (SVM), extreme gradient boosting (XGB), multi-layer perceptron (MLP), AdaBoost.M1 (AB), and extreme learning machine (ELM). Whereas SVM showed the best performance in the single-date image analysis, the MLP classifier yielded the highest overall accuracy in the MT classification scenario. Mean overall accuracy (OA) values for all machine learning methods increased from 57% to 77% with speckle filtering. Using MT SAR data, i.e., three and five S1 images, yielded an additional increase in the OA of 8.59% and 13.66%, respectively. Additionally, when three and five S1 images were used for classification, the F1 measure for the forest and low vegetation land-cover classes exceeded 90%. This research confirmed the potential of MT C-band SAR imagery for urban vegetation mapping.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0242821
Author(s):  
Erick K. Towett ◽  
Lee B. Drake ◽  
Gifty E. Acquah ◽  
Stephan M. Haefele ◽  
Steve P. McGrath ◽  
...  

Portable X-ray fluorescence (pXRF) and Diffuse Reflectance Fourier Transformed Mid-Infrared (DRIFT-MIR) spectroscopy are rapid and cost-effective analytical tools for material characterization. Here, we provide an assessment of these methods for the analysis of total Carbon, Nitrogen and total elemental composition of multiple elements in organic amendments. We developed machine learning methods to rapidly quantify the concentrations of macro- and micronutrient elements present in the samples and propose a novel system for the quality assessment of organic amendments. Two types of machine learning methods, forest regression and extreme gradient boosting, were used with data from both pXRF and DRIFT-MIR spectroscopy. Cross-validation trials were run to evaluate generalizability of models produced on each instrument. Both methods demonstrated similar broad capabilities in estimating nutrients using machine learning, with pXRF being suitable for nutrients and contaminants. The results make portable spectrometry in combination with machine learning a scalable solution to provide comprehensive nutrient analysis for organic amendments.


Animals ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 2066
Author(s):  
Swati Srivastava ◽  
Bryan Irvine Lopez ◽  
Himansu Kumar ◽  
Myoungjin Jang ◽  
Han-Ha Chai ◽  
...  

Hanwoo was originally raised for draft purposes, but the increase in local demand for red meat turned that purpose into full-scale meat-type cattle rearing; it is now considered one of the most economically important species and a vital food source for Koreans. The application of genomic selection in Hanwoo breeding programs in recent years was expected to lead to higher genetic progress. However, better statistical methods that can improve the genomic prediction accuracy are required. Hence, this study aimed to compare the predictive performance of three machine learning methods, namely, random forest (RF), extreme gradient boosting method (XGB), and support vector machine (SVM), when predicting the carcass weight (CWT), marbling score (MS), backfat thickness (BFT) and eye muscle area (EMA). Phenotypic and genotypic data (53,866 SNPs) from 7324 commercial Hanwoo cattle that were slaughtered at the age of around 30 months were used. The boosting method XGB achieved the highest predictive correlation for CWT and MS, followed by GBLUP, SVM, and RF. Meanwhile, the best predictive correlation for BFT and EMA was delivered by GBLUP, followed by SVM, RF, and XGB. Although XGB presented the highest predictive correlations for some traits, we did not find an advantage of XGB or any other machine learning method over GBLUP according to the mean squared error of prediction. Thus, we still recommend the use of GBLUP in the prediction of genomic breeding values for carcass traits in Hanwoo cattle.


2021 ◽  
Author(s):  
Polash Banerjee

Abstract Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study of the efficiency of machine learning methods, namely Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM), has been performed to identify the best-performing algorithm for wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. Also, environmental features such as average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.


2021 ◽  
Vol 3 ◽  
pp. 47-57
Author(s):  
I. N. Myagkova ◽  
V. R. Shirokii ◽  
Yu. S. Shugai ◽  
O. G. Barinov ◽  
...  

Ways are studied to improve the quality of prediction of the time series of hourly mean fluxes and daily total fluxes (fluences) of relativistic electrons in the Earth’s outer radiation belt 1 to 24 hours ahead and 1 to 4 days ahead, respectively. The prediction uses an approximation approach based on various machine learning methods, namely, artificial neural networks (ANNs), decision trees (random forest), and gradient boosting. A comparison of the skill scores of short-range forecasts with lead times of 1 to 24 hours showed that the best results were demonstrated by ANNs. For medium-range forecasting, the accuracy of prediction of the fluences of relativistic electrons in the Earth’s outer radiation belt three to four days ahead increases significantly when the predicted values of the solar wind velocity near the Earth, obtained from the UV images of the Sun taken by the AIA (Atmospheric Imaging Assembly) instrument of the SDO (Solar Dynamics Observatory), are included in the list of input parameters.
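Forecasting a time series a fixed number of steps ahead with a regression-type learner is commonly set up by pairing a window of past values with the value at the chosen lead time. The sketch below shows one such setup; it is an assumption about a typical approach, not the authors' preprocessing, and the function name, window length and toy series are hypothetical.

```python
def make_supervised(series, n_lags, horizon):
    """Pair each window of `n_lags` consecutive values with the value that
    lies `horizon` steps after the last value of that window."""
    X, y = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        X.append(series[end - n_lags:end])   # inputs: the n_lags most recent values
        y.append(series[end + horizon - 1])  # target: value `horizon` steps ahead
    return X, y


# Toy hourly series invented for illustration.
series = [1, 2, 3, 4, 5, 6, 7, 8]
X, y = make_supervised(series, n_lags=3, horizon=2)

for window, target in zip(X, y):
    print(window, "->", target)
```

A separate model can then be trained for each lead time (1 to 24 hours, or 1 to 4 days), and additional inputs such as a predicted solar wind velocity would be appended to each window as extra features.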


Materials ◽  
2020 ◽  
Vol 13 (21) ◽  
pp. 4952
Author(s):  
Mahdi S. Alajmi ◽  
Abdullah M. Almeshal

Tool wear negatively impacts the quality of workpieces produced by the drilling process. Accurate prediction of tool wear enables the operator to maintain the machine at the required level of performance. This research presents a novel hybrid machine learning approach for predicting the tool wear in a drilling process. The proposed approach is based on optimizing the extreme gradient boosting algorithm’s hyperparameters with a spiral dynamic optimization algorithm (XGBoost-SDA). Simulations carried out on copper and cast-iron datasets achieved a high degree of accuracy. Further comparative analyses were performed with support vector machines (SVM) and multilayer perceptron artificial neural networks (MLP-ANN), in which XGBoost-SDA showed superior performance. Simulations revealed that XGBoost-SDA results in the accurate prediction of flank wear in the drilling process with mean absolute error (MAE) = 4.67%, MAE = 5.32%, and coefficient of determination R2 = 0.9973 for the copper workpiece. Similarly, for the cast iron workpiece, XGBoost-SDA resulted in surface roughness predictions with MAE = 5.25%, root mean square error (RMSE) = 6.49%, and R2 = 0.975, which closely agree with the measured values. Performance comparisons between SVM, MLP-ANN, and XGBoost-SDA show that XGBoost-SDA is an effective method that can ensure high predictive accuracy for flank wear values in a drilling process.
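The three error measures quoted in the abstract above (MAE, RMSE and R2) can be sketched in a few lines of Python. The function names and toy values are assumptions for illustration only; the abstract reports MAE and RMSE as percentages, whereas this sketch works in the units of the measured quantity.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy measured vs. predicted wear values, invented for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.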

