Heatwave Damage Prediction Using Random Forest Model in Korea

Climate change increases the frequency and intensity of heatwaves, causing significant human and material losses every year. Big data, whose volumes are rapidly increasing, are expected to be used for preemptive responses. However, human cognitive abilities are limited, which can lead to ineffective decision making during disaster responses when artificial intelligence-based analysis models are not employed. Existing prediction models have limitations with regard to their validation, and most models focus only on heat-associated deaths. In this study, a random forest model was developed for the weekly prediction of heat-related damages on the basis of four years (2015–2018) of statistical, meteorological, and floating population data from South Korea. The model was evaluated through comparisons with other traditional regression models in terms of mean absolute error, root mean squared error, root mean squared logarithmic error, and coefficient of determination (R2). In a comparative analysis with observed values, the proposed model showed an R2 value of 0.804. The results show that the proposed model outperforms existing models. They also show that the floating population variable collected from mobile global positioning systems contributes more to predictions than the aggregate population variable.
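
The four evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration using their standard definitions, not the study's code; the `observed` and `predicted` values are hypothetical, and the RMSLE form shown (based on `log(1 + x)`) assumes non-negative values.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be non-negative)."""
    sq = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical weekly damage counts, for illustration only.
observed = [3, 5, 2, 7]
predicted = [2, 5, 4, 6]

for name, fn in [("MAE", mae), ("RMSE", rmse), ("RMSLE", rmsle), ("R2", r2)]:
    print(name, round(fn(observed, predicted), 4))
```

Lower values are better for the three error metrics, while R2 approaches 1 as predictions approach the observed values.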


The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms

Risk Management and Healthcare Policy ◽

10.2147/rmhp.s297838 ◽

2021 ◽

Vol 14 ◽

pp. 1175-1187

Author(s):

Jie Song ◽

Yuan Gao ◽

Pengbin Yin ◽

Yi Li ◽

Yang Li ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Pressure Ulcer ◽

Prediction Models ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Random Forest Model ◽

Forest Model


Machine Learning Approach Using Routine Immediate Postoperative Laboratory Values for Predicting Postoperative Mortality

Journal of Personalized Medicine ◽

10.3390/jpm11121271 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1271

Author(s):

Jaehyeong Cho ◽

Jimyung Park ◽

Eugene Jeong ◽

Jihye Shin ◽

Sangjeong Ahn ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

External Validation ◽

Model Development ◽

Postoperative Mortality ◽

Random Forest Model ◽

Forest Model ◽

Laboratory Values ◽

Increased Risk

Background: Several prediction models have been proposed for preoperative risk stratification for mortality. However, few studies have investigated postoperative risk factors, which have a significant influence on survival after surgery. This study aimed to develop prediction models using routine immediate postoperative laboratory values for predicting postoperative mortality. Methods: Two tertiary hospital databases were used in this research: one for model development and another for external validation of the resulting models. The following algorithms were utilized for model development: LASSO logistic regression, random forest, deep neural network, and XGBoost. We built the models on the lab values from immediate postoperative blood tests and compared them with the SASA scoring system to demonstrate their efficacy. Results: There were 3817 patients who had immediate postoperative blood test values. All models trained on immediate postoperative lab values outperformed the SASA model. Furthermore, the developed random forest model had the best AUROC of 0.82 and AUPRC of 0.13, and the phosphorus level contributed the most to the random forest model. Conclusions: Machine learning models trained on routine immediate postoperative laboratory values outperformed previously published approaches in predicting 30-day postoperative mortality, indicating that they may be beneficial in identifying patients at increased risk of postoperative death.
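
As a rough illustration of the AUROC figure reported above, the sketch below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are hypothetical and unrelated to the study's data.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death within 30 days)
    scores: predicted risk for each case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and model scores, for illustration only.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(example_labels, example_scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable for large cohorts.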


Application of Data Mining Technology in Risk Prediction of Metabolic Syndrome in Oil Workers

10.21203/rs.3.rs-31038/v1 ◽

2020 ◽

Author(s):

Jie Wang ◽

Chao Li ◽

Jing Li ◽

Sheng Qin ◽

Chunlei Liu ◽

...

Keyword(s):

Metabolic Syndrome ◽

Random Forest ◽

Risk Prediction ◽

Roc Curve ◽

Prediction Models ◽

Prediction Performance ◽

Random Forest Model ◽

Forest Model ◽

The Metabolic Syndrome ◽

Oil Workers

Abstract. Background. The prevalence of metabolic syndrome continues to rise sharply worldwide, seriously threatening people's health. In this paper, three risk prediction models applicable to metabolic syndrome in oil workers were established, and the optimal model was identified through comparison. The optimal model can be used to identify people at high risk of metabolic syndrome as early as possible, to predict their risk, and to persuade them to change adverse lifestyle habits so as to slow down and reduce the incidence of metabolic syndrome. Methods. A total of 1,468 workers from an oil company who participated in occupational health physical examinations from April 2017 to October 2018 were included in this study. We established a logistic regression model, a random forest model and a convolutional neural network model, and compared their prediction performance according to the F1 score, sensitivity, accuracy and other indicators. Results. In the training set, the accuracy of the three models was 83.45%, 94.21% and 86.34%, the sensitivity was 78.47%, 94.62% and 81.30%, the F1 score was 0.79, 0.93 and 0.83, and the area under the ROC curve was 0.894, 0.987 and 0.935, respectively. In the test set, the accuracy was 76.72%, 80.66% and 78.69%, the sensitivity was 70.00%, 77.50% and 68.33%, the F1 score was 0.70, 0.76 and 0.71, and the area under the ROC curve was 0.797, 0.861 and 0.855, respectively. Conclusions. The study showed that the random forest model predicts better than the other two models and has higher application value: it can better predict the risk of metabolic syndrome in oil workers and provide a corresponding theoretical basis for their health management.
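
The accuracy, sensitivity and F1 score used to compare the three models can be derived from a confusion matrix, as in the sketch below. The counts are hypothetical and were not taken from the study; the formulas are the standard definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, sensitivity (recall), precision and F1 from counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives of a binary classifier.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)          # share of true cases that are detected
    precision = tp / (tp + fp)            # share of flagged cases that are true
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical test-set counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing these values between the training set and the test set, as the abstract does, is one way to see how much each model's performance drops on unseen data.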


A zero altered Poisson random forest model for genomic-enabled prediction

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkaa057 ◽

2020 ◽

Vol 11 (2) ◽

Author(s):

Osval Antonio Montesinos-López ◽

Abelardo Montesinos-López ◽

Brandon A Mosqueda-Gonzalez ◽

José Cricelio Montesinos-López ◽

José Crossa ◽

...

Keyword(s):

Random Forest ◽

Prediction Performance ◽

Random Forest Model ◽

Statistical Machine Learning ◽

Excess Zeros ◽

Forest Model ◽

Generalized Poisson ◽

Machine Learning Model ◽

Proposed Model ◽

Count Response

Abstract. In genomic selection, choosing the statistical machine learning model is of paramount importance. In this paper, we present an application of a zero altered random forest model with two versions (ZAP_RF and ZAPC_RF) to deal with excess zeros in count response variables. The proposed model was compared with the conventional random forest (RF) model and with the conventional Generalized Poisson Ridge regression (GPR) using two real datasets, and we found that, in terms of prediction performance, the proposed zero altered random forest model outperformed the conventional RF and GPR models.
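
For readers unfamiliar with the term, the sketch below shows the zero altered (hurdle) Poisson distribution in its common textbook form: zeros occur with probability `pi0`, and positive counts follow a Poisson distribution truncated at zero. It illustrates the distribution only; how ZAP_RF and ZAPC_RF combine it with random forests is not described in the abstract and is not reproduced here. The parameter values are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability of count y with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zap_pmf(y, pi0, lam):
    """Zero altered Poisson: P(Y=0) = pi0, positives are zero-truncated Poisson."""
    if y == 0:
        return pi0
    return (1.0 - pi0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))

def zap_mean(pi0, lam):
    """Expected value of the zero altered Poisson distribution."""
    return (1.0 - pi0) * lam / (1.0 - math.exp(-lam))

# Hypothetical parameters: 60% zeros, rate 2.0 for the positive part.
pi0, lam = 0.6, 2.0
print([round(zap_pmf(y, pi0, lam), 4) for y in range(5)])
print(round(zap_mean(pi0, lam), 4))
```

Because the zero probability is modeled separately from the positive counts, the two parts can in principle be fitted by different learners, which is the general idea behind two-part count models.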


Variant pathogenic prediction models VSRFM and VSRFM-s, the importance of splicing and allele frequency

10.1101/430975 ◽

2018 ◽

Author(s):

JL Cabrera-Alarcon ◽

J Garcia-Martinez

Keyword(s):

Random Forest ◽

Allele Frequency ◽

Prediction Models ◽

Specific Model ◽

Random Forest Model ◽

Independent Data ◽

Data Set ◽

New Model ◽

Forest Model

Abstract. Currently, several tools are available to predict the effect of variants, with the aim of classifying variants as neutral or pathogenic. In this study, we propose a new model trained on ensemble scores, with two particularities: first, we consider minor allele frequency from gnomAD, and second, we split variants based on their involvement in splicing to train a specific model for each group. The Variants Stacked Random Forest Model (VSRFM) was constructed for variants not involved in splicing, and the Variants Stacked Random Forest Model for splicing (VSRFM-s) was trained for variants affected by splicing. Compared with the constituent scores used as features, our models showed the best outcomes. These results were confirmed using an independent data set from the ClinVar database.


Predicting the required pre-surgery blood volume in surgical patients based on machine learning

10.1101/19008045 ◽

2019 ◽

Author(s):

Ruilin Li ◽

Xinyin Han ◽

Liping Sun ◽

Yannan Feng ◽

Xiaolin Sun ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Blood Transfusion ◽

Blood Volume ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Random Forest Model ◽

Surgical Patients ◽

Forest Model ◽

Formidable Challenge

Abstract. Precisely predicting the required pre-surgery blood volume (PBV) in surgical patients is a formidable challenge in China. Inaccurate estimation is associated with excessive costs, postponed surgeries and adverse outcomes after surgery due to insufficient supply or inventory. This study aimed to predict the required PBV using machine learning techniques. A total of 181,027 medical documents spanning 6 years were cleaned, yielding 92,057 blood transfusion records. The blood transfusion and surgery related factors of perioperative patients, surgeons' experience volumes and the actual volumes of transfused RBCs were extracted. Six machine learning algorithms were used to build prediction models. Surgical patients who received allogenic RBCs or no transfusion, had a total volume of less than 10 units, or had their latest pre-surgery laboratory examinations within 7 days were included, providing 118,823 data points. In total, 39 predictive factors related to RBC transfusion were identified. With 90% of the data used for training, a random forest model was selected to predict the required PBV of RBCs; it achieved 72.9% accuracy, improving accuracy by 30.4% compared with surgeons' experience. We tested and demonstrated that both the data-driven models and the random forest model achieved higher accuracy than surgeons' experience. Furthermore, we developed a computational tool, PTRBC, to precisely estimate the required PBV in surgical patients, and we believe this tool will find further applications in assisting clinicians' decisions beyond accurate prediction of pre-surgery blood requirements.


Comparison of Models Used to Predict Flight Delays at Jomo Kenyatta International Airport

Asian Journal of Probability and Statistics ◽

10.9734/ajpas/2019/v3i330097 ◽

2019 ◽

pp. 1-8

Author(s):

P. K. Gachoki ◽

M. M. Muraya

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Support Vector Machine Model ◽

Prediction Models ◽

Random Forest Model ◽

Aviation Industry ◽

Support Vector ◽

Flight Delays ◽

Machine Model ◽

Forest Model

Flight delays have negative socio-economic effects on passengers, airlines and airports, resulting in huge economic losses. Their prediction is therefore crucial to decision making and proper management for all players in the aviation industry. The development of accurate prediction models for flight delays depends on the complexity of the air transport system and airport infrastructure, and hence may be country specific. However, no prediction models are tailored to the Kenyan aviation industry, so there is a need to develop prediction models suited to Kenyan aviation conditions. The objective of this study was to compare the predictive power of the developed models. Secondary data from Jomo Kenyatta International Airport (JKIA) were used in this study. The data collected included the day of the flight (Monday to Sunday), the month (January to December), the airline, the flight class (domestic or international), the season (summer or winter), the capacity of the aircraft, the flight ID (tail number) and whether the flight had flown at night or during the day. The data were analyzed using R software. Three models were fitted: a Logistic Regression model, a Support Vector Machine (SVM) model and a Random Forest model. The strength and utility of the models were determined using bias-variance learning curves. The study revealed that the models predicted delays with different accuracies. The Random Forest model had a prediction accuracy of 68.99%, while the SVM model had an accuracy of 68.62% and the Logistic Regression model had an accuracy of 66.18%. The Random Forest model outperformed the SVM and Logistic Regression models by reported margins of 0.37% and 2.71%, respectively. The SVM and Random Forest models do not assume a probability distribution for the response under investigation, which may explain why they performed better than logistic regression. The study recommends applying the Random Forest model to predict flight delays at JKIA.


Agricultural Irrigation Area Prediction Based on Improved Random Forest Model

10.21203/rs.3.rs-156767/v1 ◽

2021 ◽

Author(s):

Guangda Gao ◽

Maofa Wang ◽

Hongliang Huang ◽

Weiyu Tang

Keyword(s):

Random Forest ◽

Prediction Models ◽

Absolute Error ◽

Mean Value ◽

Optimal Number ◽

Random Forest Model ◽

Irrigation Area ◽

Forest Model ◽

Grid Search Method ◽

The World

Abstract. Food security is a major problem of common concern worldwide, and predicting irrigation area can help address food and agricultural problems. In this paper, world data on grain production and irrigation area are analyzed. An improved Random Forest Regression model is proposed and applied to the prediction of irrigation area. Based on the ordinary Random Forest and Limit Tree Regression algorithms, an improved random forest prediction model for irrigation area in China is proposed. First, the arithmetic mean value (AMM) of the mean square error (MSE) and mean absolute error (MAE) was used as the evaluation index for both the improved impurity function and the irrigation area prediction performance. Then, the grid search method was used to determine the optimal number of decision trees in ordinary random forest and limit tree regression (70 trees and 30 trees, respectively), and a new improved random forest model was established. Next, the model was compared with other prediction models, and 10-fold cross-validation supported the soundness of the model. Finally, error analysis of the improved Random Forest model shows that the prediction error is small. The model is expected to be applied in the annual analysis of irrigation area in China.
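
The AMM index described above, the arithmetic mean of MSE and MAE, can be sketched as below, together with a simple selection of the tree count that minimizes it. This is an assumption-laden illustration: the function names, the validation values and the per-candidate predictions are hypothetical, and the abstract does not specify how the authors implemented the grid search.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def amm(y_true, y_pred):
    """Arithmetic mean of MSE and MAE, used here as a single evaluation index."""
    return (mse(y_true, y_pred) + mae(y_true, y_pred)) / 2.0

def best_tree_count(y_true, preds_by_trees):
    """Pick the candidate number of trees whose predictions give the lowest AMM.

    preds_by_trees maps a candidate tree count to the validation predictions
    produced by a model trained with that many trees (hypothetical input).
    """
    return min(preds_by_trees, key=lambda n: amm(y_true, preds_by_trees[n]))

# Hypothetical validation targets and predictions, for illustration only.
y_val = [1.0, 2.0, 3.0]
candidates = {30: [1.0, 2.0, 5.0], 70: [1.0, 2.0, 3.5]}
print(best_tree_count(y_val, candidates))
```

Note that MSE and MAE are on different scales (squared versus original units), so averaging them weights large errors more heavily than MAE alone would.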


Spatial modeling of gully head erosion on the Loess Plateau using a certainty factor and random forest model

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.147040 ◽

2021 ◽

Vol 783 ◽

pp. 147040

Author(s):

Chengcheng Jiang ◽

Wen Fan ◽

Ningyu Yu ◽

Enlong Liu

Keyword(s):

Random Forest ◽

Loess Plateau ◽

Spatial Modeling ◽

Random Forest Model ◽

Certainty Factor ◽

The Loess Plateau ◽

Forest Model ◽

Gully Head


Modeling Population Spatial-Temporal Distribution Using Taxis Origin and Destination Data

Sustainability ◽

10.3390/su13073727 ◽

2021 ◽

Vol 13 (7) ◽

pp. 3727

Author(s):

Fatema Rahimi ◽

Abolghasem Sadeghi-Niaraki ◽

Mostafa Ghodousi ◽

Soo-Mi Choi

Keyword(s):

Regression Analysis ◽

Mean Squared Error ◽

Population Distribution ◽

Temporal Distribution ◽

Coefficient Of Determination ◽

Temporal Modeling ◽

Location Data ◽

Time Period ◽

Proposed Model

During dangerous circumstances, knowledge of population distribution at the best spatial-temporal resolution is essential for urban infrastructure architecture, policy-making, and urban planning. The present study investigated the spatial-temporal modeling of the population distribution of the case study area. First, the numbers of generated and absorbed trips were calculated from taxi pick-up and drop-off location data, and the census population was then allocated to each neighborhood. Finally, the spatial-temporal distribution of the population was calculated using the developed model. To evaluate the model, a regression analysis between the census population and the predicted population for the period from 21:00 to 23:00 was used. The numbers of generated and absorbed trips showed different spatial distributions for different hours of the day. The spatial pattern of the population distribution during the day differed from that during the night. The coefficient of determination (R2) of the regression analysis for the model was 0.9998, and the mean squared error was 10.78. The regression analysis showed that the model works well for the nighttime population at the neighborhood level, so the proposed model will be suitable for the daytime population.
