Comparing the Performance of a Logistic Regression and a Random Forest Model in Landslide Susceptibility Assessments. The Case of Wuyaun Area, China

Author(s):  
Haoyuan Hong ◽  
Paraskevas Tsangaratos ◽  
Ioanna Ilia ◽  
Wei Chen ◽  
Chong Xu
2021 ◽  
pp. 1-20
Author(s):  
Renata Pacheco Quevedo ◽  
Daniel Andrade Maciel ◽  
Tatiana Dias Tardelli Uehara ◽  
Matej Vojtek ◽  
Camilo Daleles Rennó ◽  
...  

2020 ◽  
Vol 35 (Supplement_3) ◽  
Author(s):  
Manuel Benítez Sánchez ◽  
Guillermo Martín ◽  
Luis Gil Sacaluga ◽  
Maria Jose Garcia Cortes ◽  
Sergio García Marcos ◽  
...  

Abstract Background and Aims Random Forest (RF) is an artificial intelligence (AI) technique that consists of an ensemble of decision trees built by bootstrapping (resampling with replacement). At each node, a random subset of the predictor variables is selected, and the best cut point among them is determined. The trees are grown as deep as possible. About 37% of the observations are not used in the construction of each RF tree. These form the out-of-bag (OOB) sample, which is used to obtain an honest estimate of the predictive capacity of the model, so a separate validation set is not required. Each analysis fits a few hundred regression or classification trees, depending on whether the response variable is numerical or qualitative, respectively. The result is an average of the predictions of the individual trees (bagging). RF also allows the importance of the predictor variables to be calculated, and the important variables can later be included in a multivariate regression model. Method We analyzed 14750 records from 2011 to 2014 contained in the Information System of the Autonomous Transplant Coordination of Andalusia (SICATA), a system that includes clinical-epidemiological variables on anemia, bone metabolism, adequacy of dialysis and vascular access. A total of 1911 patients presented the event of interest (exitus). Three predictive and explanatory models of survival were developed: (1) RF; (2) multivariate logistic regression; (3) multivariate logistic regression that includes the important variables of the previous RF model. We compared them in terms of discrimination (AUC of the ROC curve). Results The AUC of the ROC curve was 0.75 for the multivariate model without prior RF, 0.81 for the multivariate model with prior RF, and 0.98 for the Random Forest model. Conclusion The Random Forest model achieved an AUC of 0.98 for mortality in patients on hemodialysis, far superior to the classic multivariate analyses. The multivariate logistic regression performed with the important RF variables improved the AUC over the previous model (0.81 vs. 0.75).
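The out-of-bag share of roughly 37% quoted in the abstract follows from sampling with replacement: the chance that a given observation is never drawn in n draws is (1 - 1/n)^n, which approaches 1/e ≈ 0.368. The sketch below checks this by simulation. It uses only the Python standard library, and the sample size, number of trees and function names are illustrative assumptions, not values from the study.

```python
import math
import random


def oob_fraction(n_obs: int, rng: random.Random) -> float:
    """Fraction of observations left out of one bootstrap sample of size n_obs."""
    # Draw n_obs indices with replacement and keep the distinct ones.
    drawn = {rng.randrange(n_obs) for _ in range(n_obs)}
    return 1.0 - len(drawn) / n_obs


def mean_oob_fraction(n_obs: int, n_trees: int, seed: int = 0) -> float:
    """Average out-of-bag fraction over n_trees independent bootstrap samples."""
    rng = random.Random(seed)
    return sum(oob_fraction(n_obs, rng) for _ in range(n_trees)) / n_trees


# Hypothetical sizes chosen only for illustration.
N_OBS = 1000
N_TREES = 200

theoretical = (1 - 1 / N_OBS) ** N_OBS   # exact probability of never being drawn
simulated = mean_oob_fraction(N_OBS, N_TREES)

print(f"theoretical OOB fraction: {theoretical:.4f}")
print(f"simulated OOB fraction:   {simulated:.4f}")
print(f"limit 1/e:                {math.exp(-1):.4f}")
```

Each tree therefore has its own held-out observations, which is what allows an error estimate without setting aside a separate test set.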


Author(s):  
Soo-Kyoung Lee ◽  
Juh Hyun Shin ◽  
Jinhyun Ahn ◽  
Ji Yeon Lee ◽  
Dong Eun Jang

Background: Machine learning (ML) can continually improve predictions and generate automated knowledge via data-driven predictors or decisions. Objective: The purpose of this study was to compare different ML methods, including random forest, logistic regression, linear support vector machine (SVM), polynomial SVM, radial SVM, and sigmoid SVM, in terms of their accuracy, sensitivity, specificity, negative predictive value, and positive predictive value, using real datasets to identify predictors of pressure ulcers (PUs). Methods: We applied representative ML algorithms (random forest, logistic regression, linear SVM, polynomial SVM, radial SVM, and sigmoid SVM) to develop a prediction model (N = 60). Results: The random forest model showed the greatest accuracy (0.814), followed by logistic regression (0.782), polynomial SVM (0.779), radial SVM (0.770), linear SVM (0.767), and sigmoid SVM (0.674). Conclusions: The random forest model showed the greatest accuracy for predicting PUs in nursing homes (NHs). Diverse factors that predict PUs in NHs, including NH characteristics and residents’ characteristics, were identified by the different ML methods. These factors should be considered to decrease PUs in NH residents.
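The five metrics named in this abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions in standard-library Python. The class name, helper function and the ten example labels are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of actual positives that were predicted positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that were predicted negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: share of predicted positives that are correct.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: share of predicted negatives that are correct.
        return self.tn / (self.tn + self.fn)


def confusion_from_labels(y_true, y_pred) -> ConfusionMatrix:
    """Count the four cells from paired 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
cm = confusion_from_labels(y_true, y_pred)

for name in ("accuracy", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(cm, name)():.3f}")
```

Accuracy alone can hide poor sensitivity when the positive class is rare, which is one reason studies of this kind report the other four values alongside it.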


2021 ◽  
Vol 15 (Supplement_1) ◽  
pp. S214-S214
Author(s):  
A Levartovsky ◽  
Y Barash ◽  
S Ben-Horin ◽  
B Ungar ◽  
E Klang ◽  
...  

Abstract Background Intra-abdominal abscess is an important clinical complication of Crohn’s disease (CD), which can be diagnosed using computed tomography (CT) or magnetic resonance imaging (MRI). However, a high index of clinical suspicion is needed to diagnose an abscess, as abdominal imaging is not routinely used during hospital admission. This study aimed to identify clinical predictors of an intra-abdominal abscess among hospitalized patients with CD. Methods We created an electronic data repository of all patients with CD who visited the emergency department (ED) of our tertiary medical center between 2012 and 2018. Data included tabular demographic and clinical variables, as well as CT and MRI imaging outcomes. We searched the data repository for the presence of an abscess on abdominal imaging within seven days from the ED visit. Machine learning models were trained to predict the presence of an abscess. A logistic regression model was compared to a random forest model. The area under the receiver operating characteristic curve (AUC) was used as the performance metric. To establish statistical significance, bootstrapping with 100 experiments of random 80/20 training/testing splits was performed. We included only patients who were hospitalized due to complaints that can be attributed to CD exacerbation. Patients presenting within 30 days from an abdominal surgery were excluded. Results Overall, 1556 patients with CD visited the ED, of whom 555 had a CD exacerbation. Of these, 339 patients were hospitalized and underwent abdominal imaging within 7 days from the ED visit. Forty-two patients (12.1%) were diagnosed with an abscess on abdominal imaging. The average length of the abscess was 32 mm (IQR 21.5, 43.5), mainly in the mesentery adjacent to the small bowel (38.1%). 
On multivariate analysis, high CRP values (64.97 mg/L, aOR 14.42 [95% CI 4.93–42.13]), high platelet count (322.5 K/microL, aOR 4.01 [95% CI 1.97–8.15]), leukocytosis (10.55 K/microL, aOR 3.83 [95% CI 1.71–8.56]) and higher heart rate (over 87.5 beats per minute, aOR 2.58 [95% CI 1.22–5.46]) were independently associated with an intra-abdominal abscess. Overall, random forest and logistic regression showed similar performance. The random forest model showed an AUC of 0.824±0.065 with eight features (CRP, Hemoglobin, WBC, age, current biologic medical treatment, BUN, current immunomodulatory medical treatment, gender). Conclusion In our large tertiary center cohort, the machine-learning model identified features associated with the presentation of an intra-abdominal abscess. Such a decision support tool may assist in triaging CD patients for imaging to exclude this potentially life-threatening complication.
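The evaluation protocol described in the Methods (100 random 80/20 training/testing splits, summarized as a mean AUC ± standard deviation) can be sketched as follows. This is a hedged illustration with several assumptions: the data are synthetic, the "model" is a single hypothetical marker whose direction is learned on the training split, and the splits are stratified by outcome so that every test set contains positives. None of this is the authors' code or data.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_pos = max(1, round(test_fraction * len(pos)))
    n_neg = max(1, round(test_fraction * len(neg)))
    return pos[n_pos:] + neg[n_neg:], pos[:n_pos] + neg[:n_neg]


def repeated_split_auc(labels, marker, n_repeats=100, test_fraction=0.2, seed=0):
    """Test-set AUC for each of n_repeats random train/test splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        train, test = stratified_split(labels, test_fraction, rng)
        # "Training" here only learns whether higher marker values indicate the positive class.
        pos_mean = statistics.fmean(marker[i] for i in train if labels[i] == 1)
        neg_mean = statistics.fmean(marker[i] for i in train if labels[i] == 0)
        sign = 1.0 if pos_mean >= neg_mean else -1.0
        aucs.append(auc([labels[i] for i in test], [sign * marker[i] for i in test]))
    return aucs


# Synthetic cohort (hypothetical): 40 positives, 260 negatives, one noisy marker.
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 260
marker = [data_rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

aucs = repeated_split_auc(labels, marker)
print(f"AUC = {statistics.fmean(aucs):.3f} ± {statistics.stdev(aucs):.3f}")
```

Reporting the spread across splits, as the abstract does with 0.824±0.065, shows how much the estimate depends on which patients happen to fall into the test set.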


Author(s):  
P. K. Gachoki ◽  
M. M. Muraya

Flight delays have negative socio-economic effects on passengers, airlines and airports, resulting in huge economic losses. Their prediction is therefore crucial to decision-making and proper management for all players in the aviation industry. The development of accurate prediction models for flight delays depends on the complexity of the air transport system and airport infrastructure, and hence may be country-specific. However, no prediction models tailored to the Kenyan aviation industry exist, so there is a need to develop prediction models suited to Kenyan aviation conditions. The objective of this study was to compare the predictive power of the developed models. Secondary data from Jomo Kenyatta International Airport (JKIA) were used in this study. The data collected included the day of the flight (Monday to Sunday), the month (January to December), the airline, the flight class (domestic or international), season (summer or winter), capacity of the aircraft, flight ID (tail number) and whether the flight had flown at night or during the day. The data were analyzed using R software. Three models, a Logistic Regression model, a Support Vector Machine (SVM) model and a Random Forest model, were fitted. The strength and utility of the models were determined using bias-variance learning curves. The study revealed that the models predicted delays with different accuracies. The Random Forest model had a prediction accuracy of 68.99%, the SVM model 68.62%, and the Logistic Regression model 66.18%. The Random Forest model thus outperformed the SVM and Logistic Regression models by 0.37 and 2.81 percentage points, respectively. The SVM and Random Forest models do not assume a probability distribution for the response under investigation, which may explain why they performed better than the logistic regression. The study recommends the application of the Random Forest model to predict flight delays at JKIA.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Yang Cao ◽  
Gary A. Bass ◽  
Rebecka Ahl ◽  
Arvid Pourlotfi ◽  
Håkan Geijer ◽  
...  

Abstract Background Geriatric patients frequently undergo emergency general surgery and accrue a greater risk of postoperative complications and fatal outcomes than the general population. It is highly relevant to develop the most appropriate care measures and to guide patient-centered decision-making around end-of-life care. The Portsmouth Physiological and Operative Severity Score for the enumeration of Mortality and morbidity (P-POSSUM) has been used to predict mortality in patients undergoing different types of surgery. In the present study, we aimed to evaluate, from a statistical perspective, the relative importance of the P-POSSUM score for predicting 90-day mortality in elderly patients undergoing emergency laparotomy. Methods One hundred and fifty-seven geriatric patients aged ≥65 years undergoing emergency laparotomy between January 1st, 2015 and December 31st, 2016 were included in the study. Mortality and 27 other patient characteristics were retrieved from the computerized records of Örebro University Hospital in Örebro, Sweden. Two supervised machine learning classification methods (logistic regression and random forest) were used to predict the 90-day mortality risk. Three scalers (Standard scaler, Robust scaler and Min-Max scaler) were used for variable engineering. The performance of the models was evaluated using accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC). The importance of the predictors was evaluated using permutation variable importance and Gini importance. Results The mean age of the included patients was 75.4 years (standard deviation = 7.3 years) and the 90-day mortality rate was 29.3%. The most common indication for surgery was bowel obstruction, occurring in 92 (58.6%) patients. The incidence of individual types of postoperative complications ranged from 7.0% to 36.9%, with infection being the most common type. 
Both the logistic regression and random forest models showed satisfactory performance for predicting 90-day mortality risk in geriatric patients after emergency laparotomy, with AUCs of 0.88 and 0.93, respectively. Both models had an accuracy > 0.8 and a specificity ≥ 0.9. P-POSSUM had the greatest relative importance for predicting 90-day mortality in the logistic regression model and was the fifth most important predictor in the random forest model. No notable change was found in the sensitivity analysis using different variable engineering methods, with P-POSSUM remaining among the five most important variables for mortality prediction. Conclusion P-POSSUM is important for predicting 90-day mortality after emergency laparotomy in geriatric patients. The logistic regression model and random forest model may have an accuracy of > 0.8 and an AUC around 0.9 for predicting 90-day mortality. Further validation of the variables’ importance and the models’ robustness is needed using a larger dataset.
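The three scalers named in the Methods differ in how they center and spread a variable, which matters most when outliers are present. The sketch below implements the commonly used definitions with the Python standard library: mean and population standard deviation, median and interquartile range, and minimum and maximum. It assumes these conventional formulas; the abstract does not state which software or exact variants were used, and the example values are invented.

```python
import statistics


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Invented values with one large outlier, for illustration only.
raw = [1.0, 2.0, 3.0, 4.0, 100.0]

standard = standard_scale(raw)
robust = robust_scale(raw)
min_max = min_max_scale(raw)

print("standard:", [round(v, 3) for v in standard])
print("robust:  ", [round(v, 3) for v in robust])
print("min-max: ", [round(v, 3) for v in min_max])
```

With the outlier present, the standard and min-max versions compress the four ordinary values into a narrow band, while the robust version keeps them evenly spaced. Tree-based models such as random forest are largely insensitive to monotonic rescaling of this kind, which is consistent with the abstract's finding that the choice of scaler made no notable difference.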


2021 ◽  
Vol 11 ◽  
Author(s):  
Minhong Wang ◽  
Zhan Feng ◽  
Lixiang Zhou ◽  
Liang Zhang ◽  
Xiaojun Hao ◽  
...  

Background: Our goal was to establish and verify a radiomics risk grading model for gastrointestinal stromal tumors (GISTs) and to identify the optimal algorithm for risk stratification. Methods: We conducted a retrospective analysis of 324 patients with GISTs, the presence of which was confirmed by surgical pathology. Patients were treated at three different hospitals. A training cohort of 180 patients was collected from the largest center, while an external validation cohort of 144 patients was collected from the other two centers. To extract radiomics features, regions of interest (ROIs) were outlined layer by layer along the edge of the tumor contour on CT images of the arterial and portal venous phases. The dimensionality of the radiomic features was reduced, and the top 10 features with an importance value above 5 were selected before modeling. Three classifiers [logistic regression, support vector machine (SVM), and random forest] were trained on the training cohort to establish three GIST risk stratification prediction models. The receiver operating characteristic (ROC) curve was used to compare model performance, which was validated with external data. Results: In the training cohort, the average area under the curve (AUC) was 0.84 ± 0.07 for logistic regression, 0.88 ± 0.06 for random forest, and 0.81 ± 0.08 for SVM. In the external validation cohort, the AUC was 0.85 for logistic regression, 0.90 for random forest, and 0.80 for SVM. The random forest model performed best in both the training and the external validation cohorts and could be generalized. Conclusion: Based on CT radiomics, multiple machine-learning models can predict the risk of GISTs. Among them, the random forest algorithm had the highest prediction efficiency and could be readily generalized. Based on the external validation data, we suggest that the random forest model may be used as an effective tool to guide preoperative clinical decision-making.

