Modeling Anthropogenic Fire Occurrence in the Boreal Forest of China Using Logistic Regression and Random Forests

We applied a classic logistic regression (LR) model together with a geographically weighted logistic regression (GWLR) model to determine the relationship between anthropogenic fire occurrence and potential driving factors in the Chinese boreal forest, and to test whether the explanatory power of the LR model could be increased by incorporating the geospatial information of geographical and human factors through a GWLR model. Three tests, “all variables”, “significant variables”, and “cross-validation”, were applied to compare the performance of the LR and GWLR models. Our results confirmed the importance of distance to railway, elevation, length of fire line, and vegetation cover for fire occurrence in the Chinese boreal forest. In addition, the GWLR model performed better than the LR model in terms of prediction accuracy, residual reduction, and spatial parameter estimation, because it considers the geospatial information of the explanatory variables. This indicates that the global LR model cannot sufficiently identify the underlying causal factors in wildfire modeling. The GWLR model helped identify spatial variation in the relationships between driving factors and fire occurrence, which can contribute to a better understanding of forest fire occurrence over large geographic areas and may help improve forest fire management practices.
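
The contrast between a global LR fit and a locally weighted fit can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Gaussian distance-decay kernel, the fixed bandwidth, the plain gradient-ascent fitting, and the tiny synthetic dataset. The abstract does not state which kernel, bandwidth, or estimation procedure the authors used.

```python
import math

def gaussian_weights(coords, focal, bandwidth):
    """Distance-decay weights: observations near `focal` count more (assumed kernel)."""
    return [math.exp(-0.5 * (math.dist(c, focal) / bandwidth) ** 2) for c in coords]

def fit_weighted_logistic(xs, ys, weights, lr=0.1, steps=500):
    """Weighted logistic regression (intercept + one slope) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    total = sum(weights)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (y - p)        # gradient w.r.t. intercept
            g1 += w * (y - p) * x    # gradient w.r.t. slope
        b0 += lr * g0 / total
        b1 += lr * g1 / total
    return b0, b1

# Synthetic data (hypothetical): the predictor raises fire probability in the
# western cluster and lowers it in the eastern cluster.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
xs = [-2, -1, 1, 2, -2, -1, 1, 2]
ys = [0, 0, 1, 1, 1, 1, 0, 0]

# Global model: every observation has the same weight.
global_b0, global_b1 = fit_weighted_logistic(xs, ys, [1.0] * len(xs))

# Local models: one fit per focal location, weighted by distance to it.
west_b0, west_b1 = fit_weighted_logistic(xs, ys, gaussian_weights(coords, (1.5, 0), 2.0))
east_b0, east_b1 = fit_weighted_logistic(xs, ys, gaussian_weights(coords, (11.5, 0), 2.0))

print("global slope:", global_b1)
print("west slope:", west_b1, "east slope:", east_b1)
```

In this toy setting the opposing local effects cancel in the global fit, while the two local fits recover slopes of opposite sign, which is the kind of spatial variation a single global coefficient hides.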

Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

DOI: 10.31234/osf.io/eqcdx; 2019

Author(s): Oskar Flygare, Jesper Enander, Erik Andersson, Brjánn Ljótsson, Volen Z Ivanov, ...

Keyword(s): Machine Learning, Logistic Regression, Random Forests, Clinical Utility, Body Dysmorphic Disorder, Prediction Models, Behavioral Therapy, Learning Approach, Learning Approaches, Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.

Development of Machine Learning Models to Predict Probabilities and Types of Stroke at Prehospital Stage: the Japan Urgent Stroke Triage Score Using Machine Learning (JUST-ML)

Translational Stroke Research; DOI: 10.1007/s12975-021-00937-x; 2021

Author(s): Kazutaka Uchida, Junichi Kouno, Shinichi Yoshimura, Norito Kinjo, Fumihiro Sakakibara, ...

Keyword(s): Machine Learning, Logistic Regression, Random Forests, Prediction Models, Characteristic Curve, Predictive Performance, Vessel Occlusion, Predictive Values, Training Cohort, Sensitivity Specificity

Abstract: In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We aimed to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
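
As a reminder of how the validation metrics named above are defined, here is a small sketch that computes them from a binary confusion matrix and from predicted scores. The counts and scores are hypothetical and are not taken from the study; the "F score" is assumed here to be the F1 score, which the abstract does not specify.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics for one outcome (e.g. LVO vs. not LVO) from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # assumed F1 variant of the F score
    return {
        "accuracy": accuracy,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical confusion counts and model scores, for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=20, tn=880)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(metrics)
print("AUC:", example_auc)
```

The pairwise AUC above is quadratic in the number of cases, which is fine for a sketch; rank-based formulas give the same value more efficiently on large cohorts.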

Improving Logistic Regression/Credit Scorecards Using Random Forests: Applications with Credit Card and Home Equity Datasets

SSRN Electronic Journal; DOI: 10.2139/ssrn.1801392; 2010; Cited By ~ 1

Author(s): Dhruv Sharma

Keyword(s): Logistic Regression, Random Forests, Credit Card, Home Equity

What drives forest fire in Fujian, China? Evidence from logistic regression and Random Forests

International Journal of Wildland Fire; DOI: 10.1071/wf15121; 2016; Vol 25 (5); pp. 505; Cited By ~ 27

Author(s): Futao Guo, Guangyu Wang, Zhangwen Su, Huiling Liang, Wenhui Wang, ...

Keyword(s): Logistic Regression, Random Forest, Regional Scale, Fire Risk, Driving Factors, Fire Season, Fire Occurrence, Climate Factors, Local Factors, Risk Zones

We applied logistic regression and Random Forest to evaluate drivers of fire occurrence on a provincial scale. Potential driving factors were divided into two groups according to scale of influence: ‘climate factors’, which operate on a regional scale, and ‘local factors’, which include infrastructure, vegetation, topographic and socioeconomic data. The groups of factors were analysed separately and then significant factors from both groups were analysed together. Both models identified significant driving factors, which were ranked in terms of relative importance. Results show that climate factors are the main drivers of fire occurrence in the forests of Fujian, China. In particular, sunshine hours, relative humidity (fire seasonal and daily), precipitation (fire season) and temperature (fire seasonal and daily) were seen to play a crucial role in fire ignition. Of the local factors, elevation, distance to railway and per capita GDP were found to be most significant. Random Forest demonstrated a higher predictive ability than logistic regression across all groups of factors (climate, local, and climate and local combined). Maps of the likelihood of fire occurrence in Fujian illustrate that the high fire-risk zones are distributed across administrative divisions; consequently, fire management strategies should be devised based on fire-risk zones, rather than on separate administrative divisions.

Fire Risk Assessment Models Using Statistical Machine Learning and Optimized Risk Indexing

Applied Sciences; DOI: 10.3390/app10124199; 2020; Vol 10 (12); pp. 4199

Author(s): Myoung-Young Choi, Sunghae Jun

Keyword(s): Machine Learning, Risk Assessment, Logistic Regression, Prediction Model, Fire Risk, Fire Occurrence, Statistical Machine Learning, Fire Insurance, Comparison Results, Fire Risk Assessment

It is very difficult to accurately predict the occurrence of a fire, yet doing so is very important for protecting human life and property. We therefore study fire hazard prediction and evaluation methods to cope with fire risks. In this paper, we propose three models based on statistical machine learning and optimized risk indexing for fire risk assessment. We build logistic regression, deep neural network (DNN), and fire risk indexing models, and compare the performance of the proposed and traditional models using real investigated data on fire occurrence in Korea. In general, fire prediction models currently in use do not provide satisfactory levels of accuracy, because the factors affecting fire occurrence are very diverse and fire occurrence is very sparse. To improve the accuracy of fire occurrence prediction, we first build logistic regression and DNN models. In addition, we construct a fire risk indexing model to further improve fire prediction. To compare our research models with the current fire prediction model, we use real fire data investigated in Korea between 2011 and 2017. The experimental results confirm that the prediction accuracy of the proposed method is superior to that of the existing fire occurrence prediction model. We therefore expect the proposed model to contribute to evaluating fire risk in buildings and factories in the field of fire insurance and to calculating fire insurance premiums.

Fire Occurrence Probability Mapping of Northeast China With Binary Logistic Regression Model

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; DOI: 10.1109/jstars.2012.2236680; 2013; Vol 6 (1); pp. 121-127; Cited By ~ 17

Author(s): Haijun Zhang, Xiaoyong Han, Sha Dai

Keyword(s): Logistic Regression, Regression Model, Northeast China, Logistic Regression Model, Binary Logistic Regression, Fire Occurrence, Occurrence Probability, Binary Logistic Regression Model, Probability Mapping

Using Random Forests and Logistic Regression for Performance Prediction of Latin American ADRS and Banks

Journal of CENTRUM Cathedra (JCC) The Business and Economics Research Journal; DOI: 10.7835/jcc-berj-2009-0020; 2009; Vol 2 (1); pp. 24-36

Author(s): Germán Creamer

Keyword(s): Logistic Regression, Latin American, Random Forests, Performance Prediction

Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods

JMIR Medical Informatics; DOI: 10.2196/15601; 2019; Vol 7 (4); pp. e15601; Cited By ~ 1

Author(s): Quazi Abidur Rahman, Tahir Janmohamed, Hance Clarke, Paul Ritvo, Jane Heffernan, ...

Keyword(s): Machine Learning, Logistic Regression, Feature Selection, Random Forests, Prediction Models, Class Imbalance, Majority Voting, Selection Methods, Logistic Regression Models, High Volatility

**Background:** Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. **Objective:** This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue. **Methods:** A total of 132 features were extracted from the first month of app use to develop machine learning–based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large features set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total number of 391,255 pain records. **Results:** A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy is approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests does not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features are included in the prediction model. **Conclusions:** We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
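
The combination of repeated random under-sampling and majority voting described in the Methods can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the single synthetic feature, the toy threshold classifier standing in for logistic regression or random forests, and the choice of 11 subsamples are all assumptions. Only the class sizes (694 low, 185 high) come from the abstract.

```python
import random
from collections import Counter

def undersample(majority, minority, rng):
    """Balance classes by drawing a random majority subset the size of the minority."""
    return rng.sample(majority, len(minority)) + list(minority)

def train_threshold_model(sample):
    """Toy stand-in for a real classifier: cut halfway between the two class means."""
    mean0 = sum(x for x, y in sample if y == 0) / sum(1 for _, y in sample if y == 0)
    mean1 = sum(x for x, y in sample if y == 1) / sum(1 for _, y in sample if y == 1)
    threshold = (mean0 + mean1) / 2
    return lambda x: 1 if x > threshold else 0

def majority_vote(models, x):
    """Consolidate the predictions of all subsample models into one label."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Synthetic one-feature data (hypothetical); label 0 = low, 1 = high volatility.
low = [(rng.gauss(1.0, 0.3), 0) for _ in range(694)]
high = [(rng.gauss(2.0, 0.3), 1) for _ in range(185)]

# An odd number of balanced subsamples avoids tied votes for binary labels.
models = [train_threshold_model(undersample(low, high, rng)) for _ in range(11)]

print("vote for x=2.5:", majority_vote(models, 2.5))
print("vote for x=0.5:", majority_vote(models, 0.5))
```

Each model sees all minority cases but a different random slice of the majority class, so the vote uses more of the majority data than any single balanced subsample would.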

Further Improvement on Two-Way Cooperative Collaborative Filtering Approaches for the Binary Market Basket Data

Applied Sciences; DOI: 10.3390/app11198977; 2021; Vol 11 (19); pp. 8977

Author(s): Wook-Yeon Hwang, Jong-Seok Lee

Keyword(s): Logistic Regression, Collaborative Filtering, Random Forests, Pearson Correlation, Main Idea, Experimental Results, Market Basket, Regression Approach, Cold Start Problem, Better Than

Two-way cooperative collaborative filtering (CF) has been known to be crucial for binary market basket data. We propose an improved two-way logistic regression approach, a Pearson correlation-based score, a random forests (RF) R-square-based score, an RF Pearson correlation-based score, and a CF scheme based on the RF R-square-based score. The main idea is to utilize as much predictive information as possible within the two-way prediction in order to cope with the cold-start problem. All of the proposed methods work better than the existing two-way cooperative CF approach in terms of the experimental results.
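
For readers unfamiliar with the building block behind a Pearson correlation-based score, the sketch below computes the Pearson correlation between two binary purchase vectors. It shows only the correlation itself; how the authors turn correlations into a two-way score is not described in the abstract, and the item vectors here are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical binary market basket columns: 1 = customer bought the item.
item_a = [1, 1, 0, 0, 1]
item_b = [1, 0, 0, 0, 1]

similarity = pearson(item_a, item_b)
print("item similarity:", similarity)
```

The function is undefined when either vector is constant (zero variance), which in basket data corresponds to an item bought by everyone or by no one; a real system would need to handle that case explicitly.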
