Novel ensembles of COPRAS multi-criteria decision-making with logistic regression, boosted regression tree, and random forest for spatial prediction of gully erosion susceptibility

Gully erosion triggers land degradation and restricts land use. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified through extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs were selected to model GE: elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI. The variable importance results of the RF and BRT models indicated that distance from roads, elevation, and lithology had the greatest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. AUC values for the three models ranged from 0.911 to 0.927; the RF model achieved the highest prediction accuracy (0.927) and also performed best among the models according to the SCAI values. The findings can support planning and development in the study region.
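The abstract uses Weights-of-Evidence to relate each GEV class to gully occurrence and SCAI to validate the maps. The sketch below shows both formulas as they are commonly defined in susceptibility mapping, assuming raster cell counts as input. It is not the authors' code; the function names and all counts in the usage lines are hypothetical.

```python
import math


def woe_weights(class_gully, class_cells, total_gully, total_cells):
    """Weights of Evidence for one class B of a conditioning factor.

    class_gully: gully cells inside the class (B and D)
    class_cells: all cells inside the class
    total_gully: gully cells in the whole study area (D)
    total_cells: all cells in the study area
    Returns (W_plus, W_minus, contrast C = W_plus - W_minus).
    All four contingency cells must be non-zero (no smoothing is applied).
    """
    non_gully = total_cells - total_gully
    b_d = class_gully                  # class present, gully present
    b_nd = class_cells - class_gully   # class present, no gully
    nb_d = total_gully - class_gully   # class absent, gully present
    nb_nd = non_gully - b_nd           # class absent, no gully
    # W+ = ln[P(B|D) / P(B|not D)],  W- = ln[P(not B|D) / P(not B|not D)]
    w_plus = math.log((b_d / total_gully) / (b_nd / non_gully))
    w_minus = math.log((nb_d / total_gully) / (nb_nd / non_gully))
    return w_plus, w_minus, w_plus - w_minus


def scai(class_cells, class_gully):
    """Seed cell area index per susceptibility class.

    SCAI = (% of study area in the class) / (% of gully cells in the class).
    Low SCAI in the high-susceptibility classes indicates a better map.
    """
    total_cells = sum(class_cells)
    total_gully = sum(class_gully)
    out = []
    for cells, gully in zip(class_cells, class_gully):
        area_pct = 100.0 * cells / total_cells
        gully_pct = 100.0 * gully / total_gully
        out.append(area_pct / gully_pct if gully_pct > 0 else math.inf)
    return out


# Hypothetical counts, not taken from the study
print(woe_weights(class_gully=30, class_cells=1000, total_gully=172, total_cells=10000))
print(scai([500, 300, 200], [10, 30, 60]))  # roughly [5.0, 1.0, 0.33]
```

A positive contrast C marks a class where gullies are over-represented relative to the background rate; a class whose gully density equals the background rate gets W+ = W- = 0.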

Spatial prediction of susceptibility to gully erosion in Jainti River basin, Eastern India: a comparison of information value and logistic regression models

Modeling Earth Systems and Environment ◽

10.1007/s40808-018-0560-8 ◽

2018 ◽

Vol 5 (2) ◽

pp. 689-708 ◽

Cited By ~ 6

Author(s):

Tusar kanti Hembram ◽

Gopal Chandra Paul ◽

Sunil Saha

Keyword(s):

Logistic Regression ◽

River Basin ◽

Regression Models ◽

Spatial Prediction ◽

Gully Erosion ◽

Information Value ◽

Eastern India ◽

Logistic Regression Models

Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia

Landslides ◽

10.1007/s10346-015-0614-1 ◽

2015 ◽

Vol 13 (5) ◽

pp. 839-856 ◽

Cited By ~ 242

Author(s):

Ahmed Mohamed Youssef ◽

Hamid Reza Pourghasemi ◽

Zohre Sadat Pourtaghi ◽

Mohamed M. Al-Katheeri

Keyword(s):

Saudi Arabia ◽

Random Forest ◽

Landslide Susceptibility ◽

Linear Models ◽

Regression Tree ◽

Classification And Regression Tree ◽

Landslide Susceptibility Mapping ◽

Boosted Regression Tree ◽

Classification And Regression ◽

Asir Region

A comparative study of land subsidence susceptibility mapping of Tasuj plane, Iran, using boosted regression tree, random forest and classification and regression tree methods

Environmental Earth Sciences ◽

10.1007/s12665-020-08953-0 ◽

2020 ◽

Vol 79 (10) ◽

Author(s):

Hamid Ebrahimy ◽

Bakhtiar Feizizadeh ◽

Saeed Salmani ◽

Hossein Azadi

Keyword(s):

Random Forest ◽

Comparative Study ◽

Land Subsidence ◽

Regression Tree ◽

Susceptibility Mapping ◽

Classification And Regression Tree ◽

Boosted Regression Tree ◽

Tree Methods ◽

Classification And Regression

Novel Ensemble of Multivariate Adaptive Regression Spline with Spatial Logistic Regression and Boosted Regression Tree for Gully Erosion Susceptibility

Remote Sensing ◽

10.3390/rs12203284 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3284

Author(s):

Paramita Roy ◽

Subodh Chandra Pal ◽

Alireza Arabameri ◽

Rabin Chakrabortty ◽

Biswajeet Pradhan ◽

...

Keyword(s):

Logistic Regression ◽

River Basin ◽

Cross Validation ◽

Gully Erosion ◽

Multivariate Adaptive Regression Spline ◽

Boosted Regression Tree ◽

Regression Spline ◽

Adaptive Regression ◽

Fold Cross Validation ◽

Very High

Extreme land degradation through various forms of erosion is one of the major problems in sub-tropical, monsoon-dominated regions. The formation and development of gullies is the dominant and most active erosion process in this region. Identifying erosion-prone areas is therefore necessary to prevent such degradation and to maintain the balance between the different spheres of the environment. The main goal of this study is to evaluate gully erosion susceptibility in the rugged topography of the Hinglo River Basin of eastern India, ultimately contributing to sustainable land management practices. Because of data instability, classifier weakness, and limits in the ability to handle the data, the accuracy of a single method is often not very high. This study therefore considered a novel resampling algorithm to increase the robustness and accuracy of the classifier. Gully erosion susceptibility maps were prepared using boosted regression trees (BRT), multivariate adaptive regression spline (MARS), and spatial logistic regression (SLR) combined with the proposed resampling techniques. The resampling algorithm increased the efficiency of all predictive models by improving the classifier. The gully inventory data were randomly allocated using 5-fold cross-validation, 10-fold cross-validation, bootstrap, and optimism bootstrap, with each validation subset comprising 30% of the database. The ensemble model was trained on 70% of the data and validated on the remaining 30% using K-fold cross-validation (CV) to evaluate the influence of the random selection of training and validation data. All resampling methods yielded higher accuracy, but SLR with optimism bootstrap proved the most suitable owing to its robustness. The AUC values of BRT, MARS, and SLR with optimism bootstrap are 87.40%, 90.40%, and 90.60%, respectively.
According to the SLR optimism bootstrap model, 107,771 km² (27.51%) of this region has high to very high susceptibility to gully erosion. These potential gully development areas were found mainly in parts of the Hinglo River Basin where lateral exposures with scarce vegetation were observed. The outcome of this work can help policy-makers implement remedial measures to minimize the damage caused by gully erosion.
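The abstract compares k-fold cross-validation with the optimism bootstrap but does not spell out the procedure. The sketch below shows a generic k-fold index split and the usual optimism-corrected bootstrap with AUC as the metric: the apparent AUC (train and test on the same data) is reduced by the average gap between each bootstrap model's AUC on its own sample and on the original data. The 1-nearest-neighbour scorer `fit_1nn` is a hypothetical stand-in for BRT, MARS, or SLR, chosen only because it overfits visibly; the data are synthetic.

```python
import random


def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds (sizes differ by at most one)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def optimism_bootstrap_auc(X, y, fit, n_boot=200, seed=0):
    """Optimism-corrected bootstrap AUC.

    fit(X, y) must return a predict(X) -> list-of-scores function.
    Returns (apparent AUC, optimism-corrected AUC).
    """
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(fit(X, y)(X), y)                 # train and test on the same data
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:                        # AUC needs both classes
            continue
        model = fit(xb, yb)
        # optimism = AUC on own bootstrap sample minus AUC on the original data
        optimism.append(auc(model(xb), yb) - auc(model(X), y))
    if not optimism:
        raise ValueError("no usable bootstrap samples")
    return apparent, apparent - sum(optimism) / len(optimism)


def fit_1nn(X, y):
    """Hypothetical stand-in model: 1-nearest-neighbour on a single feature."""
    train = list(zip(X, y))

    def predict(x_new):
        return [float(min(train, key=lambda t: abs(t[0] - x))[1]) for x in x_new]

    return predict


# Synthetic data: 30 non-gully (label 0) and 30 gully (label 1) cells
rng = random.Random(42)
y = [0] * 30 + [1] * 30
X = [rng.gauss(0.0, 1.0) if lab == 0 else rng.gauss(1.0, 1.0) for lab in y]
folds = kfold_indices(len(y), k=5)
apparent, corrected = optimism_bootstrap_auc(X, y, fit_1nn, n_boot=50)
print(len(folds), apparent, round(corrected, 3))
```

Because 1-NN memorises its training cells, its apparent AUC is 1.0; the corrected value shows how much of that is optimism.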

Implementation of Artificial Intelligence Based Ensemble Models for Gully Erosion Susceptibility Assessment

Remote Sensing ◽

10.3390/rs12213620 ◽

2020 ◽

Vol 12 (21) ◽

pp. 3620

Author(s):

Indrajit Chowdhuri ◽

Subodh Chandra Pal ◽

Alireza Arabameri ◽

Asish Saha ◽

Rabin Chakrabortty ◽

...

Keyword(s):

Land Use Changes ◽

Area Under The Curve ◽

Regression Tree ◽

Gully Erosion ◽

Support Vector ◽

Boosted Regression Tree ◽

Conditioning Factors ◽

Susceptibility Maps ◽

Chotanagpur Plateau ◽

Additive Regression

The Rarh Bengal region in West Bengal, particularly the eastern fringe of the Chotanagpur plateau, is highly prone to water-induced gully erosion. In this study, we analyzed the spatial patterns of potential gully erosion in the Gandheswari watershed. This area is strongly affected by monsoon rainfall and ongoing land-use changes, a combination that causes intensive gully erosion and land degradation. We therefore developed gully erosion susceptibility maps (GESMs) using the machine learning (ML) algorithms boosted regression tree (BRT), Bayesian additive regression tree (BART), support vector regression (SVR), and an ensemble of SVR with the Bee algorithm (SVR-Bee). The gully erosion inventory map is based on a total of 178 gully head-cut points, taken as the dependent factor, while gully erosion conditioning factors serve as the independent factors. We validated the ML model results using the area under the curve (AUC), accuracy (ACC), true skill statistic (TSS), and Kappa coefficient. The AUC values of the BRT, BART, SVR, and SVR-Bee models are 0.895, 0.902, 0.927, and 0.960, respectively, indicating very good GESM accuracy. The ensemble model provided more accurate predictions than any single ML model used in this study.
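Besides AUC, the abstract reports ACC, TSS, and the Kappa coefficient, which all depend on turning predicted probabilities into gully/non-gully classes. A minimal sketch of these three metrics from a confusion matrix follows; the 0.5 cut-off is an assumption (the abstract does not state the threshold used), and `threshold_metrics` is a hypothetical helper name.

```python
def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, true skill statistic and Cohen's kappa for a binary susceptibility map.

    labels: 1 = gully (presence), 0 = non-gully; scores: predicted probabilities.
    threshold: assumed cut-off for calling a cell "gully".
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    tss = sensitivity + specificity - 1  # ranges from -1 to +1
    # Chance agreement from the marginal totals of the confusion matrix
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e)
    return {"ACC": acc, "TSS": tss, "Kappa": kappa}


# Hypothetical example: 4 gully and 4 non-gully validation cells
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
print(threshold_metrics(labels, scores))  # {'ACC': 0.75, 'TSS': 0.5, 'Kappa': 0.5}
```

Unlike AUC, all three values change with the cut-off, so a study has to fix the threshold before comparing models on them.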

A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility

CATENA ◽

10.1016/j.catena.2016.11.032 ◽

2017 ◽

Vol 151 ◽

pp. 147-160 ◽

Cited By ~ 255

Author(s):

Wei Chen ◽

Xiaoshen Xie ◽

Jiale Wang ◽

Biswajeet Pradhan ◽

Haoyuan Hong ◽

...

Keyword(s):

Random Forest ◽

Comparative Study ◽

Landslide Susceptibility ◽

Logistic Model ◽

Regression Tree ◽

Spatial Prediction ◽

Classification And Regression Tree ◽

Tree Models ◽

Logistic Model Tree ◽

Classification And Regression

GIS-based comparative assessment of flood susceptibility mapping using hybrid multi-criteria decision-making approach, naïve Bayes tree, bivariate statistics and logistic regression: A case of Topľa basin, Slovakia

Ecological Indicators ◽

10.1016/j.ecolind.2020.106620 ◽

2020 ◽

Vol 117 ◽

pp. 106620 ◽

Cited By ~ 8

Author(s):

Sk Ajim Ali ◽

Farhana Parvin ◽

Quoc Bao Pham ◽

Matej Vojtek ◽

Jana Vojteková ◽

...

Keyword(s):

Decision Making ◽

Logistic Regression ◽

Naive Bayes ◽

Naïve Bayes ◽

Comparative Assessment ◽

Susceptibility Mapping ◽

Multi Criteria Decision Making ◽

Bivariate Statistics ◽

Flood Susceptibility ◽

Flood Susceptibility Mapping

Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

PeerJ ◽

10.7717/peerj.2849 ◽

2017 ◽

Vol 5 ◽

pp. e2849 ◽

Cited By ~ 44

Author(s):

Chunrong Mi ◽

Falk Huettmann ◽

Yumin Guo ◽

Xuesong Han ◽

Lijia Wen

Keyword(s):

Random Forest ◽

Species Distribution ◽

Performance Metrics ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Supporting Evidence ◽

Boosted Regression Tree ◽

Stochastic Gradient Boosting

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree), and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which model predictions were checked. We found that Random Forest performed best under most assessment methods, provided a better fit to the testing data, and produced better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications, and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
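The ensemble forecast described above averages the predicted probabilities of the four models. A minimal equal-weight version is sketched below, assuming each model has already scored the same grid cells in the same order; the function name and all probabilities are hypothetical, not values from the study.

```python
def ensemble_average(predictions):
    """Equal-weight ensemble: mean predicted probability per grid cell.

    predictions: dict mapping model name -> list of per-cell probabilities,
    with every list covering the same cells in the same order.
    """
    series = list(predictions.values())
    n_cells = len(series[0])
    if any(len(p) != n_cells for p in series):
        raise ValueError("all models must score the same cells")
    # zip(*series) yields, for each cell, the tuple of model probabilities
    return [sum(cell) / len(series) for cell in zip(*series)]


# Hypothetical per-cell probabilities for two grid cells
preds = {
    "TreeNet": [0.8, 0.2],
    "RandomForest": [0.6, 0.4],
    "CART": [0.7, 0.1],
    "Maxent": [0.9, 0.3],
}
print(ensemble_average(preds))  # approximately [0.75, 0.25]
```

Equal weights match the plain averaging the abstract describes; a weighted variant (for example, by each model's validation AUC) would be a different design choice.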

Why to choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

10.7287/peerj.preprints.2517 ◽

2016 ◽

Author(s):

Chunrong Mi ◽

Falk Huettmann ◽

Yumin Guo ◽

Xuesong Han ◽

Lijia Wen

Keyword(s):

Random Forest ◽

Species Distribution ◽

Performance Metrics ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Supporting Evidence ◽

Boosted Regression Tree ◽

Stochastic Gradient Boosting

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution, and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree), and Maxent (Maximum Entropy Models). Besides, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets against which model predictions were checked. We found that Random Forest performed best under most assessment methods, provided a better fit to the testing data, and produced better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and by now is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications, and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.