Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models

Geomorphology ◽  
2016 ◽  
Vol 259 ◽  
pp. 105-118 ◽  
Author(s):  
Haoyuan Hong ◽  
Hamid Reza Pourghasemi ◽  
Zohre Sadat Pourtaghi
2021 ◽  
Vol 13 (24) ◽  
pp. 5068
Author(s):  
Shuhao Liu ◽  
Kunlong Yin ◽  
Chao Zhou ◽  
Lei Gui ◽  
Xin Liang ◽  
...  

The power network has a long transmission span and passes through wide areas with complex topography and varied human engineering activities. These conditions lead to frequent landslide hazards, which seriously threaten the safe operation of the power transmission system. Landslide susceptibility assessment is therefore of great significance for disaster prevention and mitigation along the power network. We undertake an extensive analysis and comparison of different data-driven methods using a case study from China. Several susceptibility maps were generated by applying a multivariate statistical method (logistic regression (LR)) and a machine learning technique (random forest (RF)) separately, with two different mapping units and predictor sets of differing configurations. The models' accuracies, advantages, and limitations are summarized and discussed using a range of evaluation criteria, including the confusion matrix, statistical indexes, and the area under the receiver operating characteristic curve (AUROC). The results showed that the machine learning method is well suited to landslide susceptibility assessment along the transmission network over grid cell units, and that the most accurate susceptibility models are shifting rapidly from statistical models toward machine learning techniques. However, the multivariate statistical logistic regression method performs better when computed over heterogeneous slope terrain units, probably because the number of units is significantly reduced. Moreover, high model predictive performance cannot guarantee high plausibility and applicability of the resulting landslide susceptibility maps. The selection of mapping unit can produce greater differences in the generated susceptibility maps than the selection of modeling method. The study also provides a practical example of landslide susceptibility assessment along the power transmission network and of its potential application in hazard early warning, prevention, and mitigation.
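
The two evaluation criteria named in the abstract, the confusion matrix and the AUROC, can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the labels and scores are made-up values, the 0.5 threshold is an assumption, and the function names are hypothetical. The AUROC is computed as the probability that a randomly chosen landslide unit receives a higher score than a randomly chosen non-landslide unit, with ties counted as one half.

```python
def confusion_matrix(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels at the given score threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(y_true, y_score) if l == 1]
    negatives = [s for l, s in zip(y_true, y_score) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = landslide unit, 0 = stable unit.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

print(confusion_matrix(labels, scores))  # (tp, fp, tn, fn)
print(auroc(labels, scores))
```

The same two functions can be applied to the scores of any two models (for example an LR and an RF model) on the same validation units to compare them on equal terms.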


2017 ◽  
Vol 49 (5) ◽  
pp. 1363-1378 ◽  
Author(s):  
Chengguang Lai ◽  
Xiaohong Chen ◽  
Zhaoli Wang ◽  
Chong-Yu Xu ◽  
Bing Yang

Rainfall-induced landslide susceptibility assessment is currently considered an effective tool for landslide hazard assessment as well as for appropriate warning and forecasting. As part of the assessment procedure, a credible index weight matrix can strongly increase the rationality of the assessment result. This study proposed a novel weight-determining method that uses random forests (RFs) to find suitable weights. Random forest weights (RFWs) and eight indexes were used to construct an assessment model of the Dongjiang River basin based on fuzzy comprehensive evaluation. The results show that RF identified elevation (EL) and slope angle (SL) as the two most important indexes, and soil erodibility factor (SEF) and shear resistance capacity (SRC) as the two least important indexes. The assessment accuracy of RFW reaches 79.71%, which is higher than the 63.77% obtained with the entropy weight (EW). Two experiments were conducted by removing, respectively, the most dominant and the weakest indexes to examine the rationality and feasibility of RFW; both precision validation and contrastive analysis indicated that the assessment results of RFW are reasonable and satisfactory. This initial application of RF to weight determination shows significant potential, and the use of RFW is therefore recommended.
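
As a rough illustration of how importance-derived weights could feed a fuzzy comprehensive evaluation, the sketch below normalizes importance scores into weights and combines them with a membership matrix using the weighted-average operator, B_j = sum_i w_i * r_ij. The operator choice, the importance values, and the membership degrees are all assumptions made for the example; only four of the eight indexes are shown, and none of the numbers come from the paper.

```python
def normalize_weights(importances):
    """Scale non-negative importance scores so that they sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: value / total for name, value in importances.items()}


def fuzzy_evaluate(weights, membership):
    """Weighted-average fuzzy operator: B_j = sum_i w_i * r_ij."""
    n_grades = len(next(iter(membership.values())))
    result = [0.0] * n_grades
    for name, weight in weights.items():
        for j, degree in enumerate(membership[name]):
            result[j] += weight * degree
    return result


# Hypothetical importance scores (not taken from the paper).
importances = {"EL": 4.0, "SL": 3.0, "SEF": 2.0, "SRC": 1.0}

# Hypothetical membership degrees of one map unit in the grades
# (low, moderate, high) for each index.
membership = {
    "EL": [0.1, 0.3, 0.6],
    "SL": [0.2, 0.3, 0.5],
    "SEF": [0.5, 0.4, 0.1],
    "SRC": [0.6, 0.3, 0.1],
}

grades = ["low", "moderate", "high"]
weights = normalize_weights(importances)
result = fuzzy_evaluate(weights, membership)

# Maximum-membership principle: pick the grade with the largest degree.
best_grade = grades[result.index(max(result))]
print(weights)
print(result, best_grade)
```

Swapping in a different weight vector (for example one derived from entropy weights) while keeping the membership matrix fixed is one way to see how much the weighting scheme alone changes the assigned grade.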


Author(s):  
Winner Walecha and Dr. Bhoomi Gupta

This paper presents a salary prediction system built on job listings from an employment website, in this case Glassdoor.com. A data mining technique is used to generate a model that scrapes a number of job listings from the website and cleans them on the basis of several factors, including rival companies, revenue, and required skills, in order to predict the salary to expect when applying for a data science job. Techniques such as linear regression, lasso regression, and random forest regressors are optimised using GridSearchCV to reach the best model. The model can be further extended into a Flask API and thus deployed on the internet for public use.
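
The idea behind a cross-validated grid search can be shown without any library: try every candidate hyperparameter value, score each with k-fold cross-validation, and keep the best. The sketch below is a hand-rolled analogue, not scikit-learn's GridSearchCV and not the authors' pipeline. It assumes a deliberately tiny model (one-feature ridge regression without intercept), made-up noise-free data, and hypothetical function names.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form slope of one-feature ridge regression without intercept."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + alpha
    return numerator / denominator


def cv_mse(xs, ys, alpha, k):
    """Mean squared error averaged over k folds.

    Fold i holds out every sample whose index is congruent to i modulo k.
    """
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        slope = ridge_slope([xs[i] for i in train], [ys[i] for i in train], alpha)
        squared = [(ys[i] - slope * xs[i]) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


def grid_search(xs, ys, alphas, k=4):
    """Return the alpha with the lowest cross-validated MSE and all scores."""
    scores = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Made-up, noise-free data following y = 2x, so no shrinkage is optimal.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x for x in xs]

best_alpha, scores = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0])
print(best_alpha, scores)
```

On real, noisy salary data the best penalty would usually not be zero; the noise-free data here only makes the outcome easy to check by hand.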


Diabetes is a condition that occurs when blood glucose, also known as blood sugar, is too high. Blood sugar is the body's primary source of energy and comes from the food you eat. Insulin, a pancreatic hormone, helps glucose from food enter the cells to be used for energy. The term is also used for an unrelated condition named "Diabetes Insipidus", which involves problems with the processing of fluids in the kidney. Insulin is the key to the cell's ability to use glucose. Problems with the processing of insulin, or with how cells perceive insulin, can easily throw the body's carefully balanced glucose metabolism out of control [1]. When either of these conditions occurs, diabetes emerges: blood sugar levels rise and crash, and the risk of organ damage increases. Early prediction of diabetes could allow proper treatment and protect people from avoidable illness. For this prediction we can apply data mining, which is used predominantly in healthcare organizations for decision making and disease detection. In this paper, data were collected from the UCI repository and the data mining tool WEKA was used to predict diabetes. The database contains 768 instances, of which 500 belong to the tested-negative class and 268 to the tested-positive class. An experimental study is carried out using a data mining classification technique, the Random Forest Tree (RFT) classifier, to predict diabetes. We tried different numbers of cross-validation folds to achieve better accuracy and found that k = 8 gives the highest accuracy, 76.69%, compared with the other fold counts tested.
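
The figures in the abstract allow a quick sanity check of the setup. The sketch below computes how 768 instances divide across 8 folds and the accuracy of a trivial classifier that always predicts the majority class, which gives a reference point for the reported 76.69%. It uses only the counts stated above; the function names are hypothetical, and the fold split shown is a plain (unstratified) one, which may differ from what WEKA does internally.

```python
def fold_sizes(n_samples, k):
    """Sizes of k folds that differ by at most one sample."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Class counts as stated in the abstract.
counts = {"tested_negative": 500, "tested_positive": 268}

sizes = fold_sizes(sum(counts.values()), 8)
baseline = majority_baseline(counts)

print(sizes)                              # eight folds of 96 instances each
print(f"{100 * baseline:.2f}% baseline")  # about 65.10%
```

Under these counts, each of the 8 folds holds 96 instances, and the majority-class baseline is about 65.10%, so the reported accuracy is roughly 11.6 percentage points above always predicting "tested negative".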

