Modelling and mapping of soil damage caused by harvesting in Caspian forests (Iran) using CART and RF data mining techniques

2017 ◽  
Vol 63 (No. 9) ◽  
pp. 425-432 ◽  
Author(s):  
Shabani Saeid

Controlling the soil damage caused by forest harvesting plays a key role in forest management because of its effect on forest dynamics and productivity, mainly through modifying the physical, mechanical, and hydrological properties of soil. This study was conducted to evaluate soil damage susceptibility in one of the Caspian forests of Iran. For this purpose, two data mining techniques, classification and regression tree (CART) and random forest (RF), were applied. A total of 224 soil damage locations were identified primarily from field surveys. Then, 10 conditioning variables were produced in GIS. To assess model performance, the outputs of the analyses were compared with the field-verified soil damage locations. Our results show that slope degree, soil type, and slope aspect had the highest weight on soil damage, in that order of importance. Additionally, according to the relative operating characteristics curve, RF is a more suitable prediction model for soil damage zoning than CART. In summary, the findings of this study suggest that soil damage susceptibility mapping is an effective technique for the Caspian forests of Iran.

Water ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 1405 ◽  
Author(s):  
Seyed Naghibi ◽  
Mehdi Vafakhah ◽  
Hossein Hashemi ◽  
Biswajeet Pradhan ◽  
Seyed Alavi

It is a well-known fact that sustainable development goals are difficult to achieve without a proper water resources management strategy. This study applies state-of-the-art statistical and data mining models, namely weights-of-evidence (WoE), boosted regression trees (BRT), and classification and regression tree (CART), to identify suitable areas for artificial recharge through floodwater spreading (FWS). First, suitable areas for the FWS project were identified in a basin in north-eastern Iran based on the national guidelines and a literature survey. Using the same methodology, an identical number of unsuitable FWS areas were also determined. Afterward, a set of FWS conditioning factors was selected for modeling FWS suitability. The models were trained using 70% of the suitable and unsuitable locations and validated with the rest of the input data (i.e., 30%). Finally, a receiver operating characteristics (ROC) curve was plotted to compare the produced FWS suitability maps. The findings indicated acceptable performance of BRT, CART, and WoE for FWS suitability mapping, with areas under the ROC curve of 92, 87.5, and 81.6%, respectively. Among the considered variables, transmissivity, distance from rivers, aquifer thickness, and electrical conductivity were determined to be the most important contributors in the modeling. The FWS suitability maps produced by the proposed method could be used as a guideline for water resource managers to control flood damage and obtain new sources of groundwater. This methodology could be easily replicated to produce FWS suitability maps in other regions with similar hydrogeological conditions.
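The 70/30 validation and the area under the ROC curve described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_70_30` and `roc_auc` and the toy scores are assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking formulation rather than by plotting the curve.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle labelled samples and return (train, test) in a 70/30 ratio."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical suitability scores for held-out locations (1 = suitable).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 1.0 would mean that every suitable location scored higher than every unsuitable one, while 0.5 corresponds to random ranking.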


The winsorize tree is a modified tree derived from the classification and regression tree (CART). It relies on a strategy of handling and accommodating outliers simultaneously in all nodes while generating the subsequent branches of the tree. Normally, the presence of outliers degrades the accuracy of most classifiers. Therefore, we propose the winsorize tree, which is resistant to anomalous data. It preserves the original data while performing the splitting process. In this study, the winsorize tree was compared with other classifiers. The results obtained from five real datasets indicate that the proposed winsorize tree performs as well as, or even better than, the other data mining techniques in terms of misclassification rate.
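As a rough illustration of the winsorizing step only, the sketch below clamps the values that reach a node to an empirical quantile range before a split would be searched. The abstract does not state which cut-off rule the authors use, so the 5% and 95% quantiles, the function name `winsorize`, and the toy data are all assumptions.

```python
def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values to the range spanned by two empirical quantiles.

    The quantile levels are illustrative assumptions, not the rule
    used in the winsorize tree paper.
    """
    if not values:
        return []
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_q * last)]   # lower fence
    high = ordered[int(upper_q * last)]  # upper fence
    # Values outside the fences are pulled back to the fence, not dropped,
    # so the number of observations in the node is unchanged.
    return [min(max(v, low), high) for v in values]


# Hypothetical feature values reaching one node, with a single outlier.
node_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
clamped = winsorize(node_values)
```

Because the outlier is clamped rather than removed, a split threshold chosen on `clamped` is less likely to be dominated by one extreme observation.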


2016 ◽  
Vol 76 (2) ◽  
pp. 341-351
Author(s):  
L. F. C. Rezende ◽  
B. C. Arenque-Musa ◽  
M. S. B. Moura ◽  
S. T. Aidar ◽  
C. Von Randow ◽  
...  

Abstract The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in the face of global changes. In fieldwork carried out in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as the Caatinga.


2018 ◽  
Vol 8 (8) ◽  
pp. 1369 ◽  
Author(s):  
Alireza Arabameri ◽  
Biswajeet Pradhan ◽  
Hamid Reza Pourghasemi ◽  
Khalil Rezaei ◽  
Norman Kerle

Gully erosion triggers land degradation and restricts the use of land. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, namely random forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs were selected to model GE: elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI. The variable importance results from the RF and BRT models indicated that distance from roads, elevation, and lithology had the greatest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The results showed that the AUC for the three models varied from 0.911 to 0.927, whereas, compared with the other models, the RF model had a prediction accuracy of 0.927 according to the SCAI values. The findings will help in planning and developing the studied region.


2021 ◽  
Vol 35 (3) ◽  
pp. 209-215
Author(s):  
Pratibha Verma ◽  
Vineet Kumar Awasthi ◽  
Sanat Kumar Sahu

Data mining techniques are combined with ensemble learning and deep learning for classification. The methods used for classification are Single C5.0 Tree (C5.0), Classification and Regression Tree (CART), kernel-based Support Vector Machine (SVM) with linear kernel, ensemble (CART, SVM, C5.0), Neural Network-based Fit single-hidden-layer neural network (NN), Neural Networks with Principal Component Analysis (PCA-NN), deep learning-based H2OBinomialModel-Deeplearning (HBM-DNN), and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). In this study, experiments were conducted on pre-processed datasets using R programming and the 10-fold cross-validation technique. The findings show that the ensemble model (CART, SVM, and C5.0) and EHBM-DNN are more accurate for classification than the other methods.
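The 10-fold cross-validation mentioned above can be sketched as an index generator. The study itself used R; this Python version is only an illustration, and the function name `k_fold_indices` and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Each sample index appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 rows split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be fitted on each `train` set and scored on the matching `test` set, and the ten scores would then be averaged.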


2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Amélie Mugnier ◽  
Sylvie Chastant-Maillard ◽  
Hanna Mila ◽  
Faouzi Lyazrhi ◽  
Florine Guiraud ◽  
...  

Abstract Background: Neonatal mortality (over the first three weeks of life) is a major concern in canine breeding facilities as an economic and welfare issue. Since low birth weight (LBW) dramatically increases the risk of neonatal death, the risk factors for its occurrence need to be identified, together with the chances and determinants of survival of at-risk newborns. Results: Data from 4971 puppies from 10 breeds were analysed. Two birth weight thresholds regarding the risk of neonatal mortality were identified by breed, using the Receiver Operating Characteristics and the Classification and Regression Tree methods, respectively. Puppies were classified as LBW when their birth weight was between the two thresholds and as very low birth weight (VLBW) when it was lower than both thresholds. Mortality rates were 4.2, 8.8 and 55.3% in the normal, LBW and VLBW groups, accounting for 48.7, 47.9 and 3.4% of the included puppies, respectively. A separate binary logistic regression approach identified breed, gender and litter size as determinants of LBW. An increase in litter size and being a female were associated with a higher risk of LBW. Survival of LBW puppies was reduced in litters with at least one stillborn, compared to litters with no stillborn, and was also reduced when the dam was more than 6 years old. Concerning VLBW puppies, occurrence and survival were influenced by litter size. Surprisingly, a decrease in litter size was a risk factor for VLBW and also reduced their survival. The results of this study suggest that VLBW and LBW puppies are two distinct populations. Moreover, they indicate that events and factors affecting intrauterine growth (leading to birth weight reduction) also affect the puppies' ability to adapt to extrauterine life. Conclusion: These findings could help veterinarians and breeders to improve the management of their facility and more specifically of LBW puppies. Possible recommendations would be to select only dams of optimal age for reproduction and to pay particular attention to LBW puppies born in small litters. Further studies are required to understand the origin of LBW in dogs.
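The two-threshold grouping and the reported group figures can be sketched as follows. The threshold values in grams are hypothetical placeholders, since the abstract does not give the breed-specific cut-offs, and the handling of weights exactly equal to a threshold is an assumption. The second function simply combines the reported group mortality rates with the reported group shares.

```python
def classify_birth_weight(weight_g, lower_threshold_g, upper_threshold_g):
    """Assign a puppy to the VLBW, LBW or normal group from two thresholds.

    Boundary handling (strict '<') is an assumption, not from the study.
    """
    if lower_threshold_g > upper_threshold_g:
        raise ValueError("lower threshold must not exceed upper threshold")
    if weight_g < lower_threshold_g:
        return "VLBW"  # below both thresholds
    if weight_g < upper_threshold_g:
        return "LBW"   # between the two thresholds
    return "normal"


def overall_mortality(rates_pct, shares_pct):
    """Share-weighted average of group mortality rates, in percent."""
    return sum(r * s for r, s in zip(rates_pct, shares_pct)) / sum(shares_pct)


# Hypothetical thresholds for one breed: 200 g and 300 g.
groups = [classify_birth_weight(w, 200, 300) for w in (150, 250, 350)]

# Figures reported in the abstract for the normal, LBW and VLBW groups.
combined = overall_mortality([4.2, 8.8, 55.3], [48.7, 47.9, 3.4])
```

With the reported figures, the share-weighted mortality across all three groups comes to about 8.1%.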


2011 ◽  
Vol 62 (1) ◽  
pp. 5-16 ◽  
Author(s):  
Sebastian Vogel ◽  
Michael Märker ◽  
Florian Seiler

Revised modelling of the post-AD 79 volcanic deposits of Somma-Vesuvius to reconstruct the pre-AD 79 topography of the Sarno River plain (Italy)

In this study, the methodology proposed by Vogel & Märker (2010) to reconstruct the pre-AD 79 topography and paleo-environmental features of the Sarno River plain (Italy) was considerably revised and improved. The methodology is based on an extensive dataset of stratigraphical information from the entire Sarno River plain, a high-resolution present-day digital elevation model (DEM) and a classification and regression tree approach. The dataset was re-evaluated and 32 additional stratigraphical drillings were collected in areas that were not covered, or were insufficiently covered, by previous stratigraphic data. Altogether, an assemblage of 1,840 drillings was utilized, containing information about the depth from the present-day surface to the pre-AD 79 paleo-surface (thickness of post-AD 79 deposits) and the character of the pre-AD 79 paleo-layer of the Sarno River plain. Moreover, improved preprocessing of the input parameters yielded a distinct improvement in model performance in comparison to the previous model of Vogel & Märker (2010). Subsequently, a spatial model of the post-AD 79 deposits was generated. The modelled deposits were then used to reconstruct the pre-AD 79 topography of the Sarno River plain. Moreover, paleo-environmental and paleo-geomorphological features such as the paleo-coastline, the paleo-Sarno River and its floodplain, alluvial fans near the Tyrrhenian coast as well as abrasion terraces of historical and protohistorical coastlines were identified. This reconstruction represents a qualitative improvement of the previous work by Vogel & Märker (2010).


2021 ◽  
Vol 13 (3) ◽  
pp. 901-913
Author(s):  
S. Gupta ◽  
R. R. Sedamkar

Enhancing the diagnostic ability of Machine Learning models for acceptable prediction in the healthcare community is still a concern. There are critical care disease datasets available online on which researchers have experimented with different numbers of instances and features for similar disease prediction. Further, different Machine Learning (ML) models have different preprocessing requirements. Framingham heart disease data is multicollinear and has missing values. Thus, the proposed model aims to explore the differential preprocessing needs of ML models, followed by feature selection in consensus with domain experts and feature extraction to resolve multicollinearity issues. Missing values have been imputed differently for each feature. The work also identifies the optimal train set size by plotting a learning curve that provides a minimum generalization gap. When testing is done on this hyperparameter-tuned model, performance is enhanced with respect to the F score weighted by support and stratification, since the data is imbalanced. Experimental results demonstrate improvement in performance metrics, i.e., weighted F score, precision, recall, and accuracy by up to 3%, and F1 score by 8%, for the Logistic Regression Classifier with the proposed model. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Tree (CART).


2020 ◽  
Vol 79 (4) ◽  
pp. 445-452 ◽  
Author(s):  
Paul Studenic ◽  
David Felson ◽  
Maarten de Wit ◽  
Farideh Alasti ◽  
Tanja A Stamm ◽  
...  

Objectives: This study aimed to evaluate different patient global assessment (PGA) cut-offs required in the American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) Boolean remission definition for their utility in rheumatoid arthritis (RA). Methods: We used data from six randomised controlled trials in early and established RA. We increased the threshold for the 0–10 score for PGA gradually from 1 to 3 in steps of 0.5 (Boolean1.5 to Boolean3.0) and omitted PGA completely (BooleanX) at 6 and 12 months. Agreement with the index-based (Simplified Disease Activity Index (SDAI)) remission definition was analysed using kappa, recursive partitioning (classification and regression tree (CART)) and receiver operating characteristics. The impact of achieving each definition on functional and radiographic outcomes after 1 year was explored. Results: Data from 1680 patients with early RA and 920 patients with established RA were included. The proportion of patients achieving Boolean remission increased with higher thresholds for PGA from 12.4% to 19.7% in early and 5.9% to 12.3% in established RA at 6 months. Best agreement with SDAI remission occurred at PGA cut-offs of 1.5 and 2.0, while agreement decreased with higher PGA (CART: optimal agreement at PGA≤1.6 cm; sensitivity of PGA≤1.5: 95%). Changing PGA thresholds at 6 months did not affect radiographic progression at 12 months (mean ΔsmTSS for Boolean, 1.5, 2.0, 2.5, 3.0, BooleanX: 0.35±5.4, 0.38±5.14, 0.41±5.1, 0.37±4.9, 0.34±4.9, 0.27±4.7). However, the proportion attaining HAQ≤0.5 was 90.2%, 87.9%, 85.2%, 81.1%, 80.7% and 73.1% for the respective Boolean definitions. Conclusion: Increasing the PGA cut-off to 1.5 cm would provide high consistency between the Boolean and the index-based remission definitions; the integer cut-off of 2.0 cm performed similarly.
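The effect of varying the PGA cut-off can be sketched with a small function. The abstract only describes the PGA component; the other three criteria used here (tender and swollen 28-joint counts of at most 1 and C-reactive protein of at most 1 mg/dL) come from the standard ACR/EULAR Boolean definition rather than from this text. The function name and the example patient are hypothetical.

```python
def boolean_remission(tjc28, sjc28, crp_mg_dl, pga_cm, pga_cutoff=1.0):
    """Check Boolean remission with an adjustable PGA cut-off.

    pga_cutoff=None omits PGA entirely, mirroring the BooleanX variant.
    The non-PGA criteria follow the standard ACR/EULAR definition and
    are not spelled out in the abstract.
    """
    core = tjc28 <= 1 and sjc28 <= 1 and crp_mg_dl <= 1.0
    if pga_cutoff is None:
        return core
    return core and pga_cm <= pga_cutoff


# Hypothetical patient: low joint counts and CRP, PGA of 1.4 cm.
patient = dict(tjc28=0, sjc28=1, crp_mg_dl=0.5, pga_cm=1.4)
strict = boolean_remission(**patient)                    # original cut-off of 1
relaxed = boolean_remission(**patient, pga_cutoff=1.5)   # Boolean1.5
no_pga = boolean_remission(**patient, pga_cutoff=None)   # BooleanX
```

A patient like this one fails the original definition on PGA alone, which is the kind of case that the higher cut-offs reclassify as remission.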

