Groundwater Augmentation through the Site Selection of Floodwater Spreading Using a Data Mining Approach (Case study: Mashhad Plain, Iran)

Water ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 1405 ◽  
Author(s):  
Seyed Naghibi ◽  
Mehdi Vafakhah ◽  
Hossein Hashemi ◽  
Biswajeet Pradhan ◽  
Seyed Alavi

Sustainable development goals are difficult to achieve without a proper water resources management strategy. This study applies three statistical and data mining models, namely weights-of-evidence (WoE), boosted regression trees (BRT), and classification and regression tree (CART), to identify suitable areas for artificial recharge through floodwater spreading (FWS). First, suitable areas for the FWS project were identified in a basin in north-eastern Iran based on the national guidelines and a literature survey. Using the same methodology, an identical number of FWS-unsuitable areas were also determined. Afterward, a set of FWS conditioning factors was selected for modeling FWS suitability. The models were trained on 70% of the suitable and unsuitable locations and validated with the remaining 30% of the input data. Finally, receiver operating characteristic (ROC) curves were plotted to compare the produced FWS suitability maps. The findings showed acceptable performance of BRT, CART, and WoE for FWS suitability mapping, with areas under the ROC curve of 92, 87.5, and 81.6%, respectively. Among the considered variables, transmissivity, distance from rivers, aquifer thickness, and electrical conductivity were the most important contributors to the modeling. The FWS suitability maps produced by the proposed method could be used as a guideline for water resource managers to control flood damage and obtain new sources of groundwater. The methodology could be easily replicated to produce FWS suitability maps in other regions with similar hydrogeological conditions.
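
The 70%/30% validation scheme and the ROC-based comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `train_test_split` and `roc_auc`, the random seed, and the toy labels and scores are all assumptions made for illustration. The area under the ROC curve is computed with the standard pair-counting (Mann-Whitney) identity rather than by plotting the curve.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle records and split them into training and validation parts.

    The 0.7 default mirrors the 70%/30% split mentioned in the abstract;
    the seed is an arbitrary assumption so the split is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the pair-counting identity.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case;
    tied scores count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels (1 = suitable, 0 = unsuitable) and
# hypothetical suitability scores from some fitted model.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.2, 0.1]
print(roc_auc(labels, scores))  # 0.75: three of four positive/negative pairs are ordered correctly
```

In practice the scores would come from each fitted model evaluated on the held-out 30%, and the model with the larger area would be preferred.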

2017 ◽  
Vol 63 (No. 9) ◽  
pp. 425-432 ◽  
Author(s):  
Shabani Saeid

Controlling the soil damage caused by forest harvesting has a key role in forest management due to its effect on forest dynamics and productivity, mainly through modifying the physical, mechanical, and hydrological context of soil. This study was conducted to evaluate the soil damage susceptibility in one of the Caspian forests, Iran. For this purpose, two data mining techniques, classification and regression tree (CART) and random forest (RF), were applied. A total of 224 soil damage locations were identified primarily from field surveys. Then, 10 conditioning variables were produced in GIS. To assess model performance, the outputs of the analyses were compared with the field-verified soil damage locations. Our results show that slope degree, soil type, and slope aspect had the highest weight on soil damage, in that order. Additionally, according to the relative operating characteristics curve, RF is a more suitable prediction model for soil damage zoning than CART. In summary, the findings of this study suggest that soil damage susceptibility mapping is an effective technique for Caspian forests, Iran.


2021 ◽  
Vol 35 (3) ◽  
pp. 209-215
Author(s):  
Pratibha Verma ◽  
Vineet Kumar Awasthi ◽  
Sanat Kumar Sahu

Data mining techniques are combined with ensemble learning and deep learning for classification. The classification methods used are a single C5.0 tree (C5.0), Classification and Regression Tree (CART), a kernel-based Support Vector Machine (SVM) with linear kernel, an ensemble (CART, SVM, C5.0), a single-hidden-layer neural network (NN), Neural Networks with Principal Component Analysis (PCA-NN), deep learning-based H2OBinomialModel-Deeplearning (HBM-DNN), and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). In this study, experiments were conducted on pre-processed datasets using R programming and the 10-fold cross-validation technique. The findings show that the ensemble model (CART, SVM and C5.0) and EHBM-DNN are more accurate for classification than the other methods.


2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Amélie Mugnier ◽  
Sylvie Chastant-Maillard ◽  
Hanna Mila ◽  
Faouzi Lyazrhi ◽  
Florine Guiraud ◽  
...  

Background: Neonatal mortality (over the first three weeks of life) is a major concern in canine breeding facilities as an economic and welfare issue. Since low birth weight (LBW) dramatically increases the risk of neonatal death, its risk factors need to be identified, together with the chances and determinants of survival of at-risk newborns. Results: Data from 4971 puppies from 10 breeds were analysed. Two birth weight thresholds for the risk of neonatal mortality were identified for each breed, using the Receiver Operating Characteristics and the Classification and Regression Tree methods, respectively. Puppies were classified as LBW when their birth weight was between the two thresholds and as very low birth weight (VLBW) when it was lower than both thresholds. Mortality rates were 4.2, 8.8 and 55.3% in the normal, LBW and VLBW groups, accounting for 48.7, 47.9 and 3.4% of the included puppies, respectively. A separate binary logistic regression approach identified breed, gender and litter size as determinants of LBW. A larger litter size and being female were associated with a higher risk of LBW. Survival of LBW puppies was reduced in litters with at least one stillborn, compared to litters with no stillborn, and was also reduced when the dam was more than 6 years old. Concerning VLBW puppies, occurrence and survival were influenced by litter size. Surprisingly, a smaller litter size was a risk factor for VLBW and also reduced survival. The results of this study suggest that VLBW and LBW puppies are two distinct populations. Moreover, they indicate that events and factors affecting intrauterine growth (leading to birth weight reduction) also affect the puppies' ability to adapt to extrauterine life. Conclusion: These findings could help veterinarians and breeders to improve the management of their facility and more specifically of LBW puppies. Possible recommendations would be to select only dams of optimal age for reproduction and to pay particular attention to LBW puppies born in small litters. Further studies are required to understand the origin of LBW in dogs.
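
The abstract does not state how a threshold was read off the ROC curve. One common choice, assumed here purely for illustration, is the cut-off that maximises Youden's J (sensitivity + specificity - 1). The sketch below uses a hypothetical function name, `youden_cutoff`, and made-up birth weights; it flags a puppy as at risk when its birth weight is at or below the threshold.

```python
def youden_cutoff(weights, died):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    `weights` are birth weights and `died` holds 1 for a neonatal death
    and 0 for survival. A puppy is flagged as at risk when its weight is
    <= threshold. Ties in J keep the lowest threshold.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes present")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(weights)):
        # True positives: deaths at or below the candidate threshold.
        tp = sum(1 for w, d in zip(weights, died) if d == 1 and w <= threshold)
        # True negatives: survivors above the candidate threshold.
        tn = sum(1 for w, d in zip(weights, died) if d == 0 and w > threshold)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Made-up birth weights in grams and outcomes; not data from the study.
weights = [100, 120, 150, 200, 250, 300]
died = [1, 1, 1, 0, 0, 0]
print(youden_cutoff(weights, died))  # (150, 1.0) for this perfectly separable toy data
```

Real data are rarely perfectly separable, so J would typically be well below 1 and the chosen threshold would trade sensitivity against specificity.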


2020 ◽  
Vol 79 (4) ◽  
pp. 445-452 ◽  
Author(s):  
Paul Studenic ◽  
David Felson ◽  
Maarten de Wit ◽  
Farideh Alasti ◽  
Tanja A Stamm ◽  
...  

Objectives: This study aimed to evaluate different patient global assessment (PGA) cut-offs required in the American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) Boolean remission definition for their utility in rheumatoid arthritis (RA). Methods: We used data from six randomised controlled trials in early and established RA. We increased the threshold for the 0–10 score for PGA gradually from 1 to 3 in steps of 0.5 (Boolean1.5 to Boolean3.0) and omitted PGA completely (BooleanX) at 6 and 12 months. Agreement with the index-based (Simplified Disease Activity Index (SDAI)) remission definition was analysed using kappa, recursive partitioning (classification and regression tree (CART)) and receiver operating characteristics. The impact of achieving each definition on functional and radiographic outcomes after 1 year was explored. Results: Data from 1680 patients with early RA and 920 patients with established RA were included. The proportion of patients achieving Boolean remission increased with higher thresholds for PGA from 12.4% to 19.7% in early and 5.9% to 12.3% in established RA at 6 months. Best agreement with SDAI remission occurred at PGA cut-offs of 1.5 and 2.0, while agreement decreased with higher PGA (CART: optimal agreement at PGA≤1.6 cm; sensitivity of PGA≤1.5 95%). Changing PGA thresholds at 6 months did not affect radiographic progression at 12 months (mean ΔsmTSS for Boolean, 1.5, 2.0, 2.5, 3.0, BooleanX: 0.35±5.4, 0.38±5.14, 0.41±5.1, 0.37±4.9, 0.34±4.9, 0.27±4.7). However, the proportion attaining HAQ≤0.5 was 90.2%, 87.9%, 85.2%, 81.1%, 80.7% and 73.1% for the respective Boolean definitions. Conclusion: Increasing the PGA cut-off to 1.5 cm would provide high consistency between the Boolean and the index-based remission definitions; the integer cut-off of 2.0 cm performed similarly.
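
Agreement between two remission definitions can be summarised with a kappa statistic. The sketch below assumes the unweighted Cohen's kappa for two binary classifications of the same patients; the abstract says only "kappa", so the exact variant is an assumption, as are the function name `cohen_kappa` and the toy classifications.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two classifications of the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement expected by chance from the marginal
    category frequencies of each classification.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("inputs must be non-empty and of equal length")
    a, b = list(ratings_a), list(ratings_b)
    n = len(a)
    categories = set(a) | set(b)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        # Both classifications place every case in the same single category.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical remission status (1 = in remission) under two definitions.
boolean_remission = [1, 1, 0, 0]
sdai_remission = [1, 0, 0, 0]
print(cohen_kappa(boolean_remission, sdai_remission))  # 0.5
```

Here observed agreement is 0.75 and chance agreement is 0.5, giving kappa = 0.5.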


2003 ◽  
Vol 17 (1) ◽  
pp. 109-114 ◽  
Author(s):  
S.A. Gansky

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratio. To aid interpretation, various plots of predicted probabilities are used, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with the KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, the problems of naïve KDD analyses (Schwarzer et al., 2000) can be avoided.
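
The performance measures named above follow directly from a 2x2 confusion table. The following minimal sketch uses the standard textbook definitions; the function name `diagnostic_summary` and the counts are hypothetical and are not taken from the Rochester caries study.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from 2x2 counts.

    tp/fn: cases with the condition predicted positive/negative.
    fp/tn: cases without the condition predicted positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    lr_positive = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    lr_negative = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


# Hypothetical validation counts for one model.
summary = diagnostic_summary(tp=40, fp=10, fn=10, tn=40)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Computing the same summary for each competing model on the same validation set gives a like-for-like comparison.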


Animals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 1165
Author(s):  
Abdelfattah Selim ◽  
Ameer Megahed ◽  
Sahar Kandeel ◽  
Abdullah D. Alanazi ◽  
Hamdan I. Almohammed

Classification and Regression Tree (CART) analysis is a potentially powerful tool for identifying risk factors associated with contagious caprine pleuropneumonia (CCPP) and the important interactions between them. Our objective was therefore to determine the seroprevalence and identify the risk factors associated with CCPP using CART data mining modeling in the most densely sheep- and goat-populated governorates. A cross-sectional study was conducted on 620 animals (390 sheep, 230 goats) distributed over four governorates in the Nile Delta of Egypt in 2019. The randomly selected sheep and goats from different geographical study areas were serologically tested for CCPP, and the animals’ information was obtained from flock men and farm owners. Six variables (geographic location, species, flock size, age, gender, and communal feeding and watering) were used for risk analysis. Multiple stepwise logistic regression and CART modeling were used for data analysis. A total of 124 (20%) serum samples were serologically positive for CCPP. The highest prevalence of CCPP was among aged animals (>4 y; 48.7%) raised in a flock size ≥200 (100%) with communal feeding and watering (28.2%). Based on logistic regression modeling (area under the curve, AUC = 0.89; 95% CI 0.86 to 0.91), communal feeding and watering showed the highest prevalence odds ratio (POR) of CCPP (POR = 3.7, 95% CI 1.9 to 7.3), followed by age (POR = 2.1, 95% CI 1.6 to 2.8) and flock size (POR = 1.1, 95% CI 1.0 to 1.2). However, the more accurate CART modeling (AUC = 0.92, 95% CI 0.90 to 0.95) showed that a flock size >100 animals is the most important risk factor (importance score = 8.9), followed by age >4 y (5.3) and communal feeding and watering (3.1). Our results strongly suggest that CCPP is most likely to be found in animals older than 4 y that are raised in flocks of more than 100 animals with communal feeding and watering. Additionally, sheep seem to have an important role in CCPP epidemiology. The CART data mining modeling showed better accuracy than the traditional logistic regression.
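
For readers unfamiliar with the prevalence odds ratio, the sketch below computes an unadjusted POR with a Wald (log-scale) 95% confidence interval from a single 2x2 table. This is a simplification: the PORs reported above come from a multiple logistic regression model and are therefore adjusted for the other variables. The function name `prevalence_odds_ratio` and the counts are hypothetical.

```python
import math


def prevalence_odds_ratio(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg, z=1.96):
    """Unadjusted prevalence odds ratio with a Wald confidence interval.

    POR = (a * d) / (b * c) for the table
        a = exposed and seropositive,   b = exposed and seronegative,
        c = unexposed and seropositive, d = unexposed and seronegative.
    The interval is exp(ln(POR) +/- z * SE) with
    SE = sqrt(1/a + 1/b + 1/c + 1/d); z = 1.96 gives a 95% interval.
    """
    a, b, c, d = exposed_pos, exposed_neg, unexposed_pos, unexposed_neg
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    por = (a * d) / (b * c)
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(por) - z * se)
    upper = math.exp(math.log(por) + z * se)
    return por, lower, upper


# Hypothetical counts (not from the study): exposure = communal feeding and watering.
por, lower, upper = prevalence_odds_ratio(20, 10, 10, 20)
print(f"POR = {por:.1f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 indicates an association at the chosen confidence level, which is how the intervals quoted in the abstract are usually read.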


2020 ◽  
Vol 19 ◽  
pp. 153303382097969
Author(s):  
Kyung Hwan Chang ◽  
Young Hyun Lee ◽  
Byung Hun Park ◽  
Min Cheol Han ◽  
Jihun Kim ◽  
...  

Purpose: This study aimed to investigate the parameters with a significant impact on delivery quality assurance (DQA) failure and analyze the planning parameters as possible predictors of DQA failure for helical tomotherapy. Methods: In total, 212 patients who passed or failed DQA measurements were retrospectively included in this study. Brain (n = 43), head and neck (n = 37), spinal (n = 12), prostate (n = 36), rectal (n = 36), pelvis (n = 13), cranial spinal irradiation and a treatment field including lymph nodes (n = 24), and other types of cancer (n = 11) were selected. The correlation between DQA results and treatment planning parameters was analyzed using logistic regression analysis. Receiver operating characteristic (ROC) curves, areas under the curves (AUCs), and the Classification and Regression Tree (CART) algorithm were used to analyze treatment planning parameters as possible predictors of DQA failure. Results: The AUC for leaf open time (LOT) was 0.70, and its cut-off point was approximately 30%. The ROC curve for the predicted probability from the multivariate model showed an AUC of 0.815. Using CART, we confirmed that total monitor units, total dose, and LOT were significant predictors of DQA failure. Conclusions: The probability of DQA failure was higher when the percentage of LOT below 100 ms was higher than 30%. The percentage of LOT below 100 ms should be considered in the treatment planning process. The findings from this study may assist in the prediction of DQA failure in the future.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Nathella Pavan Kumar ◽  
Syed Hissar ◽  
Kannan Thiruvengadam ◽  
Velayuthum V. Banurekha ◽  
Sarath Balaji ◽  
...  

Background: Diagnosing tuberculosis (TB) in children is challenging due to paucibacillary disease and the limited ability to obtain microbiologic confirmation. Hence, we measured plasma chemokines as biomarkers for the diagnosis of pediatric tuberculosis. Methods: We conducted a prospective case-control study using children with confirmed, unconfirmed and unlikely TB. A multiplex assay was performed to examine the plasma levels of CC and CXC chemokines. Results: Baseline levels of CCL1, CCL3, CXCL1, CXCL2 and CXCL10 were significantly higher in children with active TB (confirmed TB and unconfirmed TB) than in children with unlikely TB. Receiver operating characteristics curve analysis revealed that CCL1, CXCL1 and CXCL10 could act as biomarkers distinguishing confirmed or unconfirmed TB from unlikely TB with sensitivity and specificity of more than 80%. In addition, combiROC exhibited more than 90% sensitivity and specificity in distinguishing confirmed and unconfirmed TB from unlikely TB. Finally, classification and regression tree models also offered more than 90% sensitivity and specificity for CCL1 with a cutoff value of 28 pg/ml, which clearly separates active TB from unlikely TB. The levels of CCL1, CXCL1, CXCL2 and CXCL10 exhibited a significant reduction following anti-TB treatment. Conclusion: Thus, a baseline chemokine signature of CCL1/CXCL1/CXCL10 could serve as an accurate biomarker for the diagnosis of pediatric tuberculosis.


Author(s):  
Johannes Gehrke

It is the goal of classification and regression to build a data mining model that can be used for prediction. To construct such a model, we are given a set of training records, each having several attributes. These attributes can either be numerical (for example, age or salary) or categorical (for example, profession or gender). There is one distinguished attribute, the dependent attribute; the other attributes are called predictor attributes. If the dependent attribute is categorical, the problem is a classification problem. If the dependent attribute is numerical, the problem is a regression problem. The resulting model predicts the value of the dependent attribute for a record where that value is unknown. (We call such a record an unlabeled record.) Classification and regression have a wide range of applications, including scientific experiments, medical diagnosis, fraud detection, credit approval, and target marketing (Hand, 1997). Many classification and regression models have been proposed in the literature. Among the more popular models are neural networks, genetic algorithms, Bayesian methods, linear and log-linear models and other statistical methods, decision tables, and tree-structured models, the focus of this chapter (Breiman, Friedman, Olshen, & Stone, 1984). Tree-structured models, so-called decision trees, are easy to understand; they are non-parametric and thus do not rely on assumptions about the data distribution; and they have fast construction methods even for large training datasets (Lim, Loh, & Shih, 2000). Most data mining suites include tools for classification and regression tree construction (Goebel & Gruenwald, 1999).
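
To make the idea of a tree-structured classifier concrete, the sketch below shows one building block: choosing the best binary split on a single numerical predictor attribute by minimising the weighted Gini impurity, the criterion commonly associated with CART-style classification trees. It is an illustration under stated assumptions, not the construction method of any particular tool; the function names `gini` and `best_split` and the toy records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold t for the rule `value <= t` with the lowest weighted Gini.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_gini); threshold is None when no split
    improves on the impurity of the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy training records: predictor attribute = age, dependent attribute = approval.
ages = [22, 25, 30, 45, 50, 60]
approved = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, approved))  # (37.5, 0.0): the rule age <= 37.5 separates the classes
```

A full tree applies this search recursively to each resulting subset and over all predictor attributes; for a numerical dependent attribute (regression), the impurity is typically replaced by the variance of the dependent values.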

