Assessing the geographic specificity of pH prediction by classification and regression trees

Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York’s Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.52%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from those of the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage restricting the application of relevant machine learning models to the regions from which training data were collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH.
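The abstract reports model error as RRMSE but does not define it. The sketch below assumes one common definition, the root mean squared error divided by the mean of the observed values and expressed as a percentage; the function name `rrmse` and the sample pH values are hypothetical and are not taken from the study.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value.

    Assumed definition; the abstract does not state which variant it uses.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical soil pH measurements and model predictions.
observed_ph = [4.0, 6.0]
predicted_ph = [5.0, 5.0]
print(f"RRMSE = {rrmse(observed_ph, predicted_ph):.2f}%")
```

Under this definition a lower value means predictions sit closer to the measurements relative to the typical pH in the sample, which is how the three models in the abstract are ranked.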

Classification and Regression Trees, Random Forest Algorithm

Machine Learning Approaches to Bioinformatics - Science, Engineering, and Biology Informatics ◽

10.1142/9789814287319_0009 ◽

2010 ◽

pp. 120-132

Keyword(s):

Random Forest ◽

Regression Trees ◽

Classification And Regression Trees ◽

Random Forest Algorithm ◽

Classification And Regression

A COMPARISON OF DATA MINING TECHNIQUES FOR CREDIT SCORING IN BANKING: A MANAGERIAL PERSPECTIVE

Journal of Business Economics and Management ◽

10.3846/1611-1699.2009.10.233-240 ◽

2009 ◽

Vol 10 (3) ◽

pp. 233-240 ◽

Cited By ~ 31

Author(s):

Huseyin Ince ◽

Bora Aktan

Keyword(s):

Neural Networks ◽

Predictive Accuracy ◽

Credit Scoring ◽

Experimental Studies ◽

Regression Trees ◽

Scoring Systems ◽

Classification And Regression Trees ◽

Insurance Companies ◽

Real World Data ◽

Classification And Regression

Credit scoring is an important task for lenders evaluating the loan applications they receive from consumers, and for insurance companies, which now use scoring systems to evaluate new policyholders and the risks these prospective customers might present to the insurer. Credit scoring systems model the potential risk of loan applications. They can handle a large volume of credit applications quickly with minimal labour, thus reducing operating costs, and they may be an effective substitute for judgment among inexperienced loan officers, thus helping to control bad debt losses. This study explores the performance of credit scoring models using traditional and artificial intelligence approaches: discriminant analysis, logistic regression, neural networks, and classification and regression trees. Experimental studies using real-world data sets have demonstrated that classification and regression trees and neural networks outperform the traditional credit scoring models in terms of predictive accuracy and type II errors.

Predictive Accuracy of Classification and Regression Trees (CART) Versus Conjoint Analysis

Proceedings of the 1990 Academy of Marketing Science (AMS) Annual Conference - Developments in Marketing Science: Proceedings of the Academy of Marketing Science ◽

10.1007/978-3-319-13254-9_72 ◽

2014 ◽

pp. 366-370 ◽

Cited By ~ 2

Author(s):

Paul E. Green ◽

Jinho Kim ◽

Bruce Shandler

Keyword(s):

Conjoint Analysis ◽

Predictive Accuracy ◽

Regression Trees ◽

Classification And Regression Trees ◽

Classification And Regression

Contrasting regional and national mechanisms for predicting elevated arsenic in private wells across the United States using classification and regression trees

Water Research ◽

10.1016/j.watres.2016.01.023 ◽

2016 ◽

Vol 91 ◽

pp. 295-304 ◽

Cited By ~ 7

Author(s):

Logan Frederick ◽

James VanDerslice ◽

Marissa Taddie ◽

Kristen Malecki ◽

Josh Gregg ◽

...

Keyword(s):

United States ◽

Regression Trees ◽

The United States ◽

Classification And Regression Trees ◽

Classification And Regression ◽

Private Wells

Proposed Clinical Indicators for Efficient Screening and Testing for COVID-19 Infection from Classification and Regression Trees (CART) Analysis

10.1101/2020.05.11.20097980 ◽

2020 ◽

Cited By ~ 1

Author(s):

Richard K Zimmerman ◽

Mary Patricia Nowalk ◽

Todd Bear ◽

Rachel Taber ◽

Theresa M Sax ◽

...

Keyword(s):

Predictive Value ◽

Recursive Partitioning ◽

Regression Trees ◽

The United States ◽

Classification And Regression Trees ◽

Clinical Indicators ◽

Elderly Patients ◽

Cart Analysis ◽

Rapid Transmission ◽

Classification And Regression

Background: The introduction and rapid transmission of SARS-CoV-2 in the United States resulted in implementation of methods to assess, mitigate and contain the resulting COVID-19 disease based on limited knowledge. Screening for testing has been based on symptoms typically observed in inpatients, yet outpatient symptom complexes may differ. Methods: Classification and regression trees (CART) recursive partitioning created a decision tree classifying enrollees into laboratory-confirmed cases and non-cases. Demographic and symptom data from patients aged 18-87 years who were enrolled from March 29 to April 26, 2020 were included. Presence or absence of SARS-CoV-2 was the target variable. Results: Of 736 tested, 55 were positive for SARS-CoV-2. Cases significantly more often reported chills, loss of taste/smell, diarrhea, fever, nausea/vomiting and contact with a COVID-19 case, but less frequently reported shortness of breath and sore throat. A 7-terminal-node tree with a sensitivity of 96%, a specificity of 53%, and an AUC of 78% was developed. The positive predictive value for this tree was 14% while the negative predictive value was 99%. Almost half (44%) of the participants could be ruled out as likely non-cases without testing. Discussion: Among those referred for testing, negative responses to three questions could classify about half of tested persons as low risk for SARS-CoV-2 and would save limited testing resources. These questions are: was the patient in contact with a COVID-19 case? Has the patient experienced 1) a loss of taste or smell; or 2) nausea or vomiting? The outpatient symptoms of COVID-19 appear to be broader than the well-known inpatient syndrome.
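The reported predictive values follow from the sensitivity, the specificity, and the share of positives among those tested. As a rough check, the sketch below applies Bayes' rule to the rounded figures in the abstract (55 positives out of 736, sensitivity 96%, specificity 53%); the function name `predictive_values` is hypothetical, and the result is approximate because the inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures as reported in the abstract (rounded percentages).
prevalence = 55 / 736
ppv, npv = predictive_values(0.96, 0.53, prevalence)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # prints "PPV = 14%, NPV = 99%"
```

The low positive predictive value despite high sensitivity reflects the low share of positives in the tested group: most people flagged by the tree are still non-cases, while a negative result is very reliable.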

Preoperative hematocrit and platelet count are associated with blood loss during spinal fusion for children with neuromuscular scoliosis

Journal of Perioperative Practice ◽

10.1177/1750458920962634 ◽

2021 ◽

pp. 175045892096263

Author(s):

Margaret O Lewen ◽

Jay Berry ◽

Connor Johnson ◽

Rachael Grace ◽

Laurie Glader ◽

...

Keyword(s):

Platelet Count ◽

Blood Loss ◽

Spinal Fusion ◽

Regression Trees ◽

Classification And Regression Trees ◽

Neuromuscular Scoliosis ◽

Independent Variables ◽

Estimated Blood Loss ◽

Laboratory Results ◽

Classification And Regression

Aim: To assess the relationship of preoperative hematology laboratory results with intraoperative estimated blood loss and transfusion volumes during posterior spinal fusion for pediatric neuromuscular scoliosis. Methods: Retrospective chart review of 179 children with neuromuscular scoliosis undergoing spinal fusion at a tertiary children’s hospital between 2012 and 2017. The main outcome measure was estimated blood loss. Secondary outcomes were volumes of packed red blood cells, fresh frozen plasma, and platelets transfused intraoperatively. Independent variables were preoperative blood counts, coagulation studies, and demographic and surgical characteristics. Relationships between estimated blood loss, transfusion volumes, and independent variables were assessed using bivariable analyses. Classification and Regression Trees were used to identify variables most strongly correlated with outcomes. Results: In bivariable analyses, increased estimated blood loss was significantly associated with higher preoperative hematocrit and lower preoperative platelet count but not with abnormal coagulation studies. Preoperative laboratory results were not associated with intraoperative transfusion volumes. In Classification and Regression Trees analysis, binary splits associated with the largest increase in estimated blood loss were hematocrit ≥44% vs. <44% and platelets ≥308 vs. <308 × 10^9/L. Conclusions: Preoperative blood counts may identify patients at risk of increased bleeding, though they do not predict intraoperative transfusion requirements. Abnormal coagulation studies often prompted preoperative intervention but were not associated with increased intraoperative bleeding or transfusion needs.
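Binary splits such as hematocrit ≥44% vs. <44% are what a regression tree produces when it searches one variable for the threshold that best separates the outcome. The sketch below shows that search in its simplest form, minimizing the summed squared error of the two resulting groups. It is an illustration of the general technique, not the study's procedure, and the hematocrit and blood-loss values are invented.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, error) for the best split 'x < threshold' vs. 'x >= threshold'."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical predictor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Invented data: hematocrit (%) and estimated blood loss (mL).
hematocrit = [38, 40, 42, 45, 46, 48]
blood_loss = [300, 320, 310, 600, 620, 610]
threshold, error = best_split(hematocrit, blood_loss)
print(threshold, error)
```

A full tree repeats this search over every candidate variable and then recursively within each resulting group.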

Regional Mapping of Groundwater Potential in Ar Rub Al Khali, Arabian Peninsula Using the Classification and Regression Trees Model

Remote Sensing ◽

10.3390/rs13122300 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2300

Author(s):

Samy Elmahdy ◽

Tarig Ali ◽

Mohamed Mohamed

Keyword(s):

Machine Learning ◽

Regional Scale ◽

Regression Trees ◽

Classification And Regression Trees ◽

Groundwater Potential ◽

Machine Learning Algorithms ◽

Conditioning Factors ◽

Potential Mapping ◽

Classification And Regression ◽

Groundwater Potential Mapping

Mapping groundwater potential beneath sand sheets in remote arid and semi-arid regions at a regional scale is a challenge and requires an accurate classifier. The Classification and Regression Trees (CART) model is a robust machine learning classifier used in groundwater potential mapping at a regional scale. Ten essential groundwater conditioning factors (GWCFs) were constructed using remote sensing data. The spatial relationship between these conditioning factors and the observed groundwater well locations was optimized and identified using the chi-square method. A total of 185 groundwater well locations were randomly divided into 129 (70%) for training the model and 56 (30%) for validation. The model was applied for groundwater potential mapping using the optimal parameter values: 186 additive trees, a learning rate of 0.1, and a maximum tree size of five. The validation result demonstrated that the area under the curve (AUC) of the CART was 0.920, which represents a predictive accuracy of 92%. The resulting map demonstrated that the Mondafan, Khujaymah and Wajid Mutaridah depressions and the southern gulf salt basin (SGSB) near the borders of Saudi Arabia, Oman and the United Arab Emirates (UAE) hold fresh fossil groundwater, as indicated by the observed lakes and recovered paleolakes. The proposed model and the new maps are effective at enhancing regional-scale groundwater potential mapping with machine learning algorithms, which are rarely used in the literature, and the approach can be applied to the Sahara and the Kalahari Desert.
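The 129/56 partition of the 185 wells can be reproduced with a shuffled 70/30 split. The sketch below is a minimal illustration using only the standard library; the function name `train_validation_split`, the seed, and the integer well identifiers are assumptions, since the abstract does not say how the random division was carried out.

```python
import random


def train_validation_split(items, train_percent=70, seed=0):
    """Shuffle a copy of items and split it by an integer percentage."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical well identifiers standing in for the 185 well locations.
wells = list(range(185))
train, validation = train_validation_split(wells)
print(len(train), len(validation))  # prints "129 56"
```

Fixing the seed makes the split repeatable, which matters when a validation AUC is later compared across parameter settings.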
