Classification and Regression Trees

Construction Methods

Log Linear

Mining Model

Decision Tables

The goal of classification and regression is to build a data mining model that can be used for prediction. To construct such a model, we are given a set of training records, each having several attributes. These attributes can be either numerical (for example, age or salary) or categorical (for example, profession or gender). One distinguished attribute is the dependent attribute; the other attributes are called predictor attributes. If the dependent attribute is categorical, the problem is a classification problem; if it is numerical, the problem is a regression problem. The model is then used to predict the value of the dependent attribute for a record in which that value is unknown (we call such a record an unlabeled record). Classification and regression have a wide range of applications, including scientific experiments, medical diagnosis, fraud detection, credit approval, and target marketing (Hand, 1997). Many classification and regression models have been proposed in the literature; among the more popular are neural networks, genetic algorithms, Bayesian methods, linear and log-linear models and other statistical methods, decision tables, and tree-structured models, the focus of this chapter (Breiman, Friedman, Olshen, & Stone, 1984). Tree-structured models, so-called decision trees, are easy to understand; they are non-parametric and thus do not rely on assumptions about the data distribution; and fast construction methods exist for them even for large training datasets (Lim, Loh, & Shih, 2000). Most data mining suites include tools for constructing classification and regression trees (Goebel & Gruenwald, 1999).
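The distinction between classification and regression shows up directly in how a tree chooses its splits. The sketch below illustrates a CART-style binary split on one numerical predictor attribute: every threshold between adjacent distinct values is tried, and the split with the lowest cost is kept. Cost is weighted Gini impurity when the dependent attribute is categorical and the sum of squared errors when it is numerical. This is a minimal illustration under those assumptions; the function names and toy data are hypothetical, and full implementations add stopping rules, pruning, and splits on categorical predictors.

```python
from collections import Counter
from typing import Sequence, Union

Label = Union[str, float]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a node whose dependent attribute is categorical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the node mean (numerical dependent attribute)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[Label]) -> tuple[float, float]:
    """Return (threshold, cost) of the best binary split 'x <= threshold'.

    String labels are treated as a classification problem (weighted Gini);
    numeric labels as a regression problem (total SSE of the two children).
    """
    is_classification = isinstance(y[0], str)
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        if is_classification:
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
        else:
            cost = sse(left) + sse(right)
        if cost < best[1]:
            best = (threshold, cost)
    return best


if __name__ == "__main__":
    ages = [23, 31, 45, 52, 60]
    print(best_split(ages, ["no", "no", "yes", "yes", "yes"]))  # (38.0, 0.0)
    print(best_split(ages, [30.0, 32.0, 50.0, 55.0, 58.0]))
```

Inferring the problem type from the label type keeps the sketch short; a real tool would let the user declare whether the dependent attribute is categorical or numerical.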

Prediction of Body Weight of Turkish Tazi Dogs using Data Mining Techniques: Classification and Regression Tree (CART) and Multivariate Adaptive Regression Splines (MARS)

Pakistan Journal of Zoology

10.17582/journal.pjz/2018.50.2.575.583

2018

Vol 50 (2)

Cited By ~ 4

Author(s):

Senol Celik

Orhan Yilmaz

Keyword(s):

Data Mining

Body Weight

Regression Tree

Multivariate Adaptive Regression Splines

Regression Splines

Adaptive Regression

Using Data

Adaptive Regression Splines

Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia

Landslides

10.1007/s10346-015-0614-1

2015

Vol 13 (5)

pp. 839-856

Cited By ~ 242

Author(s):

Ahmed Mohamed Youssef

Hamid Reza Pourghasemi

Zohre Sadat Pourtaghi

Mohamed M. Al-Katheeri

Keyword(s):

Saudi Arabia

Random Forest

Landslide Susceptibility

Linear Models

Regression Tree

Landslide Susceptibility Mapping

Boosted Regression Tree

Asir Region

Determination of Seroprevalence of Contagious Caprine Pleuropneumonia and Associated Risk Factors in Goats and Sheep Using Classification and Regression Tree

Animals

10.3390/ani11041165

2021

Vol 11 (4)

pp. 1165

Author(s):

Abdelfattah Selim

Ameer Megahed

Sahar Kandeel

Abdullah D. Alanazi

Hamdan I. Almohammed

Keyword(s):

Risk Factors

Data Mining

Logistic Regression

Regression Tree

Flock Size

Factors Associated

Contagious Caprine Pleuropneumonia

Communal Feeding

Classification and Regression Tree (CART) analysis is a potentially powerful tool for identifying risk factors associated with contagious caprine pleuropneumonia (CCPP) and the important interactions between them. Our objective was therefore to determine the seroprevalence of CCPP and identify its associated risk factors using CART data mining modeling in the governorates most densely populated with sheep and goats. A cross-sectional study was conducted on 620 animals (390 sheep, 230 goats) distributed over four governorates in the Nile Delta of Egypt in 2019. The randomly selected sheep and goats from different geographical study areas were serologically tested for CCPP, and information on the animals was obtained from flockmen and farm owners. Six variables (geographic location, species, flock size, age, gender, and communal feeding and watering) were used for risk analysis. Multiple stepwise logistic regression and CART modeling were used for data analysis. A total of 124 (20%) serum samples were serologically positive for CCPP. The highest prevalence of CCPP was found in older animals (>4 y; 48.7%), in flocks of ≥200 animals (100%), and in animals with communal feeding and watering (28.2%). Based on logistic regression modeling (area under the curve, AUC = 0.89; 95% CI 0.86 to 0.91), communal feeding and watering showed the highest prevalence odds ratio (POR) for CCPP (POR = 3.7, 95% CI 1.9 to 7.3), followed by age (POR = 2.1, 95% CI 1.6 to 2.8) and flock size (POR = 1.1, 95% CI 1.0 to 1.2). However, the more accurate CART model (AUC = 0.92, 95% CI 0.90 to 0.95) showed that a flock size of >100 animals is the most important risk factor (importance score = 8.9), followed by age >4 y (5.3) and communal feeding and watering (3.1). Our results strongly suggest that CCPP is most likely to be found in animals older than 4 y that are raised in flocks of >100 animals with communal feeding and watering. Additionally, sheep appear to play an important role in CCPP epidemiology. The CART data mining model showed better accuracy than traditional logistic regression.
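The headline figures in this abstract rest on two standard epidemiological quantities, written out here for reference. The 2×2 cell labels a, b, c, d and the coefficient symbol β_j are introduced for illustration only; the study's own contingency tables and coefficients are not given in the abstract.

$$
\hat{p} = \frac{\text{seropositive animals}}{\text{animals tested}} = \frac{124}{620} = 0.20
$$

$$
\mathrm{POR}_{\text{crude}} = \frac{a/b}{c/d} = \frac{ad}{bc}, \qquad \mathrm{POR}_{j,\text{adjusted}} = e^{\beta_j}
$$

Here a and b are the seropositive and seronegative animals in an exposed group (for example, animals with communal feeding and watering), c and d are the same counts in the unexposed group, and β_j is the coefficient of predictor j in a multivariable logistic regression. Under this reading, the reported POR of 3.7 for communal feeding and watering corresponds to a coefficient of ln 3.7 ≈ 1.31.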

Causal Analysis and Data Mining of Well Stimulation Data Using Classification and Regression Tree with Enhancements

10.2118/166472-ms

2013

Cited By ~ 4

Author(s):

Srimoyee Bhattacharya

Marko Maucec

Jeffrey Marc Yarus

Dwight David Fulton

Jon Matthew Orth

...

Keyword(s):

Data Mining

Regression Tree

Causal Analysis

Well Stimulation

Multivariate Analysis and Data Mining of Well-Stimulation Data by Use of Classification-and-Regression Tree with Enhanced Interpretation and Prediction Capabilities

SPE Economics & Management

10.2118/166472-pa

2015

Vol 7 (02)

pp. 60-71

Cited By ~ 7

Author(s):

Marko Maucec

Ajay P. Singh

Srimoyee Bhattacharya

Jeffrey M. Yarus

Dwight D. Fulton

...

Keyword(s):

Data Mining

Multivariate Analysis

Regression Tree

Well Stimulation

Performance of the Classification and Regression Tree (CART) Algorithm in Classifying the Study Duration of Students Participating in Organizations at Universitas Negeri Jakarta

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer

10.21009/pinter.3.2.9

2019

Vol 3 (2)

pp. 139-145

Author(s):

Nurul Indah Prabawati

Widodo

Hamidillah Ajie

Keyword(s):

Data Mining

Cross Validation

Regression Tree

Fold Cross Validation

Student organizations are facilities provided by universities as a venue for developing students' non-academic abilities, interests, and talents. In practice, however, many students who join organizations experience a decline in academic performance, to the point of not graduating on time. Universitas Negeri Jakarta does not yet have a system that can classify the study duration of students who participate in organizations. Before building a decision-making system, the accuracy of candidate algorithms must be studied so that the resulting system is highly accurate. This study uses a data mining algorithm, the Classification and Regression Tree (CART) algorithm. CART is a binary decision tree method developed for classification analysis of response variables that are nominal, ordinal, or continuous. CART comprises two methods: regression trees and classification trees. Data on students in organizations who did and did not graduate on time were processed with the CART algorithm. After classification, accuracy was computed using K-fold cross-validation with K = 5, K = 10, and K = 20. On the sample data of students participating in organizations, the best CART accuracy was obtained with K = 20. The CART algorithm was able to classify the study duration of students participating in organizations at Universitas Negeri Jakarta, with an average accuracy of 80%.
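The evaluation protocol in this abstract, K-fold cross-validation with K = 5, 10, and 20, can be sketched in plain Python as below. The abstract does not say which software was used, so the learner is passed in as a hypothetical `fit` callable that trains on a list of records and returns a predict function; a trivial majority-class learner stands in for CART purely to keep the sketch self-contained.

```python
import random
from collections import Counter
from typing import Any, Callable, Sequence

# A learner: takes training records and labels, returns a predict function.
Fit = Callable[[Sequence[Any], Sequence[Any]], Callable[[Any], Any]]


def k_fold_accuracy(xs: Sequence[Any], ys: Sequence[Any], fit: Fit,
                    k: int, seed: int = 0) -> float:
    """Mean of the k per-fold accuracies from K-fold cross-validation."""
    if not 2 <= k <= len(xs):
        raise ValueError("k must be between 2 and the number of records")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, nearly equal folds
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def fit_majority(train_x: Sequence[Any], train_y: Sequence[Any]) -> Callable[[Any], Any]:
    """Hypothetical baseline learner: always predicts the most common label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda _x: majority


if __name__ == "__main__":
    records = list(range(20))
    labels = ["on_time"] * 15 + ["late"] * 5
    for k in (5, 10, 20):
        print(k, k_fold_accuracy(records, labels, fit_majority, k))
```

Replacing `fit_majority` with a CART learner would give the comparison the abstract describes, though not necessarily its numbers. Larger K trains each model on more of the data (K equal to the number of records is leave-one-out) at the cost of more model fits.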

Causal Analysis and Data Mining of Well Stimulation Data Using Classification and Regression Tree with Enhancements

Lecture Notes in Earth System Sciences - Mathematics of Planet Earth

10.1007/978-3-642-32408-6_144

2013

pp. 665-668

Author(s):

Srimoyee Bhattacharya

Marko Maučec

Jeffrey Yarus

Dwight Fulton

Jon Orth

...

Keyword(s):

Data Mining

Regression Tree

Causal Analysis

Well Stimulation

Erratum to: Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia

Landslides

10.1007/s10346-015-0667-1

2015

Vol 13 (5)

pp. 1315-1318

Cited By ~ 9

Author(s):

Ahmed Mohamed Youssef

Hamid Reza Pourghasemi

Zohre Sadat Pourtaghi

Mohamed M. Al-Katheeri

Keyword(s):

Saudi Arabia

Random Forest

Landslide Susceptibility

Linear Models

Regression Tree

Landslide Susceptibility Mapping

Boosted Regression Tree

Asir Region

Clustering and Classification of Hepatitis Data Using Support Vector Machine (SVM), Classification and Regression Tree (CART), and Binary Logistic Regression

Journal of Education Research and Evaluation

10.23887/jere.v1i3.12016

2017

Vol 1 (3)

pp. 183

Author(s):

Gede Suwardika

Keyword(s):

Data Mining

Support Vector Machine

Regression Tree

Data Classification

Support Vector

Virus Hepatitis

SVM Classification

Hepatitis is inflammation of the liver caused by toxins, such as chemicals or drugs, or by infectious agents. Hepatitis lasting less than 6 months is called "acute hepatitis"; hepatitis lasting more than 6 months is called "chronic hepatitis". Hepatitis is usually caused by a virus, especially one of the five hepatitis viruses: A, B, C, D, or E. It can also result from other viral infections, such as infectious mononucleosis, yellow fever, and cytomegalovirus infection. The main non-viral causes of hepatitis are alcohol and drugs. This study examined data on 155 patients whose response was either died or survived. Data mining, specifically data classification, was applied to this case: the available testing data were analyzed and compared against the training data to predict death or survival. The classification accuracy between training and testing data was 79.4% with logistic regression and 80% with SVM. Clustering with K-Means and Kernel K-Means produced different clustering accuracies, indicating that the hepatitis data cluster well. The Kernel K-Means clustering results were then compared with the actual data classified by logistic regression, SVM, and CART, and the Kernel K-Means results achieved better classification accuracy than the classification of the actual data.

Comparing the Performance of Winsorize Tree to Other Data Mining Techniques for Cases Involving Outliers

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.b1036.0782s219

2019

Vol 8 (2S2)

pp. 197-201

Keyword(s):

Data Mining

Regression Tree

The Other

Misclassification Rate

Accuracy Rate

Data Mining Techniques

Splitting Process

The winsorize tree is a modification of the classification and regression tree (CART). It relies on a strategy of handling and accommodating outliers simultaneously in all nodes while generating the subsequent branches of the tree. Outliers typically reduce the accuracy of most classifiers. We therefore propose the winsorize tree, which is resistant to anomalous data. It preserves the original data while performing the splitting process. In this study, the winsorize tree was compared with other classifiers. Results on five real datasets indicate that the proposed winsorize tree performs as well as or better than the other data mining techniques in terms of misclassification rate.
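The abstract does not spell out the winsorize tree's exact procedure. As a rough illustration of the underlying operation only, the sketch below winsorizes each numerical attribute using just the records that reach a given node: values below the p-th empirical quantile are raised to it and values above the (1 − p)-th quantile are lowered to it, so no record is discarded (unlike trimming). The linear-interpolation quantile rule, the default p = 0.05, and the function names are assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of already-sorted values (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values: Sequence[float], p: float = 0.05) -> List[float]:
    """Clip values to the [p, 1 - p] quantile range; order and count are preserved."""
    if not values:
        return []
    if not 0 <= p < 0.5:
        raise ValueError("p must be in [0, 0.5)")
    s = sorted(values)
    lower, upper = quantile(s, p), quantile(s, 1 - p)
    return [min(max(v, lower), upper) for v in values]


def winsorize_node(rows: Sequence[Sequence[float]], p: float = 0.05) -> List[List[float]]:
    """Winsorize every numerical attribute (column) using only this node's records.

    Hypothetical helper: a tree builder could call this on each node's records
    before evaluating candidate splits for that node.
    """
    if not rows:
        return []
    columns = [winsorize(list(col), p) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    print(winsorize([1, 2, 3, 4, 100], p=0.25))  # the outlier 100 is pulled in to 4
```

Because the quantiles are recomputed from each node's own records, what counts as extreme can differ from node to node, which is one plausible reading of handling outliers "in all nodes".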