Comparing estimates of the classification error rate in decision trees: Data mining
A decision tree (DT) typically splits on one variable at a time, so the final decision partition has boundaries parallel to the axes. An observation is misclassified when it falls in a region whose class label differs from its own. In a classification tree, the misclassification rate is defined as the proportion of observations assigned to the wrong class, while in a regression tree it is defined as the mean squared error. In this paper, we present two important methods for estimating the misclassification (error) rate in decision trees, since all classification procedures, including decision trees, can produce errors. A DT model is constructed on a training dataset and then tested on an independent test dataset. Several procedures exist for estimating the error rate of tree-structured classifiers, such as K-fold cross-validation and bootstrap estimation. This comparison aims to characterize the performance of the two methods in terms of test error rates on real datasets. The results indicate that both 10-fold cross-validation and the bootstrap yield a tree fairly close to the best available, as measured by tree size.
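As a hedged sketch of the two estimators discussed above (not code from the paper), the following toy example estimates the misclassification rate of a one-split "decision stump" classifier with 10-fold cross-validation and with out-of-bag bootstrap sampling. The stump stands in for a full decision tree; the functions `stump_fit`, `kfold_error`, and `bootstrap_error`, and the synthetic data, are all illustrative assumptions, not part of the original work.

```python
import random

def stump_fit(X, y):
    # Toy one-split "tree": pick the threshold and polarity that
    # minimize training misclassification on a single 1-D feature.
    best = (None, None, 1.1)
    for t in sorted(set(X)):
        for pos in (0, 1):
            preds = [pos if x >= t else 1 - pos for x in X]
            err = sum(p != yi for p, yi in zip(preds, y)) / len(y)
            if err < best[2]:
                best = (t, pos, err)
    return best[0], best[1]

def stump_predict(model, X):
    t, pos = model
    return [pos if x >= t else 1 - pos for x in X]

def kfold_error(X, y, k=10, seed=0):
    # K-fold cross-validation: each observation is held out exactly once;
    # the estimate is the mean misclassification rate over the k folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for fold in folds:
        fset = set(fold)
        train = [i for i in idx if i not in fset]
        model = stump_fit([X[i] for i in train], [y[i] for i in train])
        preds = stump_predict(model, [X[i] for i in fold])
        errs.append(sum(p != y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(errs) / len(errs)

def bootstrap_error(X, y, B=50, seed=0):
    # Bootstrap estimate: fit on a resample drawn with replacement and
    # evaluate on the out-of-bag observations, averaged over B resamples.
    rng = random.Random(seed)
    n = len(X)
    errs = []
    for _ in range(B):
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:  # rare for moderate n; skip degenerate resamples
            continue
        model = stump_fit([X[i] for i in sample], [y[i] for i in sample])
        preds = stump_predict(model, [X[i] for i in oob])
        errs.append(sum(p != y[i] for p, i in zip(preds, oob)) / len(oob))
    return sum(errs) / len(errs)

# Synthetic 1-D data: class 1 above a threshold, with two noisy labels.
X = [i / 10 for i in range(40)]
y = [1 if x >= 2.0 else 0 for x in X]
y[3], y[30] = 1, 0  # label noise
cv_err = kfold_error(X, y)
boot_err = bootstrap_error(X, y)
```

Both functions return an estimated error rate in [0, 1]; on this nearly separable toy data both estimates stay close to the noise level, which mirrors the abstract's point that the two procedures give broadly comparable assessments of a tree's test error.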