A Novel Approach using Expert Knowledge on Error based Pruning
Many traditional pruning methods assume that all datasets are equally probable and equally important, so they apply equal pruning to all of them. However, in real-world classification problems, not all datasets are equal, and applying an equal pruning rate tends to generate a decision tree with a large size and a high misclassification rate. In this paper, we present a practical algorithm for the data-specific classification problem that arises when datasets have different properties. Another key motivation for data-specific pruning in this paper is "trading accuracy and size". A new algorithm called Expert Knowledge Based Pruning (EKBP) is proposed to solve this dilemma. We propose to integrate error rate, missing values and expert judgment as factors for determining the data-specific pruning for each dataset. We show by analysis and experiments that, using this pruning, we can improve both the accuracy and the generalisation of the generated tree. Moreover, the method can be very effective for high-dimensional datasets. We conduct an extensive experimental study on 40 openly available real-world datasets from the UCI repository. In all these experiments, the proposed approach shows a considerable reduction in tree size with equal or better accuracy compared to several benchmark decision tree methods proposed in the literature.