A class skew-insensitive ACO-based decision tree algorithm for imbalanced data sets

Author(s):  
Muhamad Hasbullah Bin Mohd Razali ◽  
Rizauddin Bin Saian ◽  
Yap Bee Wah ◽  
Ku Ruhana Ku-Mahamud

Ant-tree-miner (ATM) has an advantage over conventional decision tree algorithms in terms of feature selection. However, real-world applications commonly involve imbalanced class problems in which the classes have different importance. This condition impedes the entropy-based heuristic of the existing ATM algorithm from developing effective decision boundaries because of its bias towards the dominant class. Consequently, the induced decision trees are dominated by the majority class and lack predictive ability on the rare class. This study proposes an enhanced algorithm called Hellinger-ant-tree-miner (HATM), inspired by the ant colony optimization (ACO) metaheuristic, for imbalanced learning with decision tree classification. The proposed algorithm was compared with the existing ATM algorithm on nine publicly available imbalanced data sets. A simulation study reveals the superiority of HATM when the sample size increases with skewed classes (imbalance ratio < 50%). Experimental results demonstrate that the performance of the existing algorithm, measured by balanced accuracy (BACC), is improved by the class skew-insensitivity of the Hellinger distance. A statistical significance test shows that HATM has a higher mean BACC score than ATM.
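The abstract does not give HATM's exact heuristic, so the following is only a minimal sketch of the two quantities it names: the Hellinger distance used as a split score for a two-class problem, and balanced accuracy. The function names and the count-based interface are assumptions made for illustration, not the authors' implementation. The Hellinger score compares how each class distributes over the branches of a split, using within-class fractions only, which is why it does not depend on the class ratio.

```python
from math import sqrt


def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the class-conditional branch distributions.

    pos_counts[j] / neg_counts[j] are the numbers of minority / majority
    examples that fall into branch j of a candidate split (hypothetical
    interface, assumed for this sketch).
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("both classes must be present at the node")
    # Each term uses only within-class fractions, so multiplying the size
    # of one class by a constant leaves the score unchanged.
    return sqrt(sum(
        (sqrt(p / total_pos) - sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Same within-class fractions, ten times more majority examples:
# the split score is the same in both cases.
score_mild = hellinger_split_score([8, 2], [30, 70])
score_skewed = hellinger_split_score([8, 2], [300, 700])
```

A classifier that always predicts the majority class scores 0.5 under `balanced_accuracy` regardless of how skewed the data are, which is why the measure suits imbalanced data sets.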

Author(s):  
Giuseppe Nuti ◽  
Lluís Antoni Jiménez Rugama ◽  
Andreea-Ingrid Cross

Bayesian Decision Trees provide a probabilistic framework that reduces the instability of Decision Trees while maintaining their explainability. While Markov Chain Monte Carlo methods are typically used to construct Bayesian Decision Trees, here we provide a deterministic Bayesian Decision Tree algorithm that eliminates the sampling and does not require a pruning step. This algorithm generates the greedy-modal tree (GMT), which is applicable to both regression and classification problems. We tested the algorithm on various benchmark classification data sets and obtained accuracies similar to those of other known techniques. Furthermore, we show that we can statistically analyze how the GMT was derived from the data, and we demonstrate this analysis with a financial example. Notably, the GMT allows for a technique that provides simpler, explainable models, which is often a prerequisite for applications in finance or the medical industry.
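The abstract does not describe how the GMT is built, so the sketch below is not the authors' algorithm. It is a generic illustration of one way a Bayesian comparison can decide deterministically whether to split a node, and why such a rule needs no separate pruning step: a split is kept only when the data support it more than leaving the node whole. The Beta(1, 1) prior, the equal prior weight on "split" and "no split", and all function names are assumptions made for this example.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of n binary labels with k positives,
    integrating the class probability over a Beta(a, b) prior."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def split_is_preferred(left, right, a=1.0, b=1.0):
    """Compare 'one leaf' against 'two leaves' for a candidate split.

    left and right are (positives, total) pairs for the two children.
    Assumes equal prior weight on both hypotheses (illustrative choice).
    """
    (k_left, n_left), (k_right, n_right) = left, right
    no_split = log_marginal(k_left + k_right, n_left + n_right, a, b)
    split = (log_marginal(k_left, n_left, a, b)
             + log_marginal(k_right, n_right, a, b))
    return split > no_split


# Children with very different class mixes favour the split;
# children with identical mixes do not.
informative = split_is_preferred((9, 10), (1, 10))
uninformative = split_is_preferred((5, 10), (5, 10))
```

Because the marginal likelihood already penalizes the extra parameter introduced by a split, growth stops on its own once no candidate split is preferred.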


2018 ◽  
Vol 422 ◽  
pp. 242-256 ◽  
Author(s):  
Fenglian Li ◽  
Xueying Zhang ◽  
Xiqian Zhang ◽  
Chunlei Du ◽  
Yue Xu ◽  
...  
