Imbalanced data classification algorithm based on boosting and cascade model

Author(s): Xiaolong Zhang, Chao Cheng
2018, Vol 12 (3), pp. 341-347
Author(s): Feng Wang, Shaojiang Liu, Weichuan Ni, Zhiming Xu, Zemin Qiu, ...

Author(s): Bo Huang, Yimin Zhu, Zhongzhen Wang, Zhijun Fang

Class-imbalance learning is one of the most significant research topics in data mining and machine learning. The imbalance problem arises when one class has far more samples than the others. To address low classification accuracy and high time complexity, this paper proposes a novel imbalanced data classification algorithm based on clustering and SVM. The algorithm under-samples the majority class according to the distribution characteristics of the minority samples. First, clusters are detected by cluster analysis of the minority class. Second, a cluster boundary strategy is proposed to eliminate the adverse influence of noise samples. Third, to construct a balanced dataset from the imbalanced data, three principles for under-sampling the majority samples are proposed according to the characteristics of the samples in each cluster. Finally, the optimal classification model is obtained from the linear combination of the hybrid-kernel SVM. Experiments on datasets from the UCI and KEEL repositories show that the algorithm effectively reduces the interference of noise samples. Compared with SMOTE and Fast-CBUS, the proposed algorithm not only reduces the feature dimension but also generally improves the precision on the minority classes under different labeled-sample rates.
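
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the three under-sampling principles or the cluster boundary strategy, so the selection rule below is an assumption made for illustration: for each minority cluster, keep as many of the nearest majority samples as the cluster has members. Treating the hybrid kernel as a weighted sum of an RBF kernel and a polynomial kernel is likewise an assumption, and the names kmeans, undersample_majority, and hybrid_kernel are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        groups[idx].append(p)
    return groups


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the final centers and their point groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, assign(points, centers)


def undersample_majority(majority, minority, k=2, seed=0):
    """ASSUMED rule (not from the paper): for each minority cluster, keep
    as many of the nearest unused majority samples as the cluster has
    members, so the kept majority set matches the minority set in size."""
    centers, groups = kmeans(minority, k, seed=seed)
    remaining = list(majority)
    kept = []
    for center, group in zip(centers, groups):
        remaining.sort(key=lambda p: dist(p, center))
        take = min(len(group), len(remaining))
        kept.extend(remaining[:take])
        remaining = remaining[take:]
    return kept


def hybrid_kernel(x, y, weight=0.5, gamma=0.5, degree=2):
    """ASSUMED hybrid kernel: weight * RBF + (1 - weight) * polynomial."""
    rbf = math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    poly = (sum(a * b for a, b in zip(x, y)) + 1.0) ** degree
    return weight * rbf + (1.0 - weight) * poly


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
    majority = [(float(i), float(j)) for i in range(5) for j in range(4)]
    balanced_majority = undersample_majority(majority, minority, k=2)
    print(len(balanced_majority), "majority samples kept")
    print(hybrid_kernel(minority[0], majority[0]))
```

With a weight between 0 and 1 the combined kernel is still a valid kernel, because a non-negative weighted sum of positive semi-definite kernels is itself positive semi-definite. It could therefore be passed to any SVM solver that accepts a custom kernel function.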

