Adaptive Hybrid Sampling Algorithm Based on BIRCH Clustering

Author(s):  
Xuanrui Xiong ◽  
Yang Huang ◽  
Yuan Zhang ◽  
Fan Zhang ◽  
Yumei Jia ◽  
...  
Author(s):  
Khyati Ahlawat ◽  
Anuradha Chug ◽  
Amit Prakash Singh

An uneven class distribution in a dataset biases standard classifiers toward the majority class. Instances of the minority class, which is usually the class of interest, are few in number and tend to be ignored, and their misclassification is masked when only overall accuracy is reported. Conventional machine learning approaches have therefore been refined to address this class imbalance problem. The problem is more prevalent in big data scenarios because of the high data volume. This study presents a sampling solution based on cluster computing for handling class imbalance in big data. The proposed hybrid sampling algorithm (HSA) is assessed with three popular classifiers, namely support vector machine, decision tree, and k-nearest neighbor, using balanced accuracy and elapsed time as measures. The experimental results are promising, with an efficiency gain of 42% over the traditional sampling solution, the synthetic minority oversampling technique (SMOTE). This work demonstrates the effectiveness of the distribution and clustering principle in imbalanced big data scenarios.
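
The abstract's point about overall accuracy hiding minority-class errors, and its choice of balanced accuracy as a measure, can be illustrated with a minimal sketch. It uses the standard definition of balanced accuracy as the mean of per-class recall. The toy labels and the always-majority "classifier" are hypothetical and are not taken from the study's data or from HSA itself.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)  # number of true instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Overall accuracy looks high even though no minority instance is classified correctly, while balanced accuracy drops to chance level for two classes. That gap is one reason a class-sensitive measure suits comparisons between sampling methods.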


2010 ◽  
Vol 24 (1) ◽  
pp. 10-16 ◽  
Author(s):  
Hang Qiu ◽  
Leiting Chen ◽  
Jim X Chen
