Random Forest Algorithm Based on GAN for Imbalanced Data Classification

2020 ◽ Vol 1544 ◽ pp. 012014
Author(s): Qihui Shu ◽ Tianyu Hu ◽ Song Liu

2013 ◽ Vol 7 (1) ◽ pp. 62-70
Author(s): Dengju Yao ◽ Jing Yang ◽ Xiaojuan Zhan

Classification is one of the central research problems in machine learning. However, most machine learning algorithms train a classifier on the assumption that each class has roughly the same number of training examples. When a classifier is trained on imbalanced data, its performance declines markedly. To address the class-imbalance problem, we propose an improved random forest algorithm based on sampling with replacement. We randomly draw multiple example subsets with replacement from the majority class, each containing the same number of examples as the minority class. We then construct multiple new training datasets by combining each extracted majority-class subset with the minority-class dataset, and train one random forest classifier on each of these datasets. For a new example, the predicted class is determined by majority vote among the random forest classifiers. Experimental results on five UCI datasets and a real clinical dataset show that the proposed method handles class-imbalanced data and that the improved random forest algorithm outperforms the original random forest and other methods in the literature.
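
The sampling and voting scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`balanced_subsets`, `train_ensemble`, `predict_majority_vote`) are hypothetical, the data is a made-up toy set, and the base learner is a trivial nearest-class-mean stand-in so that the sketch runs with the standard library alone. In the method described, each base learner would be a random forest.

```python
import random
from collections import Counter, defaultdict


def balanced_subsets(majority, minority, n_subsets, rng):
    """Build n_subsets training sets. Each one pairs the full minority set
    with an equally sized sample drawn WITH replacement from the majority set."""
    k = len(minority)
    return [rng.choices(majority, k=k) + list(minority) for _ in range(n_subsets)]


def train_ensemble(majority, minority, fit, n_subsets=5, seed=0):
    """Train one model per balanced subset. `fit` maps a list of (x, label)
    pairs to a callable model; a random forest trainer would go here."""
    rng = random.Random(seed)
    subsets = balanced_subsets(majority, minority, n_subsets, rng)
    return [fit(subset) for subset in subsets]


def predict_majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def fit_nearest_mean(examples):
    """Stand-in base learner (an assumption, NOT a random forest):
    predict the class whose mean feature value is closest to x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label in examples:
        sums[label] += x
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


# Toy imbalanced data: 100 majority examples (label 0), 5 minority (label 1).
majority = [(i / 100.0, 0) for i in range(100)]
minority = [(2.0 + i / 10.0, 1) for i in range(5)]

models = train_ensemble(majority, minority, fit_nearest_mean, n_subsets=5, seed=0)
print(predict_majority_vote(models, 2.2))
print(predict_majority_vote(models, 0.1))
```

Using an odd number of subsets avoids tied votes in the two-class case. Because every subset contains the whole minority class, no minority example is discarded, while the repeated draws expose the ensemble to different parts of the majority class.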


Author(s): Qi Han ◽ Rui Yang ◽ Zitong Wan ◽ Shaozhi Chen ◽ Mengjie Huang ◽ ...
