scholarly journals Trainable Undersampling for Class-Imbalance Learning

Author(s):  
Minlong Peng ◽  
Qi Zhang ◽  
Xiaoyu Xing ◽  
Tao Gui ◽  
Xuanjing Huang ◽  
...  

Undersampling has been widely used in the class-imbalance learning area. The main deficiency of most existing undersampling methods is that their data sampling strategies are heuristic-based and independent of the used classifier and evaluation metric. Thus, they may discard informative instances for the classifier during the data sampling. In this work, we propose a meta-learning method built on the undersampling to address this issue. The key idea of this method is to parametrize the data sampler and train it to optimize the classification performance over the evaluation metric. We solve the non-differentiable optimization problem for training the data sampler via reinforcement learning. By incorporating evaluation metric optimization into the data sampling process, the proposed method can learn which instance should be discarded for the given classifier and evaluation metric. In addition, as a data level operation, this method can be easily applied to arbitrary evaluation metric and classifier, including non-parametric ones (e.g., C4.5 and KNN). Experimental results on both synthetic and realistic datasets demonstrate the effectiveness of the proposed method.

Author(s):  
Sajad Emamipour ◽  
Rasoul Sali ◽  
Zahra Yousefi

This article describes how class imbalance learning has attracted great attention in recent years as many real world domain applications suffer from this problem. Imbalanced class distribution occurs when the number of training examples for one class far surpasses the training examples of the other class often the one that is of more interest. This problem may produce an important deterioration of the classifier performance, in particular with patterns belonging to the less represented classes. Toward this end, the authors developed a hybrid model to address the class imbalance learning with focus on binary class problems. This model combines benefits of the ensemble classifiers with a multi objective feature selection technique to achieve higher classification performance. The authors' model also proposes non-dominated sets of features. Then they evaluate the performance of the proposed model by comparing its results with notable algorithms for solving imbalanced data problem. Finally, the authors utilize the proposed model in medical domain of predicting life expectancy in post-operative of thoracic surgery patients.


Sign in / Sign up

Export Citation Format

Share Document