Minority–Majority Mix mean Oversampling Technique: An Efficient Technique to Improve Classification of Imbalanced Data Sets

Author(s):  
Sachin Patil ◽  
Shefali Sonavane
2013 ◽  
Vol 42 ◽  
pp. 97-110 ◽  
Author(s):  
Alberto Fernández ◽  
Victoria López ◽  
Mikel Galar ◽  
María José del Jesus ◽  
Francisco Herrera

Author(s):  
Ghulam Fatima ◽  
Sana Saeed

In the data mining community, data sets with imbalanced class distributions have received growing attention. The evolving field of data mining and knowledge discovery seeks to develop accurate and efficient computational tools for analyzing such data sets and extracting new knowledge from data. Sampling methods re-balance imbalanced data sets and consequently improve the performance of classifiers. In the classification of imbalanced data sets, over-fitting and under-fitting are two prominent problems. In this study, a novel weighted ensemble method is proposed to reduce the influence of over-fitting and under-fitting when classifying these kinds of data sets. Forty imbalanced data sets with varying imbalance ratios are used to conduct a comparative study. The performance of the proposed method is compared with four standard classifiers: decision tree (DT), k-nearest neighbor (KNN), support vector machine (SVM), and neural network (NN). The evaluation is carried out with two over-sampling procedures, the adaptive synthetic sampling approach (ADASYN) and the synthetic minority over-sampling technique (SMOTE). The proposed scheme proved effective in reducing the impact of over-fitting and under-fitting on the classification of these data sets.
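The abstract above compares classifiers trained after SMOTE-style re-balancing. As a minimal illustration of the core SMOTE idea it refers to (not the authors' weighted ensemble), the sketch below generates synthetic minority examples by interpolating between a minority sample and one of its k nearest minority-class neighbors; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampling sketch (hypothetical helper):
    each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # indices of the k nearest neighbors (column 0 is the point itself)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)            # pick a random minority sample
        j = nn[i, rng.integers(k)]     # pick one of its k neighbors
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

# toy minority cluster: four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=6, k=2, rng=0)
print(X_new.shape)  # (6, 2)
```

Because each synthetic point is a convex combination of two minority samples, all generated points stay inside the minority region, which is what distinguishes SMOTE from naive duplication. ADASYN refines this by generating more synthetic points near minority samples that are harder to learn.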


2012 ◽  
Vol 546-547 ◽  
pp. 622-627
Author(s):  
Wei Mei Zhi ◽  
Hua Ping Guo ◽  
Ming Fan

Most classifiers lose effectiveness when faced with imbalanced class distributions, a situation that nevertheless arises frequently in practice. The problem of learning from imbalanced data sets has therefore attracted growing attention in recent years. This paper provides a comprehensive review of the classification of imbalanced data sets: the nature of the problem, the factors that affect it, the assessment metrics currently used to evaluate learning performance, and the opportunities and challenges in learning from imbalanced data.


Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 557
Author(s):  
Alexandre M. de Carvalho ◽  
Ronaldo C. Prati

One of the significant challenges in machine learning is the classification of imbalanced data. In many situations, standard classifiers cannot learn to distinguish minority class examples from the others. Since many real problems are imbalanced, this problem has become highly relevant and widely studied. This paper presents a new preprocessing method based on Delaunay tessellation and the preprocessing algorithm SMOTE (Synthetic Minority Over-sampling Technique), which we call DTO-SMOTE (Delaunay Tessellation Oversampling SMOTE). DTO-SMOTE constructs a mesh of simplices (in this paper, tetrahedrons) for creating synthetic examples. We compare results with five preprocessing algorithms (GEOMETRIC-SMOTE, SVM-SMOTE, SMOTE-BORDERLINE-1, SMOTE-BORDERLINE-2, and SMOTE), eight classification algorithms, and 61 binary-class data sets. For some classifiers, DTO-SMOTE outperforms the others in terms of Area Under the ROC Curve (AUC), Geometric Mean (GEO), and Generalized Index of Balanced Accuracy (IBA).
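The abstract describes generating synthetic examples inside a Delaunay mesh over the minority class. A hedged sketch of that general idea (not the authors' exact DTO-SMOTE algorithm, whose simplex-selection weights are not given here) triangulates the minority points and draws each synthetic example inside a randomly chosen simplex using Dirichlet-distributed barycentric weights:

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_oversample_sketch(X_min, n_new, rng=None):
    """Hypothetical sketch of Delaunay-tessellation oversampling:
    triangulate the minority class, then sample points inside
    randomly chosen simplices as convex combinations of vertices."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    tri = Delaunay(X_min)                  # mesh of simplices over the minority class
    dim = X_min.shape[1]
    out = []
    for _ in range(n_new):
        simplex = tri.simplices[rng.integers(len(tri.simplices))]
        w = rng.dirichlet(np.ones(dim + 1))  # barycentric coordinates, sum to 1
        out.append(w @ X_min[simplex])       # convex combination of simplex vertices
    return np.vstack(out)

# toy 2D minority class (triangles instead of the paper's tetrahedrons)
X_min = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]])
X_new = delaunay_oversample_sketch(X_min, n_new=8, rng=1)
print(X_new.shape)  # (8, 2)
```

Sampling inside simplices rather than along pairwise segments lets synthetic points fill the interior of the minority region, which is the intuition the abstract's mesh construction appears to exploit.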

