Variance Ranking for Multi-Classed Imbalanced Datasets: A Case Study of One-Versus-All

Symmetry ◽  
2019 ◽  
Vol 11 (12) ◽  
pp. 1504 ◽  
Author(s):  
Solomon H. Ebenuwa ◽  
Mhd Saeed Sharif ◽  
Ameer Al-Nemrat ◽  
Ali H. Al-Bayatti ◽  
Nasser Alalwan ◽  
...  

Imbalanced classes in multi-classed datasets are one of the most salient hindrances to accurate and dependable predictive modeling. Predictions always involve majority and minority classes, and in most cases it is difficult to capture items belonging to the minority classes. This anomaly is traceable to the design of the predictive algorithms, because most algorithms do not factor the unequal numbers of classes into their designs and implementations. The accuracy of most modeling processes is therefore subject to the ever-present consequences of imbalanced classes. This paper employs the variance ranking technique to deal with the real-world class imbalance problem. We augment this technique with one-versus-all re-coding of the multi-classed datasets. Proof-of-concept experimentation shows that our technique performs better than previous work on capturing small-class members in multi-classed datasets.
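A minimal sketch of the one-versus-all re-coding step described above, paired with a simple per-feature variance comparison; the toy data, column names, and the variance-gap heuristic are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch: one-versus-all re-coding of a multi-class target, followed by a
# per-feature variance ranking for each resulting binary sub-problem.
# The dataset and the ranking heuristic are illustrative assumptions only.
import numpy as np
import pandas as pd

def one_versus_all_recode(y: pd.Series) -> dict:
    """Return one binary target per class: class members = 1, the rest = 0."""
    return {str(c): (y == c).astype(int) for c in y.unique()}

def variance_ranking(X: pd.DataFrame, y_bin: pd.Series) -> pd.Series:
    """Rank features by the gap between their variance in the positive
    (current class) group and the negative (all other classes) group."""
    var_pos = X[y_bin == 1].var()
    var_neg = X[y_bin == 0].var()
    return (var_pos - var_neg).abs().sort_values(ascending=False)

# Toy multi-class dataset with an under-represented class "C".
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=list("abcd"))
y = pd.Series(["A"] * 150 + ["B"] * 130 + ["C"] * 20)

for label, y_bin in one_versus_all_recode(y).items():
    print(label, variance_ranking(X, y_bin).index.tolist())
```

Each binary sub-problem then isolates one class (including the small class "C") against the rest, which is what allows a per-class treatment of the imbalance.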

2019 ◽  
Vol 24 (2) ◽  
pp. 104-110
Author(s):  
Duygu Sinanc Terzi ◽  
Seref Sagiroglu

The class imbalance problem, one of the common data irregularities, leads to under-represented models. To resolve this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design aims to modify the existing dataset so as to increase classification success. Within the study, DIBID has been implemented on public datasets under two strategies. The first strategy presents the success of the model on datasets with different imbalance ratios. The second strategy compares the success of the model with other imbalanced big data solutions in the literature. According to the results, DIBID outperformed the other imbalanced big data solutions and increased area-under-the-curve values by between 10% and 24% across the case study.
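DIBID itself is a distributed MapReduce design; as a rough single-machine illustration of the underlying idea only, the sketch below clusters the majority class with k-means and draws an equal share from each cluster. The cluster count and the even-allocation rule are assumptions, not the paper's specification.

```python
# Sketch of cluster-based resampling on a single machine: cluster the
# majority class and sample evenly from each cluster, so the retained
# majority examples preserve its internal structure. This is NOT the
# distributed DIBID design, only an illustration of the core idea.
import numpy as np
from sklearn.cluster import KMeans

def cluster_based_undersample(X_maj, n_keep, n_clusters=10, seed=42):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_maj)
    per_cluster = max(1, n_keep // n_clusters)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        take = min(per_cluster, idx.size)
        kept.append(rng.choice(idx, size=take, replace=False))
    return X_maj[np.concatenate(kept)]

# Example: shrink a 10,000-row majority class down to roughly 500 rows.
X_majority = np.random.default_rng(0).normal(size=(10_000, 5))
X_reduced = cluster_based_undersample(X_majority, n_keep=500)
print(X_reduced.shape)  # approximately (500, 5)
```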


Author(s):  
Khyati Ahlawat ◽  
Anuradha Chug ◽  
Amit Prakash Singh

Expansion of data in the dimensions of volume, variety, or velocity leads to big data. Learning from such big data is challenging and beyond the capacity of conventional machine learning methods and techniques. Big data generated from real-time scenarios is generally imbalanced in nature, with an uneven distribution of classes. This adds complexity to learning from big data, since the under-represented class is the more influential one and its correct classification is more critical than that of the over-represented class. This chapter addresses the imbalance problem and its solutions in the context of big data, along with a detailed survey of work done in this area. It then presents an experimental view of solving the imbalanced classification problem and a comparative analysis of the different methodologies.


Author(s):  
Dilshad Jahin ◽  
Israt Jahan Emu ◽  
Subrina Akter ◽  
Muhammed J.A. Patwary ◽  
Mohammad Arif Sobhan Bhuiyan ◽  
...  

2019 ◽  
Vol 1 (3) ◽  
pp. 962-973 ◽  
Author(s):  
Mario Manzo

In real-world applications, binary classification is often affected by imbalanced classes. In this paper, a new methodology for solving the class imbalance problem that occurs in image classification is proposed. A digital image is described through a novel vector-based representation called Kernel Graph Embedding on Attributed Relational Scale-Invariant Feature Transform-based Regions Graph (KGEARSRG). A classification stage based on support vector machines (SVMs) follows. The methodology is evaluated through a series of experiments on art painting dataset images affected by varying imbalance percentages. Experimental results show that the proposed approach consistently outperforms the competitors.
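The KGEARSRG representation itself is beyond the scope of a snippet, but the SVM classification stage it feeds can be sketched as below. The synthetic feature matrix stands in for the image descriptors, and the RBF kernel plus class_weight="balanced" re-weighting are assumptions on our part, not details confirmed by the abstract.

```python
# Sketch of an SVM classification stage for imbalanced binary image data.
# The features stand in for KGEARSRG vectors; kernel choice and the
# class_weight="balanced" re-weighting are assumptions, not paper details.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 64)),   # majority class
               rng.normal(1.5, 1.0, size=(50, 64))])   # minority class
y = np.array([0] * 950 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          test_size=0.3, random_state=1)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))
```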


2021 ◽  
Vol 7 (1) ◽  
pp. 63
Author(s):  
Prasetyo Wibowo ◽  
Chastine Fatichah

Class imbalance occurs when the distribution of instances between the majority and minority classes is unequal, and the degree of imbalance may vary from mild to severe. High class imbalance can affect the overall classification accuracy, since the model is most likely to predict data that fall within the majority class. Such a model gives biased results, and predictions for the minority class often have little impact on the model's overall performance. Oversampling is one way to deal with high class imbalance, yet only a few techniques are commonly applied to it. This study provides an in-depth performance analysis of oversampling techniques for the high class imbalance problem. Oversampling balances the data in each class so that the modeling can be evaluated without bias. We compared the Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE techniques, each combined with the Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN) machine learning methods. The test results show that Random Forest with Borderline-SMOTE performs best across the oversampling techniques, with an accuracy of 0.9997, precision of 0.9474, recall of 0.8571, F1-score of 0.9000, ROC-AUC of 0.9388, and PR-AUC of 0.8581.
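A minimal sketch of such a comparison pipeline, using the imbalanced-learn implementations of the four oversamplers with a Random Forest classifier; the dataset is synthetic and all hyperparameters are defaults, so the scores will not match the paper's.

```python
# Sketch: compare ROS, SMOTE, Borderline-SMOTE, and ADASYN, each feeding a
# Random Forest, on a synthetic highly imbalanced dataset. Oversampling is
# applied to the training split only, so evaluation stays unbiased.
from imblearn.over_sampling import RandomOverSampler, SMOTE, BorderlineSMOTE, ADASYN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.995, 0.005], flip_y=0.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          test_size=0.3, random_state=0)

samplers = {"ROS": RandomOverSampler(random_state=0),
            "SMOTE": SMOTE(random_state=0),
            "Borderline-SMOTE": BorderlineSMOTE(random_state=0),
            "ADASYN": ADASYN(random_state=0)}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)       # balance training data only
    model = RandomForestClassifier(random_state=0).fit(X_res, y_res)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name:17s} F1={f1_score(y_te, model.predict(X_te)):.3f} "
          f"ROC-AUC={roc_auc_score(y_te, proba):.3f} "
          f"PR-AUC={average_precision_score(y_te, proba):.3f}")
```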


IEEE Access ◽  
2016 ◽  
Vol 4 ◽  
pp. 7940-7957 ◽  
Author(s):  
Adnan Amin ◽  
Sajid Anwar ◽  
Awais Adnan ◽  
Muhammad Nawaz ◽  
Newton Howard ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1906
Author(s):  
Jia-Zheng Jian ◽  
Tzong-Rong Ger ◽  
Han-Hua Lai ◽  
Chi-Ming Ku ◽  
Chiung-An Chen ◽  
...  

Diverse computer-aided diagnosis systems based on convolutional neural networks have been applied to automate the detection of myocardial infarction (MI) in electrocardiograms (ECGs) for early diagnosis and prevention. However, issues such as overfitting and underfitting were not taken into account; in other words, it is often unclear whether a network structure is too simple or too complex. Toward this end, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net), in which each lead branch contains only two convolutional layers. Multi-scale features-concatenate networks (MSN-Net) were also implemented, in which larger-scale features are extracted by pooling the signals. The best structure was obtained by tuning both the number of filters in the convolutional layers and the number of input signal scales. As a result, the N-Net reached 95.76% accuracy in the MI detection task, whereas the MSN-Net reached 61.82% accuracy in the MI locating task. Both networks give a higher average accuracy than the state of the art, with a significant difference (p < 0.001) under the U test. The models are also smaller in size and thus suitable for wearable devices performing offline monitoring. In conclusion, testing across both simple and complex network structures is indispensable; however, how the class imbalance problem is handled and the quality of the extracted features remain to be discussed.
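A rough sketch of the multi-lead features-concatenate idea: each ECG lead passes through its own narrow branch of two convolutional layers, and the branch outputs are concatenated before a small classifier head. The filter counts, kernel sizes, and the 12-lead, 500-sample input shape below are assumptions for illustration, not the published N-Net configuration.

```python
# Sketch of a multi-lead features-concatenate narrow network: one small
# two-convolution branch per ECG lead, branch outputs concatenated and fed
# to a linear classifier head. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LeadBranch(nn.Module):
    def __init__(self, filters=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, filters, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(filters, filters, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse time axis to one value per filter
        )

    def forward(self, x):              # x: (batch, 1, samples)
        return self.net(x).flatten(1)  # -> (batch, filters)

class NNetSketch(nn.Module):
    def __init__(self, n_leads=12, filters=8, n_classes=2):
        super().__init__()
        self.branches = nn.ModuleList(LeadBranch(filters) for _ in range(n_leads))
        self.head = nn.Linear(n_leads * filters, n_classes)

    def forward(self, x):              # x: (batch, n_leads, samples)
        feats = [branch(x[:, i:i + 1, :]) for i, branch in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1))

logits = NNetSketch()(torch.randn(4, 12, 500))
print(logits.shape)  # torch.Size([4, 2])
```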

