Imbalanced Data Handling in Multi-label Aspect Categorization using Oversampling and Ensemble Learning

Prediction using classification techniques is a fundamental capability widely applied in various fields. Classification accuracy remains a major challenge because of the data imbalance problem. The increasing volume of data also poses a challenge for data handling and prediction, particularly when technology is the interface between customers and the company. As data imbalance increases, it directly degrades the classification accuracy of the entire system. AUC (area under the curve) and lift proved to be good evaluation metrics. Classification techniques help improve classification accuracy, but on an imbalanced dataset classifiers alone do not predict well, and other techniques, such as oversampling, must be used. This paper presents a voting-based ensembling technique to improve classification accuracy on imbalanced data. The ensemble takes a vote on the class predicted by three classification techniques: Logistic Regression, Classification Trees, and Discriminant Analysis. The observed results show that the voting ensemble improves classification accuracy.
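The voting step described in this abstract can be illustrated with a minimal sketch of hard (majority) voting. This is not the authors' implementation: the function name `majority_vote` and the three prediction lists are hypothetical, and the lists stand in for the outputs of already-trained logistic regression, classification tree, and discriminant analysis models.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the per-sample majority label across several classifiers.

    predictions: a list of equal-length label sequences, one per classifier.
    With three voters and two classes a tie cannot occur.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    voted = []
    # zip(*predictions) groups the votes cast for each sample.
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions on five samples (1 = minority class).
logistic_regression = [0, 1, 1, 0, 1]
classification_tree = [0, 0, 1, 0, 1]
discriminant_analysis = [1, 1, 1, 0, 0]

ensemble = majority_vote(
    [logistic_regression, classification_tree, discriminant_analysis]
)
print(ensemble)
```

Each sample receives the label that at least two of the three models agree on, so a single model's mistake on a sample is outvoted when the other two are correct.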

Handling Imbalanced Data using Ensemble Learning in Software Defect Prediction

2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence) ◽ 10.1109/confluence47617.2020.9058124 ◽ 2020 ◽ Cited By ~ 1

Author(s): Ruchika Malhotra ◽ Juhi Jain

Keyword(s): Ensemble Learning ◽ Imbalanced Data ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect

A G-Means Update Ensemble Learning Approach for the Imbalanced Data Stream with Concept Drifts

Big Data Analytics and Knowledge Discovery - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-43946-4_17 ◽ 2016 ◽ pp. 255-266

Author(s): Sin-Kai Wang ◽ Bi-Ru Dai

Keyword(s): Ensemble Learning ◽ Data Stream ◽ Imbalanced Data ◽ Learning Approach ◽ Concept Drifts

Clustering-based subset ensemble learning method for imbalanced data

2013 International Conference on Machine Learning and Cybernetics ◽ 10.1109/icmlc.2013.6890440 ◽ 2013 ◽ Cited By ~ 1

Author(s): Xiao-Sheng Hu ◽ Run-Jing Zhang

Keyword(s): Ensemble Learning ◽ Imbalanced Data ◽ Learning Method

Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

Advances in Intelligent Systems and Computing - Genetic and Evolutionary Computing ◽ 10.1007/978-981-13-5841-8_64 ◽ 2019 ◽ pp. 615-624

Author(s): Yan Han ◽ Mingxiang He ◽ Qixian Lu

Keyword(s): Ensemble Learning ◽ Imbalanced Data ◽ Data Classification ◽ Classification Algorithm ◽ Imbalanced Data Classification

GMR based pain intensity recognition using imbalanced data handling techniques

2016 International Conference on Signal and Information Processing (IConSIP) ◽ 10.1109/iconsip.2016.7857447 ◽ 2016 ◽ Cited By ~ 1

Author(s): Anima Majumder ◽ Laxmidher Behera ◽ Venkatesh K. Subramanian

Keyword(s): Pain Intensity ◽ Imbalanced Data ◽ Data Handling

A Survey on Imbalanced Data Handling Techniques for Classification

International Journal of Emerging Trends in Engineering Research ◽ 10.30534/ijeter/2021/089102021 ◽ 2021 ◽ Vol 9 (10) ◽ pp. 1341-1347

Keyword(s): Real World ◽ Imbalanced Data ◽ Learning Task ◽ High Accuracy ◽ Data Handling ◽ Imbalanced Dataset ◽ Minority Class ◽ Class Labels ◽ Very High ◽ F Measure

Classification is a supervised learning task that categorizes items into groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so datasets play an important role in classification. If a dataset contains many more instances of one class (the majority class) than of another (the minority class), so that a classifier struggles to learn the characteristics of the minority class, it is termed an imbalanced dataset. Such datasets lead to biased predictions or misclassification in the real world: a model trained on them may report very high accuracy during training but, being unfamiliar with minority-class instances, fails to predict the minority class. This paper surveys techniques proposed by researchers for handling imbalanced data and compares them on the basis of F-measure.
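The gap between accuracy and F-measure on an imbalanced dataset can be shown with a small sketch. The 95/5 class split, the two hypothetical prediction vectors, and the helper names `accuracy` and `f_measure` are assumptions chosen for illustration; they are not taken from the survey.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Assumed toy dataset: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5

# Model A ignores the minority class entirely.
always_majority = [0] * 100

# Model B raises 6 false alarms but finds 4 of the 5 minority samples.
minority_aware = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

acc_a = accuracy(y_true, always_majority)
f1_a = f_measure(y_true, always_majority)
acc_b = accuracy(y_true, minority_aware)
f1_b = f_measure(y_true, minority_aware)

print(f"Model A: accuracy={acc_a:.2f}, F-measure={f1_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}, F-measure={f1_b:.2f}")
```

Under these assumptions Model A scores higher on accuracy even though it never detects a minority instance, while the F-measure ranks Model B higher, which is the reason minority-class metrics are preferred for comparing imbalanced-data techniques.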

Efficacy of Imbalanced Data Handling Methods on Deep Learning for Smart Homes Environments

SN Computer Science ◽ 10.1007/s42979-020-00211-1 ◽ 2020 ◽ Vol 1 (4) ◽ Cited By ~ 2

Author(s): Rebeen Ali Hamad ◽ Masashi Kimura ◽ Jens Lundström

Keyword(s): Deep Learning ◽ Imbalanced Data ◽ Smart Homes ◽ Data Handling ◽ Handling Methods
