GMR based pain intensity recognition using imbalanced data handling techniques

Prediction using classification techniques is a fundamental capability widely applied in various fields. Classification accuracy remains a major challenge because of the data imbalance problem. The growing volume of data also poses a challenge for data handling and prediction, particularly when technology serves as the interface between customers and the company. As data imbalance increases, it directly degrades the classification accuracy of the entire system. AUC (area under the curve) and lift proved to be good evaluation metrics. Classification techniques help to improve classification accuracy, but with an imbalanced dataset they do not predict well, and other techniques, such as oversampling, must be resorted to. The paper presents a voting-based ensembling technique to improve classification accuracy on imbalanced data. The ensemble takes the votes on the best class obtained by three classification techniques, namely Logistic Regression, Classification Trees and Discriminant Analysis. The observed results showed an improvement in classification accuracy with the voting ensemble.
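The voting step described in this abstract can be sketched as plain majority (hard) voting over the class labels predicted by three classifiers. This is only an illustration under stated assumptions: the function name `majority_vote` and the example predictions are hypothetical, and the abstract does not say whether the paper uses hard voting, weighted voting, or some other scheme.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of predicted labels per classifier;
    all lists must cover the same samples in the same order.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must predict every sample")

    voted = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest count; on a tie
        # it returns the label seen first, i.e. the earliest classifier's.
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three classifiers on five samples
# (stand-ins for logistic regression, a classification tree and
# discriminant analysis; no real model is trained here).
logistic_regression = [0, 0, 1, 0, 1]
classification_tree = [0, 1, 1, 0, 0]
discriminant_analysis = [1, 0, 1, 0, 1]

ensemble = majority_vote(
    [logistic_regression, classification_tree, discriminant_analysis]
)
print(ensemble)  # [0, 0, 1, 0, 1]
```

With three voters and two classes a tie cannot occur, so the tie-breaking rule only matters for multi-class problems or an even number of classifiers.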

A Survey on Imbalanced Data Handling Techniques for Classification

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2021/089102021 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1341-1347

Keyword(s):

Real World ◽

Imbalanced Data ◽

Learning Task ◽

High Accuracy ◽

Data Handling ◽

Imbalanced Dataset ◽

Minority Class ◽

Class Labels ◽

Very High ◽

F Measure

Classification is a supervised learning task that categorizes things into groups on the basis of class labels. Algorithms are trained with labeled datasets to accomplish the task of classification. In the process of classification, datasets play an important role. If, in a dataset, instances of one label/class (the majority class) far outnumber instances of another label/class (the minority class), such that it becomes hard for a classifier to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. These types of datasets raise the problem of biased prediction or misclassification in the real world: models based on such datasets may give very high accuracy during training but, being unfamiliar with minority class instances, are unable to predict the minority class and thus perform poorly. A survey of various techniques proposed by researchers for handling imbalanced data is presented, and a comparison of the techniques based on F-measure is discussed.
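The gap between accuracy and F-measure on an imbalanced dataset can be illustrated with a small worked example. This is a sketch with made-up labels, not data from the survey: a model that always predicts the majority class reaches 95% accuracy while its F-measure for the minority class is zero. The function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted
    # or never present; the metric is reported as 0.0 in that case.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))             # 0.95
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```

Treating the minority class as the positive class is a choice made here for illustration; it is the usual convention when the minority class is the one of interest.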

Efficacy of Imbalanced Data Handling Methods on Deep Learning for Smart Homes Environments

SN Computer Science ◽

10.1007/s42979-020-00211-1 ◽

2020 ◽

Vol 1 (4) ◽

Cited By ~ 2

Author(s):

Rebeen Ali Hamad ◽

Masashi Kimura ◽

Jens Lundström

Keyword(s):

Deep Learning ◽

Imbalanced Data ◽

Smart Homes ◽

Data Handling ◽

Handling Methods

Metabolic pathway synthesis based on predicting compound transformable pairs by using neural classifiers with imbalanced data handling

Expert Systems with Applications ◽

10.1016/j.eswa.2017.06.026 ◽

2017 ◽

Vol 88 ◽

pp. 45-57 ◽

Cited By ~ 1

Author(s):

Sasiporn Tongman ◽

Suchart Chanama ◽

Manee Chanama ◽

Kitiporn Plaimas ◽

Chidchanok Lursinsap

Keyword(s):

Metabolic Pathway ◽

Imbalanced Data ◽

Data Handling

A Review on Imbalanced Data Handling Using Undersampling and Oversampling Technique

International Journal of Recent Trends in Engineering and Research ◽

10.23883/ijrter.2017.3168.0uwxm ◽

2017 ◽

Vol 3 (4) ◽

pp. 444-449 ◽

Cited By ~ 4

Keyword(s):

Imbalanced Data ◽

Data Handling

Feature Selection and Imbalanced Data Handling for Depression Detection

Brain Informatics - Lecture Notes in Computer Science ◽

10.1007/978-3-030-05587-5_33 ◽

2018 ◽

pp. 349-358

Author(s):

Marzieh Mousavian ◽

Jianhua Chen ◽

Steven Greening

Keyword(s):

Feature Selection ◽

Imbalanced Data ◽

Data Handling ◽

Depression Detection

Tracing Geographical Origins of Teas Based on FT-NIR Spectroscopy: Introduction of Model Updating and Imbalanced Data Handling Approaches

Journal of Analytical Methods in Chemistry ◽

10.1155/2019/1537568 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Xue-Zhen Hong ◽

Xian-Shu Fu ◽

Zheng-Liang Wang ◽

Li Zhang ◽

Xiao-Ping Yu ◽

...

Keyword(s):

Outlier Detection ◽

Model Updating ◽

Imbalanced Data ◽

Nir Spectroscopy ◽

Svm Classifier ◽

Data Handling ◽

Imbalanced Dataset ◽

Average Recall ◽

Isolation Forest

This work presents a reliable approach to trace teas’ geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were measured by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict origins for the 2015 samples because of different data distributions and an imbalanced dataset. Three outlier detection based undersampling approaches were then proposed: one-class SVM (OC-SVM), isolation forest and elliptic envelope. As a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (by SVM). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first achieved by OC-SVM combined with the SVM classifier at a 50% updating rate.
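Macro average recall (MAR), the metric reported in this abstract, is the unweighted mean of the per-class recalls, so every class counts equally regardless of its size. A minimal sketch follows; the function name and the two-region example labels are hypothetical and are not taken from the tea dataset.

```python
def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Samples that truly belong to this class.
        total = sum(1 for t in y_true if t == cls)
        # Of those, the ones the model labelled correctly.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: 8 samples from region "A" and 2 from region "B".
y_true = ["A"] * 8 + ["B"] * 2
# The model gets every "A" right but only one of the two "B" samples.
y_pred = ["A"] * 8 + ["A", "B"]

print(macro_average_recall(y_true, y_pred))  # 0.75
```

Plain accuracy on the same predictions is 0.9, because the large class dominates; MAR drops to 0.75 because the recall of 0.5 on the small class carries the same weight as the recall of 1.0 on the large one.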
