GMR based pain intensity recognition using imbalanced data handling techniques

Author(s):  
Anima Majumder ◽  
Laxmidher Behera ◽  
Venkatesh K. Subramanian
2017 ◽  
Vol 19 (1) ◽  
pp. 42-49
Author(s):  
Divya Agrawal ◽  
Padma Bonde

Prediction using classification techniques is a fundamental capability widely applied in various fields. Classification accuracy remains a major challenge because of the data imbalance problem. The increasing volume of data also poses a challenge for data handling and prediction, particularly when technology is used as the interface between customers and the company. As data imbalance increases, it directly affects the classification accuracy of the entire system. AUC (area under the curve) and lift proved to be good evaluation metrics. Classification techniques help to improve classification accuracy, but with an imbalanced dataset the classifiers do not predict well, and other techniques, such as oversampling, need to be resorted to. The paper presents a voting-based ensemble technique to improve classification accuracy on imbalanced data. The ensemble takes the votes on the best class obtained by three classification techniques, namely logistic regression, classification trees and discriminant analysis. The observed results showed an improvement in classification accuracy with the voting ensemble technique.
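The voting step described in this abstract can be illustrated with a minimal sketch. It assumes hard (majority) voting over class labels; the abstract does not say how ties are broken or whether votes are weighted, so the tie rule, the function name `majority_vote`, and the prediction lists below are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by hard voting.

    predictions_per_model: list of equal-length lists, one per classifier.
    Assumption: ties go to the earliest-listed model's prediction.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Keep the first vote (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Hypothetical predictions from the three classifiers on five samples.
logistic_regression = [0, 0, 1, 0, 1]
classification_tree = [0, 1, 1, 0, 0]
discriminant_analysis = [1, 1, 1, 0, 0]

ensemble = majority_vote(
    [logistic_regression, classification_tree, discriminant_analysis]
)
print(ensemble)
```

Each ensemble label is the class chosen by at least two of the three models, so a single model's error on a sample is outvoted when the other two agree.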


Classification is a supervised learning task that assigns items to groups on the basis of class labels. Algorithms are trained on labeled datasets to accomplish this task, so the dataset plays an important role in classification. If the instances of one class (the majority class) in a dataset far outnumber the instances of another class (the minority class), to the point that a classifier struggles to learn the characteristics of the minority class, the dataset is termed an imbalanced dataset. Such datasets lead to biased prediction or misclassification in the real world: models trained on them may report very high accuracy during training, but because they are unfamiliar with minority class instances, they cannot predict the minority class and therefore perform poorly. A survey of the techniques proposed by researchers for handling imbalanced data is presented, and a comparison of the techniques based on F-measure is discussed.
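A short sketch can show why accuracy misleads on an imbalanced dataset and why the survey compares techniques by F-measure. It assumes the usual definition of F-measure as the harmonic mean of precision and recall for the minority (positive) class; the 95/5 class split and the always-majority predictor are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
always_majority = [0] * 100

acc = accuracy(y_true, always_majority)
f1 = f_measure(y_true, always_majority)
print(acc, f1)
```

The degenerate model scores high on accuracy while its F-measure for the minority class is zero, which is the failure mode the abstract describes.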


2017 ◽  
Vol 88 ◽  
pp. 45-57 ◽  
Author(s):  
Sasiporn Tongman ◽  
Suchart Chanama ◽  
Manee Chanama ◽  
Kitiporn Plaimas ◽  
Chidchanok Lursinsap

2019 ◽  
Vol 2019 ◽  
pp. 1-8 ◽  
Author(s):  
Xue-Zhen Hong ◽  
Xian-Shu Fu ◽  
Zheng-Liang Wang ◽  
Li Zhang ◽  
Xiao-Ping Yu ◽  
...  

This work presents a reliable approach to trace teas’ geographical origins despite changes in teas caused by different harvest years. A total of 1447 tea samples collected from various areas in 2014 (660 samples) and 2015 (787 samples) were measured by FT-NIR. Seven classifiers trained on the 2014 dataset all succeeded in tracing the origins of samples collected in 2014; however, they all failed to predict origins for the 2015 samples because of different data distributions and an imbalanced dataset. Three outlier-detection-based undersampling approaches (one-class SVM (OC-SVM), isolation forest and elliptic envelope) were then proposed; as a result, the highest macro average recall (MAR) for the 2015 dataset improved from 56.86% to 73.95% (by SVM). A model updating approach was also applied, and the prediction MAR improved significantly as the updating rate increased. The best MAR (90.31%) was first achieved by the SVM classifier combined with OC-SVM at a 50% updating rate.
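The abstract reports results as macro average recall (MAR) without giving a formula. The sketch below assumes the standard definition, the unweighted mean of per-class recall; the origin labels "A" and "B" and the predictions are hypothetical and are not data from the study.

```python
def macro_average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical origins: 8 samples from "A" (majority), 2 from "B" (minority).
y_true = ["A"] * 8 + ["B"] * 2
# All "A" samples are correct; one of the two "B" samples is mislabeled "A".
y_pred = ["A"] * 8 + ["A", "B"]

mar = macro_average_recall(y_true, y_pred)
print(mar)
```

Because every class contributes equally regardless of its size, a missed minority-class sample lowers MAR far more than it lowers plain accuracy, which makes the metric a natural choice for an imbalanced origin dataset.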


2020 ◽  
Vol 146 (5) ◽  
pp. 411-450 ◽  
Author(s):  
Tobias Markfelder ◽  
Paul Pauli
