Similarity Majority Under-Sampling Technique for Easing Imbalanced Classification Problem

In this work, we propose a combined sampling technique to improve imbalanced classification of university student depression data. In our experiments, combining random oversampling with Tomek links undersampling produced a relatively balanced depression dataset without losing significant information. Random oversampling was first applied to the minority class to balance the number of samples between the classes. The Tomek links technique was then used to undersample the data by removing depression samples considered less relevant or noisy. The relatively balanced dataset was classified with a random forest. The overall accuracy in predicting adolescent depression was 94.17%, outperforming each sampling technique applied individually. Moreover, the proposed method was tested on another dataset to assess its external validity, where it reached a predictive accuracy of 93.33%.
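
The two resampling steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary labels, numeric feature tuples, and Euclidean distance, and the function names (random_oversample, remove_tomek_links) and the toy data are hypothetical. The random forest classification step is omitted.

```python
import math
import random


def random_oversample(X, y, minority, rng):
    """Duplicate randomly chosen minority samples until both classes are equal in size."""
    X_out, y_out = list(X), list(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    deficit = (len(y) - len(minority_idx)) - len(minority_idx)
    for _ in range(max(deficit, 0)):
        i = rng.choice(minority_idx)
        X_out.append(X[i])
        y_out.append(y[i])
    return X_out, y_out


def nearest_neighbor(X, i):
    """Index of the sample closest to X[i] (Euclidean distance), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y, majority):
    """Drop the majority-class member of every Tomek link (binary labels assumed)."""
    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # A Tomek link is a pair of mutual nearest neighbours with different labels.
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: four majority (0) and two minority (1) samples.
X = [(0.0,), (1.0,), (2.0,), (5.0,), (5.3,), (8.0,)]
y = [0, 0, 0, 0, 1, 1]

# Oversample first, then clean with Tomek links, as in the abstract.
X_bal, y_bal = random_oversample(X, y, minority=1, rng=random.Random(0))
X_clean, y_clean = remove_tomek_links(X_bal, y_bal, majority=0)
```

One design point worth noting: exact duplicates created by random oversampling become each other's nearest neighbours, so a duplicated minority sample no longer forms a Tomek link with a nearby majority sample. Which samples are removed therefore depends on which minority samples were duplicated.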

Prediction of Nitration Sites Based on FCBF Method and Stacking Ensemble Model

Current Proteomics ◽

10.2174/1570164618999210101222637 ◽

2021 ◽

Vol 18 ◽

Author(s):

Min Liu ◽

Lu Zhang ◽

Xinyi Qin ◽

Tao Huang ◽

Ziwei Xu ◽

...

Keyword(s):

Prediction Model ◽

Protein Sequence ◽

Cross Validation ◽

Transplant Rejection ◽

Sampling Technique ◽

Single Type ◽

Post Translational Modification ◽

Tyrosine Residues ◽

Under Sampling ◽

Fusion Features

Background: Nitration is one of the important post-translational modifications (PTMs) occurring on the tyrosine residues of proteins. The occurrence of protein tyrosine nitration under disease conditions is inevitable and represents a shift from the signal-transducing physiological actions of •NO to oxidative and potentially pathogenic pathways. Abnormal protein nitration can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection, and lung cancer. Objective: Identifying the nitration sites in protein sequences is therefore important. Predicting which tyrosine residues in a protein sequence are nitrated and which are not is of great significance for the study of the nitration mechanism and related diseases. Methods: In this study, a prediction model of nitration sites based on an over- and under-sampling strategy and the FCBF method is proposed, using stacking ensemble learning and the fusion of multiple features. First, each protein sequence sample was encoded by 2701-dimensional fusion features (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Second, a ranked feature set was generated by the FCBF method according to the symmetric uncertainty metric. Third, during model training, the over- and under-sampling technique was used to tackle the imbalanced dataset. Finally, the Incremental Feature Selection (IFS) method was adopted to extract an optimal classifier based on 10-fold cross-validation. Results and Conclusion: The model shows significant performance advantages in indicators such as MCC, recall, and F1-score, both when compared with other classifiers on the independent test set and when evaluated by cross-validation with single-type features or fusion features on the training set.
By integrating the FCBF feature ranking method, the over- and under-sampling technique, and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was built, which achieves a better recall rate when the ratio of positive to negative samples is highly imbalanced.
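
The symmetric uncertainty metric used for ranking in the Methods step is commonly defined as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where H is entropy and IG is information gain. The sketch below shows only that ranking step under stated assumptions: features are already discretized, the function names and the toy columns are hypothetical, and the redundancy-removal stage of FCBF, the resampling, the stacking model, and IFS are all left out.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy      # mutual information between X and Y
    return 2.0 * info_gain / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature names sorted by decreasing SU with the class labels."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical discretized features for four samples.
labels = [0, 0, 1, 1]
columns = {
    "noise": [0, 1, 0, 1],      # independent of the labels
    "perfect": [0, 0, 1, 1],    # identical to the labels
}
ranking = rank_features(columns, labels)
```

Continuous encodings such as PSSM scores would need to be binned before this calculation; how the authors discretized their features is not stated in the abstract.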

Research of Under-Sampling Technique for Digital Image Correlation in Vibration Measurement

Shock & Vibration, Aircraft/Aerospace, Energy Harvesting, Acoustics & Optics, Volume 9 - Conference Proceedings of the Society for Experimental Mechanics Series ◽

10.1007/978-3-319-54735-0_6 ◽

2017 ◽

pp. 49-58 ◽

Cited By ~ 4

Author(s):

Yihao Liu ◽

Hongjian Gao ◽

James Zhuge ◽

Jeff Zhao

Keyword(s):

Digital Image Correlation ◽

Digital Image ◽

Sampling Technique ◽

Image Correlation ◽

Vibration Measurement ◽

Under Sampling

Multiple weak supervision for short text classification

Applied Intelligence ◽

10.1007/s10489-021-02958-3 ◽

2022 ◽

Author(s):

Li-Ming Chen ◽

Bao-Xin Xiu ◽

Zhao-Yun Ding

Keyword(s):

Text Classification ◽

Classification Problem ◽

Experimental Results ◽

Prior Work ◽

Weak Supervision ◽

Short Text ◽

Imbalanced Classification ◽

Distant Supervision ◽

Synthetic Datasets ◽

Independent Model

Abstract: For short text classification, insufficient labeled data, data sparsity, and imbalanced classification have become three major challenges. To address these, we propose multiple weak supervision, which labels unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditional independent model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
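
The abstract does not describe the model in detail, so the following is only a generic sketch of how probabilistic labels can be derived from several weak sources under a conditional independence assumption. It is not the authors' method. It assumes binary classes, that each source votes 0, 1, or abstains (None), that each source has a known accuracy strictly between 0 and 1, and that abstentions carry no information about the class. The function name and the example accuracies are hypothetical.

```python
def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Posterior P(y = 1 | votes), assuming sources are independent given y.

    votes      -- one entry per weak source: 1, 0, or None (abstain)
    accuracies -- assumed probability that each source votes correctly
    prior_pos  -- assumed prior probability of the positive class
    """
    like_pos, like_neg = prior_pos, 1.0 - prior_pos
    for vote, acc in zip(votes, accuracies):
        if vote is None:
            continue  # abstentions are treated as uninformative
        # A source votes for the true class with probability acc.
        like_pos *= acc if vote == 1 else 1.0 - acc
        like_neg *= acc if vote == 0 else 1.0 - acc
    return like_pos / (like_pos + like_neg)


# Three hypothetical weak sources disagree on one short text.
p = probabilistic_label(votes=[1, 1, 0], accuracies=[0.8, 0.7, 0.6])
```

For imbalanced data, prior_pos can be set below 0.5 so that a single weak vote for the rare class does not dominate the resulting label.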

An under-sampling technique for imbalanced data classification based on DBSCAN algorithm

2020 8th Iranian Joint Congress on Fuzzy and intelligent Systems (CFIS) ◽

10.1109/cfis49607.2020.9238718 ◽

2020 ◽

Author(s):

Behzad Mirzaei ◽

Bahareh Nikpour ◽

Hossein Nezamabadi-Pour

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Sampling Technique ◽

Dbscan Algorithm ◽

Imbalanced Data Classification ◽

Under Sampling

SVGPM: evolving SVM decision function by using genetic programming to solve imbalanced classification problem

Progress in Artificial Intelligence ◽

10.1007/s13748-021-00260-4 ◽

2021 ◽

Author(s):

Muhammad Syafiq Mohd Pozi ◽

Nur Athirah Azhar ◽

Abdul Rafiez Abdul Raziff ◽

Lina Hazmi Ajrina

Keyword(s):

Genetic Programming ◽

Classification Problem ◽

Decision Function ◽

Imbalanced Classification

An efficient and simple under-sampling technique for imbalanced time series classification

Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12 ◽

10.1145/2396761.2398635 ◽

2012 ◽

Cited By ~ 4

Author(s):

Guohua Liang ◽

Chengqi Zhang

Keyword(s):

Time Series ◽

Sampling Technique ◽

Time Series Classification ◽

Under Sampling

CUSBoost: Cluster-Based Under-Sampling with Boosting for Imbalanced Classification

2017 2nd International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS) ◽

10.1109/csitss.2017.8447534 ◽

2017 ◽

Cited By ~ 6

Author(s):

Farshid Rayhan ◽

Sajid Ahmed ◽

Asif Mahbub ◽

Rafsan Jani ◽

Swakkhar Shatabda ◽

...

Keyword(s):

Imbalanced Classification ◽

Under Sampling

A Balance Adjusting Approach of Extended Belief-Rule-Based System for Imbalanced Classification Problem

IEEE Access ◽

10.1109/access.2020.2976708 ◽

2020 ◽

Vol 8 ◽

pp. 41201-41212 ◽

Cited By ~ 1

Author(s):

Weijie Fang ◽

Xiaoting Gong ◽

Genggeng Liu ◽

Yingjie Wu ◽

Yanggeng Fu

Keyword(s):

Classification Problem ◽

Rule Based ◽

Imbalanced Classification ◽

Rule Based System

Comprehensive Assessment of Imbalanced Data Classification

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d7349.049420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1426-1431

Keyword(s):

Real World ◽

Imbalanced Data ◽

Predictive Modelling ◽

Classification Problem ◽

Minority Class ◽

Unequal Distribution ◽

Imbalanced Classification ◽

Imbalanced Data Classification ◽

Improved Performance ◽

And Performance

This is an attempt to address the various challenges, opportunities, and scope for formulating and designing new procedures for the imbalanced classification problem. The problem poses a challenge to predictive modelling because many of the AI, ML, and DL algorithms extensively used for classification are designed on the assumption of an equal number of examples per class. This leads to poor efficiency and performance, especially on the minority class, which is crucial, sensitive to classification errors, and of utmost importance in imbalanced classification. This chapter discusses and gives novel, deep insights into the unequal distribution of classes in training datasets. Most real-time and real-world classification tasks involve imbalanced distributions and therefore need specialized techniques to build more sophisticated models with minimal errors and improved performance.
