Improvement in Boosting Method by Using RUSTBoost Technique for Class Imbalanced Data

Credibility Based Imbalance Boosting Method for Software Defect Proneness Prediction

Applied Sciences ◽

10.3390/app10228059 ◽

2020 ◽

Vol 10 (22) ◽

pp. 8059

Author(s):

Haonan Tong ◽

Shihai Wang ◽

Guangling Li

Keyword(s):

Class Imbalance ◽

Area Under The Curve ◽

Imbalanced Data ◽

Class Imbalance Problem ◽

Synthetic Sample ◽

Promising Alternative ◽

Imbalance Problem ◽

Software Defect ◽

Boosting Method ◽

High Credibility

Imbalanced data are a major factor in degrading the performance of software defect models. Software defect datasets are imbalanced by nature, i.e., the number of non-defect-prone modules far exceeds that of defect-prone ones, which biases classifiers toward the majority class. In this paper, we propose a novel credibility-based imbalance boosting (CIB) method to address the class-imbalance problem in software defect proneness prediction. The method measures the credibility of synthetic samples based on their distribution by assigning a credit factor to every synthetic sample, and proposes a weight updating scheme that makes the base classifiers focus on real samples and on synthetic samples with high credibility. Experiments are performed on 11 NASA datasets and nine PROMISE datasets, comparing CIB with MAHAKIL, AdaC2, AdaBoost, SMOTE, RUS, and no sampling in terms of four performance measures: area under the curve (AUC), F1, AGF, and Matthews correlation coefficient (MCC). The Wilcoxon signed-rank test and Cliff’s δ are used to test statistical significance and to calculate effect size, respectively. The experimental results show that CIB is a more promising alternative for addressing the class-imbalance problem in software defect proneness prediction than previous methods.
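
Two of the performance measures named in this abstract, F1 and MCC, can be computed directly from a binary confusion matrix. The sketch below is illustrative only and is not taken from the paper: the function name `f1_and_mcc` and the confusion-matrix counts are hypothetical, chosen to show why plain accuracy can look good on an imbalanced defect dataset while F1 and MCC stay low.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for a binary confusion matrix.

    The defect-prone (minority) class is treated as the positive class.
    Degenerate cases with a zero denominator return 0.0 by convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Hypothetical counts: 200 modules, 20 of them defect-prone,
# and a classifier that finds only 5 of the 20.
tp, fp, fn, tn = 5, 5, 15, 175
accuracy = (tp + tn) / (tp + fp + fn + tn)
f1, mcc = f1_and_mcc(tp, fp, fn, tn)
print(f"accuracy={accuracy:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

With these made-up counts the accuracy is 0.90 even though three quarters of the defect-prone modules are missed, which is why measures that account for both classes are preferred for imbalanced data.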

K-Means Cluster Based Undersampling Ensemble for Imbalanced Data Classification

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c5188.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2074-2079

Keyword(s):

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Classification Algorithms ◽

Classification Problems ◽

Imbalanced Classification ◽

Imbalanced Data Classification ◽

Traditional Classification ◽

Boosting Method ◽

Ensemble Algorithms

Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. Imbalanced data classification problems arise in many application areas, such as rare medical diagnosis, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster based undersampling with a boosting method. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
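
The abstract does not describe how the clustering step selects samples, so the following is only a minimal sketch of one common form of cluster-based undersampling, under stated assumptions: the majority class is clustered into as many groups as there are minority samples, and the real majority point nearest each cluster center is kept. The helper names (`kmeans`, `cluster_undersample`) and the toy data are hypothetical, and the boosting stage is omitted.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an empty cluster keeps its previous center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, assign(points, centers)

def cluster_undersample(majority, k):
    """Keep one real majority sample per non-empty cluster (assumed rule)."""
    centers, clusters = kmeans(majority, k)
    return [min(cl, key=lambda p: sq_dist(p, c)) for c, cl in zip(centers, clusters) if cl]

# Toy data: 100 majority points on a grid, 5 minority points far away.
majority = [(float(i % 10), float(i // 10)) for i in range(100)]
minority = [(20.0, 20.0), (21.0, 20.0), (20.0, 21.0), (21.0, 21.0), (22.0, 20.0)]
kept = cluster_undersample(majority, k=len(minority))
balanced = kept + minority  # roughly balanced set to hand to a boosting learner
print(len(majority), "->", len(kept), "majority samples kept")
```

Choosing the sample nearest each center keeps the retained majority points spread across the original distribution, which random undersampling does not guarantee.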

A multi-class boosting method for learning from imbalanced data

International Journal of Granular Computing Rough Sets and Intelligent Systems ◽

10.1504/ijgcrsis.2015.074722 ◽

2015 ◽

Vol 4 (1) ◽

pp. 13 ◽

Cited By ~ 9

Author(s):

Xiaohui Yuan ◽

Mohamed Abouelenien

Keyword(s):

Imbalanced Data ◽

Boosting Method

Handling Imbalanced Data for Real-Time Crash Prediction: Application of Boosting and Sampling Techniques

Journal of Transportation Engineering Part A Systems ◽

10.1061/jtepbs.0000499 ◽

2021 ◽

Vol 147 (3) ◽

pp. 04020165

Author(s):

Amin Ariannezhad ◽

Abolfazl Karimpour ◽

Xiao Qin ◽

Yao-Jan Wu ◽

Yasamin Salmani

Keyword(s):

Real Time ◽

Imbalanced Data ◽

Sampling Techniques ◽

Crash Prediction

GRADIENT BOOSTING METHOD APPLICATION TO SUPPORT PROCESS DECISIONS IN THE ELECTRON-BEAM WELDING PROCESS

Siberian Journal of Science and Technology ◽

10.31772/2587-6066-2020-21-2-206-214 ◽

2020 ◽

Vol 21 (2) ◽

pp. 206-214

Author(s):

V. S. Tynchenko ◽

I. A. Golovenok ◽

V. E. Petrenko ◽

A. V. Milov ◽

...

Keyword(s):

Electron Beam ◽

Electron Beam Welding ◽

Welding Process ◽

Gradient Boosting ◽

Boosting Method

Un-Supervised approach to backorder prediction using deep autoencoder

Recent Patents on Computer Science ◽

10.2174/2213275912666190819112609 ◽

2019 ◽

Vol 12 ◽

Author(s):

Gunjan Saraogi ◽

Deepa Gupta ◽

Lavanya Sharma ◽

Ajay Rana

Keyword(s):

Anomaly Detection ◽

Predictive Model ◽

Inventory Management ◽

Supervised Classification ◽

Operating Characteristic ◽

Imbalanced Data ◽

Bullwhip Effect ◽

Detection Methods ◽

Good Representation ◽

Operational Management

Background: Backorders are a common anomaly affecting supply chain and logistics, sales, customer service, and manufacturing, which generally leads to low sales and low customer satisfaction. A predictive model can identify which products are most likely to experience backorders, giving the organization information and time to adjust, thereby taking actions to maximize its profit. Objective: To address the issue of predicting backorders, this paper proposes an un-supervised approach to backorder prediction using a deep autoencoder. Method: In this paper, artificial intelligence paradigms are researched in order to introduce a predictive model for the present unbalanced data issue, where products going on backorder are rare. Result: Un-supervised anomaly detection using deep autoencoders has shown better area under the Receiver Operating Characteristic and precision-recall curves than supervised classification techniques employed with resampling techniques for imbalanced data problems. Conclusion: We demonstrated that un-supervised anomaly detection methods, specifically deep autoencoders, can be used to learn a good representation of the data. The method can be used as a predictive model for inventory management and can help to reduce the bullwhip effect, raise customer satisfaction, and improve operational management in the organization. This technology is expected to create the sentient supply chain of the future, able to feel, perceive and react to situations at an extraordinarily granular level.

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Building on kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the samples, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that both algorithms are more effective and that R-SVM has lower complexity.

Cluster-Wise Weighted NMF for Hyperspectral Images Unmixing with Imbalanced Data

Remote Sensing ◽

10.3390/rs13020268 ◽

2021 ◽

Vol 13 (2) ◽

pp. 268

Author(s):

Xiaochen Lv ◽

Wenhong Wang ◽

Hongfu Liu

Keyword(s):

Spatial Information ◽

Hyperspectral Image ◽

Imbalanced Data ◽

Reconstruction Error ◽

Hyperspectral Data ◽

Weight Matrix ◽

Hyperspectral Images ◽

Mixed Data ◽

Sparsity Constraints ◽

Additional Constraints

Hyperspectral unmixing is an important technique for analyzing remote sensing images which aims to obtain a collection of endmembers and their corresponding abundances. In recent years, non-negative matrix factorization (NMF) has received extensive attention due to its good adaptability for mixed data with different degrees. The majority of existing NMF-based unmixing methods are developed by incorporating additional constraints into the standard NMF based on the spectral and spatial information of hyperspectral images. However, they neglect to exploit the nature of imbalanced pixels included in the data, which may cause the pixels mixed with imbalanced endmembers to be ignored, and thus the imbalanced endmembers generally cannot be accurately estimated due to the statistical property of NMF. To exploit the information of imbalanced samples in hyperspectral data during the unmixing procedure, in this paper, a cluster-wise weighted NMF (CW-NMF) method for the unmixing of hyperspectral images with imbalanced data is proposed. Specifically, based on the result of clustering conducted on the hyperspectral image, we construct a weight matrix and introduce it into the model of standard NMF. The proposed weight matrix can provide an appropriate weight value to the reconstruction error between each original pixel and the reconstructed pixel in the unmixing procedure. In this way, the adverse effect of imbalanced samples on the statistical accuracy of NMF is expected to be reduced by assigning larger weight values to the pixels concerning imbalanced endmembers and giving smaller weight values to the pixels mixed by majority endmembers. Besides, we extend the proposed CW-NMF by introducing the sparsity constraints of abundance and graph-based regularization, respectively. The experimental results on both synthetic and real hyperspectral data have been reported, and the effectiveness of our proposed methods has been demonstrated by comparing them with several state-of-the-art methods.
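
The idea of weighting each pixel's reconstruction error by cluster membership can be illustrated with a small sketch. The abstract does not give the weight formula, so the inverse-cluster-size weights below are an assumption made for illustration, not the CW-NMF definition; the function names and toy pixels are hypothetical as well, and the NMF update rules themselves are not shown.

```python
from collections import Counter

def cluster_weights(labels):
    """Assumed weighting: each pixel's weight is inversely proportional to
    the size of its cluster, scaled so that the weights sum to len(labels)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]

def weighted_reconstruction_error(pixels, reconstructed, weights):
    """Sum over pixels of weight times squared reconstruction error."""
    return sum(
        w * sum((x - r) ** 2 for x, r in zip(px, rec))
        for px, rec, w in zip(pixels, reconstructed, weights)
    )

# Toy example: three pixels in a large cluster, one pixel in a rare cluster.
labels = [0, 0, 0, 1]
weights = cluster_weights(labels)
pixels = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Only the rare pixel is reconstructed poorly.
reconstructed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
plain = weighted_reconstruction_error(pixels, reconstructed, [1.0] * 4)
weighted = weighted_reconstruction_error(pixels, reconstructed, weights)
print(plain, weighted)
```

Under this assumed scheme the error on the rare pixel counts for more in the weighted objective than in the unweighted one, so a factorization minimizing it has a stronger incentive to fit the minority endmember.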