Improvement in Boosting Method by Using RUSTBoost Technique for Class Imbalanced Data

Author(s):  
Ashutosh Kumar ◽  
Roshan Bharti ◽  
Deepak Gupta ◽  
Anish Kumar Saha
2020 ◽  
Vol 10 (22) ◽  
pp. 8059
Author(s):  
Haonan Tong ◽  
Shihai Wang ◽  
Guangling Li

Imbalanced data are a major factor in degrading the performance of software defect prediction models. Software defect datasets are imbalanced by nature, i.e., non-defect-prone modules far outnumber defect-prone ones, which biases classifiers toward the majority class. In this paper, we propose a novel credibility-based imbalance boosting (CIB) method to address the class-imbalance problem in software defect proneness prediction. The method measures the credibility of synthetic samples based on their distribution by assigning a credit factor to every synthetic sample, and it uses a weight-updating scheme that makes the base classifiers focus on real samples and on synthetic samples with high credibility. Experiments are performed on 11 NASA datasets and nine PROMISE datasets, comparing CIB with MAHAKIL, AdaC2, AdaBoost, SMOTE, RUS, and no sampling in terms of four performance measures: area under the curve (AUC), F1, AGF, and Matthews correlation coefficient (MCC). The Wilcoxon signed-rank test and Cliff’s δ are used to test statistical significance and to calculate effect size, respectively. The experimental results show that CIB is a more promising alternative than previous methods for addressing the class-imbalance problem in software defect proneness prediction.
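Three of the quantities named in this abstract (F1, MCC, and Cliff’s δ) have standard textbook definitions, and a minimal sketch of them may help show why plain accuracy is uninformative on imbalanced defect data. The function names and the confusion-matrix counts below are hypothetical and are not taken from the paper; returning 0.0 when a denominator is zero is a common convention, assumed here.

```python
import math

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 by convention if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (|xs| * |ys|)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

# Hypothetical example: 10 defective and 90 clean modules, and a classifier
# that labels every module as clean (the majority class).
tp, fp, fn, tn = 0, 0, 10, 90
accuracy = (tp + tn) / (tp + fp + fn + tn)
majority_f1 = f1_score(tp, fp, fn)
majority_mcc = mcc(tp, fp, fn, tn)
```

In this hypothetical case accuracy looks high even though no defective module is found, while F1 and MCC both collapse to zero, which is the usual argument for reporting such measures on imbalanced data.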


Imbalanced data classification is a critical and challenging problem in both data mining and machine learning. It arises in many application areas, such as the diagnosis of rare medical conditions, risk management, and fault detection. Traditional classification algorithms yield poor results on imbalanced classification problems. In this paper, a K-Means cluster-based undersampling ensemble algorithm is proposed to solve the imbalanced data classification problem. The proposed method combines K-Means cluster-based undersampling with boosting. The experimental results show that the proposed algorithm outperforms the sampling ensemble algorithms of previous studies.
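The abstract does not say how samples are chosen from the clusters or how the undersampling is interleaved with boosting, so the following is only a rough sketch of the undersampling step under one common assumption: the majority class is clustered into as many groups as there are minority samples, and the real sample nearest each centroid is kept. All names (`kmeans`, `cluster_undersample`, the toy data) are hypothetical, and the k-means here is a plain Lloyd iteration written with the standard library only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct positions
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids

def cluster_undersample(majority, n_keep, seed=0):
    """Keep the real majority sample closest to each of n_keep centroids.

    Assumption: one representative per cluster; duplicates are possible
    if two centroids share the same nearest sample.
    """
    centroids = kmeans(majority, n_keep, seed=seed)
    return [min(majority, key=lambda p: sq_dist(p, c)) for c in centroids]

# Hypothetical toy data: 40 majority points and 5 minority points in 2-D.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(5)]
kept = cluster_undersample(majority, len(minority))
balanced = kept + minority  # class-balanced set a base learner could be trained on
```

In an ensemble setting, a step like this would typically be repeated (for example with a different seed) before training each base classifier, but that wiring is not described in the abstract and is left out here.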


2021 ◽  
Vol 147 (3) ◽  
pp. 04020165
Author(s):  
Amin Ariannezhad ◽  
Abolfazl Karimpour ◽  
Xiao Qin ◽  
Yao-Jan Wu ◽  
Yasamin Salmani

2020 ◽  
Vol 21 (2) ◽  
pp. 206-214
Author(s):  
V. S. Tynchenko ◽  
I. A. Golovenok ◽  
V. E. Petrenko ◽  
A. V. Milov ◽  
...  

Author(s):  
Gunjan Saraogi ◽  
Deepa Gupta ◽  
Lavanya Sharma ◽  
Ajay Rana

Background: Backorders are a common anomaly affecting supply chain and logistics, sales, customer service, and manufacturing, which generally leads to low sales and low customer satisfaction. A predictive model can identify which products are most likely to experience backorders, giving the organization information and time to adjust and thereby take action to maximize its profit. Objective: To address the issue of predicting backorders, this paper proposes an unsupervised approach to backorder prediction using a deep autoencoder. Method: Artificial intelligence paradigms are examined in order to introduce a predictive model for the unbalanced data setting, in which products going on backorder are rare. Result: Unsupervised anomaly detection using deep autoencoders achieved better areas under the receiver operating characteristic and precision-recall curves than supervised classification techniques combined with resampling techniques for imbalanced data. Conclusion: We demonstrated that unsupervised anomaly detection methods, specifically deep autoencoders, can be used to learn a good representation of the data. The method can be used as a predictive model for inventory management and can help to reduce the bullwhip effect, raise customer satisfaction, and improve operational management in the organization. This technology is expected to create the sentient supply chain of the future, able to feel, perceive, and react to situations at an extraordinarily granular level.


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

Building on the study of kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the samples, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two system call data sets show that both algorithms are more effective and that R-SVM has lower complexity.


2021 ◽  
Vol 13 (2) ◽  
pp. 268
Author(s):  
Xiaochen Lv ◽  
Wenhong Wang ◽  
Hongfu Liu

Hyperspectral unmixing is an important technique for analyzing remote sensing images which aims to obtain a collection of endmembers and their corresponding abundances. In recent years, non-negative matrix factorization (NMF) has received extensive attention due to its good adaptability for mixed data with different degrees. The majority of existing NMF-based unmixing methods are developed by incorporating additional constraints into the standard NMF based on the spectral and spatial information of hyperspectral images. However, they neglect to exploit the nature of imbalanced pixels included in the data, which may cause the pixels mixed with imbalanced endmembers to be ignored, and thus the imbalanced endmembers generally cannot be accurately estimated due to the statistical property of NMF. To exploit the information of imbalanced samples in hyperspectral data during the unmixing procedure, in this paper, a cluster-wise weighted NMF (CW-NMF) method for the unmixing of hyperspectral images with imbalanced data is proposed. Specifically, based on the result of clustering conducted on the hyperspectral image, we construct a weight matrix and introduce it into the model of standard NMF. The proposed weight matrix can provide an appropriate weight value to the reconstruction error between each original pixel and the reconstructed pixel in the unmixing procedure. In this way, the adverse effect of imbalanced samples on the statistical accuracy of NMF is expected to be reduced by assigning larger weight values to the pixels concerning imbalanced endmembers and giving smaller weight values to the pixels mixed by majority endmembers. Besides, we extend the proposed CW-NMF by introducing the sparsity constraints of abundance and graph-based regularization, respectively. 
The experimental results on both synthetic and real hyperspectral data have been reported, and the effectiveness of our proposed methods has been demonstrated by comparing them with several state-of-the-art methods.
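The weighting idea described in this abstract can be written compactly under assumed notation; the symbols below are not taken from the paper, and the exact objective used there, including how the weights are derived from the clustering and how the sparsity and graph regularizers enter, may differ. Let the columns of X be the observed pixels x_j, A the endmember matrix, S the abundance matrix with columns s_j, and w_j ≥ 0 a per-pixel weight collected in W = diag(w_1, ..., w_N).

```latex
% Standard NMF unmixing objective (unweighted reconstruction error)
\min_{A \ge 0,\; S \ge 0} \; \frac{1}{2}\,\lVert X - AS \rVert_F^2

% Per-pixel weighted variant: pixels tied to rare (imbalanced) endmembers
% get larger w_j, pixels dominated by majority endmembers get smaller w_j
\min_{A \ge 0,\; S \ge 0} \; \frac{1}{2}\sum_{j=1}^{N} w_j \,\lVert x_j - A s_j \rVert_2^2
\;=\; \frac{1}{2}\,\bigl\lVert (X - AS)\, W^{1/2} \bigr\rVert_F^2
```

Setting every w_j = 1 recovers the standard objective, which makes the weighted form a strict generalization under this notation.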


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Ho Sun Shon ◽  
Erdenebileg Batbaatar ◽  
Wan-Sup Cho ◽  
Seong Gon Choi