class imbalance learning Latest Research Papers

Class Imbalance Learning to Heterogeneous Cross Software Projects Defect Prediction

International Journal of Software Innovation ◽

10.4018/ijsi.292021 ◽

2022 ◽

Vol 10 (1) ◽

pp. 0-0

Keyword(s):

Research Work ◽

Class Imbalance ◽

Training Dataset ◽

Software Projects ◽

Class Imbalance Problem ◽

Software Application ◽

Imbalance Problem ◽

Under Sampling ◽

Imbalance Learning ◽

Class Imbalance Learning

Heterogeneous CPDP (HCPDP) attempts to forecast defects in a software application having insufficient previous defect data. Nonetheless, with a Class Imbalance Problem (CIP) perspective, one should have a clear view of data distribution in the training dataset otherwise the trained model would lead to biased classification results. Class Imbalance Learning (CIL) is the method of achieving an equilibrium ratio between two classes in imbalanced datasets. There are a range of effective solutions to manage CIP such as resampling techniques like Over-Sampling (OS) & Under-Sampling (US) methods. The proposed research work employs Synthetic Minority Oversampling TEchnique (SMOTE) and Random Under Sampling (RUS) technique to handle CIP. In addition to this, the paper proposes a novel four-phase HCPDP model and contrasts the efficiency of basic HCPDP model with CIP and after handling CIP using SMOTE & RUS with three prediction pairs. Results show that training performance with SMOTE is substantially improved but RUS displays variations in relation to HCPDP for all three prediction pairs.

Download Full-text

A Method for Class-Imbalance Learning in Android Malware Detection

Electronics ◽

10.3390/electronics10243124 ◽

2021 ◽

Vol 10 (24) ◽

pp. 3124

Author(s):

Jun Guan ◽

Xu Jiang ◽

Baolei Mao

Keyword(s):

Machine Learning ◽

Malware Detection ◽

Computational Cost ◽

Class Imbalance ◽

Sampling Technique ◽

Minority Class ◽

Android Malware ◽

Android Malware Detection ◽

Imbalance Learning ◽

Class Imbalance Learning

More and more Android application developers are adopting many different methods against reverse engineering, such as adding a shell, resulting in certain features that cannot be obtained through decompilation, which causes a serious sample imbalance in Android malware detection based on machine learning. Hence, the researchers have focused on how to solve class-imbalance to improve the performance of Android malware detection. However, the disadvantages of the existing class-imbalance learning are mainly the loss of valuable samples and the computational cost. In this paper, we propose a method of Class-Imbalance Learning (CIL), which first selects representative features, uses the clustering K-Means algorithm and under-sampling to retain the important samples of the majority class while reducing the number of samples of the majority class. After that, we use the Synthetic Minority Over-Sampling Technique (SMOTE) algorithm to generate minority class samples for data balance, and finally use the Random Forest (RF) algorithm to build a malware detection model. The result of experiments indicates that CIL effectively improves the performance of Android malware detection based on machine learning, especially for class imbalance. Compared with existing class-imbalance learning methods, CIL is also effective for the Machine Learning Repository from the University of California, Irvine (UCI) and has better performance in some data sets.

Download Full-text

Development of Predictive Models for “Very Poor” Beach Water Quality Gradings Using Class-Imbalance Learning

Environmental Science & Technology ◽

10.1021/acs.est.1c03350 ◽

2021 ◽

Author(s):

Jiuhao Guo ◽

Joseph H. W. Lee

Keyword(s):

Water Quality ◽

Predictive Models ◽

Class Imbalance ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Beach Water Quality ◽

Beach Water

Download Full-text

M2SPL: Generative multiview features with adaptive meta-self-paced sampling for class-imbalance learning

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115999 ◽

2021 ◽

pp. 115999

Author(s):

Qingyong Wang ◽

Yun Zhou ◽

Zehong Cao ◽

Weiming Zhang

Keyword(s):

Class Imbalance ◽

Imbalance Learning ◽

Class Imbalance Learning

Download Full-text

VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00786-0 ◽

2021 ◽

Author(s):

Alessio Bernardo ◽

Emanuele Della Valle

Keyword(s):

Data Streams ◽

Concept Drift ◽

Class Imbalance ◽

Imbalanced Data ◽

Real Data ◽

Minority Class ◽

Machine Learning Classification ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

Better Than

AbstractThe world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of Smote and Borderline-Smote inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.

Download Full-text

Probability Density Machine: A New Solution of Class Imbalance Learning

Scientific Programming ◽

10.1155/2021/7555587 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Ruihan Cheng ◽

Longfei Zhang ◽

Shiqi Wu ◽

Sen Xu ◽

Shang Gao ◽

...

Keyword(s):

Probability Density ◽

Predictive Model ◽

Data Distribution ◽

Class Imbalance ◽

Imbalanced Data ◽

Training Data ◽

Probability Density Estimation ◽

Imbalance Learning ◽

Class Imbalance Learning ◽

The Impact

Class imbalance learning (CIL) is an important branch of machine learning as, in general, it is difficult for classification models to learn from imbalanced data; meanwhile, skewed data distribution frequently exists in various real-world applications. In this paper, we introduce a novel solution of CIL called Probability Density Machine (PDM). First, in the context of Gaussian Naive Bayes (GNB) predictive model, we analyze the reason why imbalanced data distribution makes the performance of predictive model decline in theory and draw a conclusion regarding the impact of class imbalance that is only associated with the prior probability, but does not relate to the conditional probability of training data. Then, in such context, we show the rationality of several traditional CIL techniques. Furthermore, we indicate the drawback of combining GNB with these traditional CIL techniques. Next, profiting from the idea of K-nearest neighbors probability density estimation (KNN-PDE), we propose the PDM which is an improved GNB-based CIL algorithm. Finally, we conduct experiments on lots of class imbalance data sets, and the proposed PDM algorithm shows the promising results.

Download Full-text

Fuzzy least squares projection twin support vector machines for class imbalance learning

Applied Soft Computing ◽

10.1016/j.asoc.2021.107933 ◽

2021 ◽

pp. 107933

Author(s):

M.A. Ganaie ◽

M. Tanveer

Keyword(s):

Support Vector Machines ◽

Least Squares ◽

Class Imbalance ◽

Support Vector ◽

Twin Support Vector Machines ◽

Vector Machines ◽

Imbalance Learning ◽

Class Imbalance Learning

Download Full-text

Early Warning of Gas Concentration in Coal Mines Production Based on Probability Density Machine

Sensors ◽

10.3390/s21175730 ◽

2021 ◽

Vol 21 (17) ◽

pp. 5730

Author(s):

Yadong Cai ◽

Shiqi Wu ◽

Ming Zhou ◽

Shang Gao ◽

Hualong Yu

Keyword(s):

Probability Density ◽

Coal Mine ◽

Early Warning ◽

Learning Algorithms ◽

Class Imbalance ◽

Gas Explosion ◽

Concentration Data ◽

Gas Concentration ◽

Imbalance Learning ◽

Class Imbalance Learning

Gas explosion has always been an important factor restricting coal mine production safety. The application of machine learning techniques in coal mine gas concentration prediction and early warning can effectively prevent gas explosion accidents. Nearly all traditional prediction models use a regression technique to predict gas concentration. Considering there exist very few instances of high gas concentration, the instance distribution of gas concentration would be extremely imbalanced. Therefore, such regression models generally perform poorly in predicting high gas concentration instances. In this study, we consider early warning of gas concentration as a binary-class problem, and divide gas concentration data into warning class and non-warning class according to the concentration threshold. We proposed the probability density machine (PDM) algorithm with excellent adaptability to imbalanced data distribution. In this study, we use the original gas concentration data collected from several monitoring points in a coal mine in Datong city, Shanxi Province, China, to train the PDM model and to compare the model with several class imbalance learning algorithms. The results show that the PDM algorithm is superior to the traditional and state-of-the-art class imbalance learning algorithms, and can produce more accurate early warning results for gas explosion.

Download Full-text

Towards graph-based class-imbalance learning for hospital readmission

Expert Systems with Applications ◽

10.1016/j.eswa.2021.114791 ◽

2021 ◽

Vol 176 ◽

pp. 114791

Author(s):

Guodong Du ◽

Jia Zhang ◽

Fenglong Ma ◽

Min Zhao ◽

Yaojin Lin ◽

...

Keyword(s):

Hospital Readmission ◽

Class Imbalance ◽

Imbalance Learning ◽

Class Imbalance Learning

Download Full-text

Class Imbalance Learning Using Fuzzy ART and Intuitionistic Fuzzy Twin Support Vector Machines

Information Sciences ◽

10.1016/j.ins.2021.07.010 ◽

2021 ◽

Author(s):

Salim Rezvani ◽

Xizhao Wang

Keyword(s):

Support Vector Machines ◽

Class Imbalance ◽

Support Vector ◽

Twin Support Vector Machines ◽

Fuzzy Art ◽

Intuitionistic Fuzzy ◽

Vector Machines ◽

Imbalance Learning ◽

Class Imbalance Learning

Download Full-text

class imbalance learning
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Class Imbalance Learning to Heterogeneous Cross Software Projects Defect Prediction

A Method for Class-Imbalance Learning in Android Malware Detection

Development of Predictive Models for “Very Poor” Beach Water Quality Gradings Using Class-Imbalance Learning

M2SPL: Generative multiview features with adaptive meta-self-paced sampling for class-imbalance learning

VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Probability Density Machine: A New Solution of Class Imbalance Learning

Fuzzy least squares projection twin support vector machines for class imbalance learning

Early Warning of Gas Concentration in Coal Mines Production Based on Probability Density Machine

Towards graph-based class-imbalance learning for hospital readmission

Class Imbalance Learning Using Fuzzy ART and Intuitionistic Fuzzy Twin Support Vector Machines

Export Citation Format

class imbalance learningRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Class Imbalance Learning to Heterogeneous Cross Software Projects Defect Prediction

A Method for Class-Imbalance Learning in Android Malware Detection

Development of Predictive Models for “Very Poor” Beach Water Quality Gradings Using Class-Imbalance Learning

M2SPL: Generative multiview features with adaptive meta-self-paced sampling for class-imbalance learning

VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

Probability Density Machine: A New Solution of Class Imbalance Learning

Fuzzy least squares projection twin support vector machines for class imbalance learning

Early Warning of Gas Concentration in Coal Mines Production Based on Probability Density Machine

Towards graph-based class-imbalance learning for hospital readmission

Class Imbalance Learning Using Fuzzy ART and Intuitionistic Fuzzy Twin Support Vector Machines

class imbalance learning
Recently Published Documents