Imbalanced Data Classification Using Cost-Sensitive Support Vector Machine Based on Information Entropy

2014 ◽  
Vol 989-994 ◽  
pp. 1756-1761 ◽  
Author(s):  
Wei Duan ◽  
Liang Jing ◽  
Xiang Yang Lu

As a supervised classification algorithm, the Support Vector Machine (SVM) excels at small-sample, nonlinear, and high-dimensional classification problems. However, SVM performs poorly on imbalanced data sets. Therefore, a cost-sensitive SVM (CSSVM) should be designed for imbalanced data set classification. This paper proposes a method that constructs a CSSVM based on information entropy, in which the information entropies of the different classes of the data set are used to determine the values of the penalty factors of the CSSVM.
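The abstract does not give the exact entropy-to-penalty mapping, so the sketch below is one plausible reading, not the authors' formula: each class's penalty weight is set to its surprisal, so rarer classes receive larger penalty factors. The function name and the choice of `-log2(p)` are assumptions.

```python
import math
from collections import Counter

def entropy_based_penalties(y):
    """Map each class label to a penalty weight derived from its
    probability in the training set (surprisal, -log2(p)).
    Rarer classes get larger weights, mimicking a cost-sensitive SVM
    that penalizes errors on the minority class more heavily."""
    counts = Counter(y)
    n = len(y)
    return {c: -math.log2(cnt / n) for c, cnt in counts.items()}
```

The resulting dictionary could then be handed to an SVM implementation that supports per-class costs, e.g. scikit-learn's `SVC(class_weight=entropy_based_penalties(y))`.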

2020 ◽  
Vol 122 ◽  
pp. 289-307 ◽  
Author(s):  
Xinmin Tao ◽  
Qing Li ◽  
Chao Ren ◽  
Wenjie Guo ◽  
Qing He ◽  
...  

2011 ◽  
Vol 219-220 ◽  
pp. 151-155 ◽  
Author(s):  
Hua Ji ◽  
Hua Xiang Zhang

Learning from imbalanced data sets is a frequent problem in many real-world domains. Because the skewed class distribution leads traditional classifiers to much lower accuracy on rare classes, we propose a novel classification method based on local clustering over the data distribution of the imbalanced data set. First, we divide the whole data set into several groups according to the data distribution. Then we perform local clustering within each group, on both the normal class and the disjointed rare class. For the rare class, over-sampling is subsequently applied at different rates. Finally, we apply support vector machines (SVMs) for classification, using the traditional cost-matrix tactic to enhance classification accuracy. Experimental results on several UCI data sets show that this method produces much higher prediction accuracy on the rare class than state-of-the-art methods.
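The abstract leaves the per-cluster over-sampling rates unspecified; the sketch below illustrates just the cluster-wise over-sampling step, with the local cluster assignments taken as given and `target_per_cluster` as an assumed input rather than anything defined in the paper.

```python
import random
from collections import defaultdict

def cluster_oversample(minority, cluster_ids, target_per_cluster):
    """Over-sample the rare class cluster by cluster: each local
    cluster is grown (by random duplication) to its own target size,
    so different regions of the rare class get different rates."""
    clusters = defaultdict(list)
    for x, c in zip(minority, cluster_ids):
        clusters[c].append(x)
    resampled = []
    for c, members in clusters.items():
        resampled.extend(members)
        need = target_per_cluster.get(c, len(members)) - len(members)
        for _ in range(max(need, 0)):
            resampled.append(random.choice(members))
    return resampled
```

The balanced output would then be fed to the SVM together with the cost matrix described above.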


2021 ◽  
Vol 11 (11) ◽  
pp. 4970
Author(s):  
Łukasz Rybak ◽  
Janusz Dudczyk

The history of gravitational classification started in 1977. Over the years, gravitational approaches have gained many extensions, which have been adapted to different classification problems. This article is the next stage of research concerning algorithms that create data particles by their geometrical divide. Previous analyses established that the Geometrical Divide (GD) method outperforms the algorithm creating data particles based on classes by a compound of 1 ÷ 1 cardinality. This occurs in the classification of balanced data sets in which class centroids are close to each other and the groups of objects described by different labels overlap. The purpose of this article is to examine the efficiency of the Geometrical Divide method in unbalanced data set classification, using a real case as an example: occupancy detection. In addition, the paper develops the concept of the Unequal Geometrical Divide (UGD). The approaches were evaluated on 26 unbalanced data sets: 16 with the features of the Moons and Circles data sets, and 10 created from a real occupancy data set. The experiment compared the GD method and its unbalanced variant (UGD) as well as the 1CT1P approach. Each method was combined with three data particle mass determination algorithms: the n-Mass Model (n-MM), the Stochastic Learning Algorithm (SLA), and the Batch-update Algorithm (BLA). k-fold cross-validation, precision, recall, F-measure, and the number of data particles used were applied in the evaluation process. The results showed that the methods based on the geometrical divide outperform the 1CT1P approach in imbalanced data set classification. The article's conclusion describes the observations and indicates potential directions for further research on and development of methods for creating data particles through their geometrical divide.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Peng Li ◽  
Tian-ge Liang ◽  
Kai-hui Zhang

This paper proposes a cluster boundary sampling method based on density clustering to solve the resampling problem in imbalanced data set (IDS) classification, and verifies its effectiveness experimentally. We use a clustering density threshold and a boundary density threshold to determine the cluster boundaries, in order to guide the resampling process more scientifically and accurately. We then adopt a penalty factor to regulate the effect of data imbalance on the SVM classification algorithm. This paper does not claim to propose the best classifier or a complete solution for imbalanced data sets; its contribution is to verify the validity and stability of the proposed IDS resampling method. Experiments show that our method achieves a clear improvement on various imbalanced data sets.
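The two thresholds are not defined precisely in the abstract; the sketch below assumes local density is measured as the neighbor count within a radius `eps`, and that a point lies on a cluster boundary when its density falls between the boundary threshold and the clustering threshold. The names and the density definition are illustrative assumptions.

```python
def local_density(X, i, eps):
    """Number of other points within distance eps of point i."""
    xi = X[i]
    return sum(
        1 for j, xj in enumerate(X)
        if j != i and sum((a - b) ** 2 for a, b in zip(xi, xj)) <= eps ** 2
    )

def cluster_boundary(X, eps, boundary_thresh, cluster_thresh):
    """Indices of points whose local density lies between the two
    thresholds: dense enough to belong to a cluster, but sparse
    enough to sit on its rim rather than in its core."""
    return [
        i for i in range(len(X))
        if boundary_thresh <= local_density(X, i, eps) < cluster_thresh
    ]
```

Resampling could then be focused on these boundary indices, where the classes are most likely to overlap.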


2020 ◽  
Vol 26 (4) ◽  
pp. 380-395
Author(s):  
Peisong Gong ◽  
Haixiang Guo ◽  
Yuanyue Huang ◽  
Shengyu Guo

Safety risk evaluations of deep foundation construction schemes are important to ensure safety. However, the amount of knowledge involved in these evaluations is large, and the historical data of deep foundation engineering are imbalanced. Some adverse factors influence the quality and efficiency of evaluations made with traditional manual evaluation tools. Machine learning can guarantee the quality of imbalanced data classifications. In this study, three strategies are proposed to improve the classification accuracy of imbalanced data sets. First, data set information redundancy is reduced using a binary particle swarm optimization algorithm. Then, the classification algorithm is modified using an Adaboost-enhanced support vector machine classifier. Finally, a new classification evaluation standard, namely, the area under the ROC curve (AUC), is adopted to ensure that the classifier is impartial to the minority class. A transverse comparison experiment using multiple classification algorithms shows that the proposed integrated classification algorithm can overcome the difficulties associated with correctly classifying minority samples in imbalanced data sets. The algorithm can also improve construction safety management evaluations, relieve the pressure from the lack of experienced experts accompanying rapid infrastructure construction, and facilitate knowledge reuse in the field of architecture, engineering, and construction.
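The AUC evaluation standard adopted above has a standard probabilistic definition: for a binary problem, it equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties counting half). The sketch below is a minimal pairwise implementation of that definition, not code from the study.

```python
def auc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = positive),
    computed as the fraction of positive/negative pairs in which
    the positive sample is ranked higher (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Because every positive/negative pair is weighted equally regardless of class sizes, this measure does not reward a classifier for simply predicting the majority class, which is why it suits imbalanced data.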


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

Within the study of kernel methods, this paper puts forward two improved algorithms, called R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster space samples, while I-SVDD improves the performance of the original SVDD through imbalanced sample training. Experiments on two sets of system call data show that both algorithms are more effective, and that R-SVM has lower complexity.
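SVDD describes one class by a minimal enclosing hypersphere and flags points outside it. The sketch below is a crude, kernel-free stand-in that fixes the centre at the centroid and picks a radius covering a quantile of the training data; it illustrates only the decision rule, and every name in it is an assumption rather than the authors' I-SVDD.

```python
def fit_ball(X, quantile=0.95):
    """Crude SVDD stand-in: centroid plus the radius that
    encloses the given quantile of the training points."""
    dim = len(X[0])
    center = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    dists = sorted(
        sum((a - b) ** 2 for a, b in zip(x, center)) ** 0.5 for x in X
    )
    radius = dists[min(int(quantile * len(dists)), len(dists) - 1)]
    return center, radius

def is_normal(x, center, radius):
    """A point inside the ball is classified as the described class."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) ** 0.5 <= radius
```

Trained on normal system call traces only, such a one-class boundary would label anything far outside the ball as anomalous, which is what makes the SVDD family attractive when one class vastly outnumbers the other.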

