Fuzzy Asymmetric Support Vector Machines

2012 ◽  
Vol 433-440 ◽  
pp. 7479-7486
Author(s):  
Rui Kong ◽  
Qiong Wang ◽  
Gu Yu Hu ◽  
Zhi Song Pan

Support Vector Machines (SVMs) have been extensively studied and have shown remarkable success in many applications. However, the success of SVMs is very limited when they are applied to learning from imbalanced datasets, in which negative instances heavily outnumber positive instances (e.g. in medical diagnosis and credit card fraud detection). In this paper, we propose a fuzzy asymmetric algorithm, called FASVM, that augments SVMs to deal with imbalanced training data. FASVM is based on fuzzy memberships combined with the different error costs (DEC) algorithm. We compare the performance of our algorithm against these two approaches (fuzzy memberships and DEC) and against the regular SVM, and show that our algorithm outperforms all of them.
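
One way to picture the combination of fuzzy memberships and different error costs is the objective below. It is a sketch under stated assumptions, not the formulation from the paper: the abstract does not give the FASVM objective, so this simply merges the standard DEC soft-margin SVM (separate penalties for the two classes) with the standard fuzzy SVM (a per-sample membership that scales the slack). The symbols $C^{+}$, $C^{-}$, $m_i$, $\xi_i$ and $\phi$ are assumed notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \frac{1}{2}\lVert w \rVert^{2}
    + C^{+} \sum_{i:\,y_i=+1} m_i\,\xi_i
    + C^{-} \sum_{i:\,y_i=-1} m_i\,\xi_i \\
\text{s.t.}\quad
  & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \qquad
    \xi_i \ge 0, \qquad i = 1,\dots,n,
\end{aligned}
% Assumed notation: m_i in (0, 1] is the fuzzy membership of sample i,
% C^+ and C^- are the error costs for the positive and negative classes.
```

Under this assumed form, setting every $m_i = 1$ recovers DEC, setting $C^{+} = C^{-}$ recovers the fuzzy SVM, and doing both recovers the regular soft-margin SVM, which matches the set of baselines named in the abstract.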

Author(s):  
M. Ustuner ◽  
F. B. Sanli ◽  
S. Abdikan

The accuracy of supervised image classification depends heavily on several factors, such as the design of the training set (sample selection, composition, purity and size), the resolution of the input imagery and landscape heterogeneity. The design of the training set remains challenging because classifier algorithms differ in their sensitivity to the same dataset at the learning stage. In this paper, the classification of RapidEye imagery with balanced and imbalanced training data for mapping crop types is addressed. Classification with imbalanced training data may result in low accuracy in some scenarios. Support Vector Machines (SVM), Maximum Likelihood (ML) and Artificial Neural Network (ANN) classifications were implemented to classify the data. To evaluate the influence of balanced and imbalanced training data on image classification algorithms, three training datasets were created: two balanced datasets, with 70 and 100 pixels for each class of interest, and one imbalanced dataset in which each class has a different number of pixels. Results demonstrate that ML and ANN classifications are affected by imbalanced training data, with a reduction in accuracy (from 90.94% to 85.94% for ML and from 91.56% to 88.44% for ANN), while SVM is not affected significantly and even improves slightly (from 94.38% to 94.69%). Our results highlight that SVM is a very robust, consistent and effective classifier, as it performs very well with both balanced and imbalanced training data. Furthermore, the training stage should be designed precisely and carefully to suit the needs of the adopted classifier.
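
The balanced training sets described above (a fixed number of pixels per class) could be drawn along the following lines. This is a minimal sketch, not the authors' procedure: the function name `balanced_sample`, the crop class names and the class sizes are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(labels, n_per_class, seed=0):
    """Return sorted indices holding exactly n_per_class samples of each class.

    Hypothetical helper: `labels` is a sequence with one class label per pixel.
    """
    rng = random.Random(seed)

    # Group pixel indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} samples")
        # Sample without replacement so no pixel is used twice.
        chosen.extend(rng.sample(pool, n_per_class))
    return sorted(chosen)


# Made-up, imbalanced pool of labelled pixels (class sizes are assumptions).
labels = ["wheat"] * 150 + ["maize"] * 120 + ["sunflower"] * 300

idx_70 = balanced_sample(labels, 70)    # balanced set, 70 pixels per class
idx_100 = balanced_sample(labels, 100)  # balanced set, 100 pixels per class
```

The imbalanced set in the abstract would correspond to using the pool as it is, with each class keeping its own size.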


2021 ◽  
Vol 25 (1) ◽  
pp. 105-119 ◽  
Author(s):  
Chenglong Li ◽  
Ning Ding ◽  
Yiming Zhai ◽  
Haoyun Dong

Credit card fraud is a new form of financial crime that has accompanied the gradual development of the economy and causes billions of dollars of losses every year. Credit card fraud not only seriously harms the interests of cardholders and financial institutions, but also undermines the order of credit management. Moreover, fraudsters constantly explore new strategies, which drives up the rate of fraud. A predictive model for credit card fraud detection is therefore essential to minimize losses. By distinguishing between fraud and non-fraud, machine learning is one of the most efficient solutions for detecting fraud. Support vector machines (SVM) have proven to be a novel algorithm with excellent performance. Nevertheless, the performance of SVM depends largely on the correct choice of model parameters (C and g), and the false positive rate can be very high if the kernel function type and parameters are not selected properly. In this paper, based on real transaction data from the credit card business, we first find the kernel function best suited to the data set. Second, we propose optimizing the support vector machine parameters with the cuckoo search (CS) algorithm, the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm. The linear kernel function was found to be the best kernel function, with an accuracy rate of 91.56%. Furthermore, when the radial basis function kernel is optimized, accuracy improves from 42.86% to the highest accuracy rate of 98.05%. Compared with CS-SVM and GA-SVM, PSO-SVM has the best overall performance.
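
The idea of searching for SVM parameters with particle swarm optimization can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: `pso_minimize` is a plain textbook PSO, and `toy_validation_error` is a made-up stand-in for the cross-validated error of an SVM as a function of (log10 C, log10 g). In practice that function would train and validate a real SVM; the inertia and acceleration values are common defaults, not values taken from the paper.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best found so far.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_validation_error(params):
    """Made-up error surface with its minimum at log10 C = 1, log10 g = -2."""
    log_c, log_g = params
    return (log_c - 1.0) ** 2 + (log_g + 2.0) ** 2


search_box = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed ranges for log10 C, log10 g
best, err = pso_minimize(toy_validation_error, search_box)
```

Searching in log space is a design choice of this sketch: C and g usually vary over several orders of magnitude, so a uniform box in log10 units covers the range more evenly.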


Author(s):  
Ribana Roscher ◽  
Jan Behmann ◽  
Anne-Katrin Mahlein ◽  
Jan Dupuis ◽  
Heiner Kuhlmann ◽  
...  

We analyze the benefit of combining hyperspectral image information with 3D geometry information for the detection of Cercospora leaf spot disease symptoms on sugar beet plants. Besides the commonly used One-Class Support Vector Machines, we utilize an unsupervised sparse representation-based approach with a group sparsity prior. Geometry information is incorporated by representing each sample of interest with an inclination-sorted dictionary, which can be seen as a 1D topographic dictionary. We compare this approach with a sparse representation-based approach without geometry information and with One-Class Support Vector Machines. One-Class Support Vector Machines are applied to hyperspectral data without geometry information as well as to hyperspectral images with additional pixelwise inclination information. Our results show a gain in accuracy when geometry information is used alongside spectral information, regardless of the approach. However, the two methods place different demands on the data when applied to new test data sets. One-Class Support Vector Machines require full inclination information for both test and training data, whereas the topographic dictionary approach needs only spectral information to reconstruct test data once the dictionary has been built from spectra with inclination.

