Support Vector Machine Classifier with WHM Offset for Unbalanced Data

Author(s):  
Boyang Li ◽  
Jinglu Hu ◽  
Kotaro Hirasawa

We propose an improved support vector machine (SVM) classifier that introduces a new offset to address real-world unbalanced classification problems. The new offset is calculated from the unbalanced support vectors that result from unbalanced training data. We develop a weighted harmonic mean (WHM) algorithm to further reduce the effect of noise on the offset calculation. We apply the proposed approach to real-world data, and simulation results demonstrate its effectiveness.
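
For reference, the weighted harmonic mean of positive values x_i with weights w_i is (sum of w_i) / (sum of w_i / x_i). The sketch below computes only that generic quantity. The abstract does not say which values or weights the authors feed into it when computing the offset, so the function name and the sample inputs are illustrative assumptions rather than the paper's algorithm.

```python
from typing import Sequence


def weighted_harmonic_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w) / sum(w / x) for positive values x and non-negative weights w."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(x <= 0 for x in values):
        raise ValueError("the harmonic mean is defined here for positive values only")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))


# Hypothetical inputs: three positive quantities with equal weights.
equal = weighted_harmonic_mean([1.0, 2.0, 4.0], [1.0, 1.0, 1.0])
# Giving the largest value zero weight removes its influence entirely.
skewed = weighted_harmonic_mean([1.0, 2.0, 4.0], [1.0, 1.0, 0.0])
```

A harmonic mean is dominated by its smallest inputs, so a single very large value shifts it far less than it would shift an arithmetic mean; this is one plausible reason such a mean might be chosen when noise is a concern.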

Author(s):  
Wenjuan An ◽  
Mangui Liang ◽  
He Liu

Outlier detection, a type of one-class classification problem, is an important research topic in data mining and machine learning. Its task is to identify sample points that deviate markedly from the normal data. A reliable outlier detector needs to build a model that encloses the normal data tightly. In this paper, an improved one-class SVM (OC-SVM) classifier is proposed for outlier detection problems. We name this method OC-SVM with minimum within-class scatter (OC-WCSSVM); it exploits the inner-class structure of the training set by minimizing the within-class scatter of the training data. This yields a more accurate hyperplane for outlier detection, such that the margin between the training data and the origin in a higher-dimensional space is as large as possible while the decision boundary around the normal data is as tight as possible. Experimental results on a synthetic dataset and 10 real-world datasets demonstrate that the proposed OC-WCSSVM algorithm is effective and superior to the compared algorithms.


2014 ◽  
Vol 2014 ◽  
pp. 1-10
Author(s):  
Kun Zhang ◽  
Minrui Fei ◽  
Xin Li ◽  
Huiyu Zhou

Feature analysis is an important task that can significantly affect the performance of automatic bacterial colony picking. Unstructured environments also affect automatic colony screening. This paper presents a novel approach for adaptive colony segmentation in unstructured environments that treats the detected peaks of intensity histograms as a morphological feature of images. To avoid disturbing peaks, an entropy-based mean shift filter is introduced to smooth images as a preprocessing step. The relevance and importance of these features are determined by an improved support vector machine classifier using unascertained least square estimation. Experimental results show that the proposed unascertained least square support vector machine (ULSSVM) has better recognition accuracy than other state-of-the-art techniques, and that its training takes less time than most of the traditional approaches presented in this paper.


Author(s):  
PETER MC LEOD ◽  
BRIJESH VERMA

This paper presents a novel technique for the classification of suspicious areas in digital mammograms. The proposed technique is based on clustering the input data into numerous clusters and amalgamating them with a Support Vector Machine (SVM) classifier. The technique is called multi-cluster support vector machine (MCSVM) and is designed to converge quickly and generalize well, leading to improved classification of areas as benign or malignant. The proposed MCSVM technique has been evaluated on data from the Digital Database of Screening Mammography (DDSM) benchmark database. The experimental results show that the proposed MCSVM classifier achieves better results than a standard SVM. A paired t-test and ANOVA show that the results are statistically significant.


Author(s):  
Chao-Yung Hsu ◽  
Li-Wei Kang ◽  
Ming-Fang Weng

Visible surface defects, such as cracks or scratches, are common in steel products, including steel slabs (a main product of the upstream line in steel production). To prevent defects from propagating from the upstream to the downstream production lines, it is important to predict or detect them at an early stage of production, especially on steel slabs. In this paper, we address the prediction of surface defects on continuous casting steel slabs. The main goal is to accurately predict the occurrence of surface defects on steel slabs based on data collected online from the production line. Accurate prediction would help adjust process and environmental factors online to improve producibility and reduce the occurrence of defects, which should be more useful than defect inspection alone. The major challenge is that the numbers of normal and defective samples are usually unbalanced: defective samples are typically far fewer than normal ones. To cope with this, we formulate the task as a one-class classification problem in which only normal training data are used. We propose to learn a one-class SVM (support vector machine) classifier from online collected process data and environmental factors for normal cases only, and use it to predict the occurrence of defects on steel slabs. Our experimental results demonstrate that the learned one-class SVM (OCSVM) classifier achieves better prediction accuracy than the traditional two-class SVM classifier (which relies on both positive and negative training samples) used for comparison.


2010 ◽  
Vol 19 (05) ◽  
pp. 647-677 ◽  
Author(s):  
LAURA DIOŞAN ◽  
ALEXANDRINA ROGOZAN ◽  
JEAN-PIERRE PECUCHET

Classic kernel-based classifiers use only a single kernel, but real-world applications have emphasized the need to consider a combination of kernels, also known as a multiple kernel (MK), in order to boost classification accuracy by adapting better to the characteristics of the data. Our purpose is to automatically design a complex multiple kernel by evolutionary means. To this end, we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. In our model, each GP chromosome is a tree that encodes the mathematical expression of a multiple kernel. The evolutionary search for the optimal MK is guided by the fitness function (or efficiency) of each possible MK. The complex multiple kernels evolved in this manner (eCMKs) are compared to several classic simple kernels (SKs), to a convex linear multiple kernel (cLMK) and to an evolutionary linear multiple kernel (eLMK) on several real-world data sets from the UCI repository. The numerical experiments show that the SVM with evolutionary complex multiple kernels performs better than with the classic simple kernels. Moreover, on the considered data sets, the new multiple kernels outperform both linear multiple kernels, the cLMK and the eLMK. These results emphasize that the SVM algorithm requires a combination of kernels more complex than a linear one in order to boost its performance.
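
To make the distinction between a linear and a complex multiple kernel concrete, the sketch below evaluates both on a pair of vectors. The tuple-based tree encoding, the operator set (`+`, `*`), the leaf names and the kernel hyperparameters are all assumptions made for illustration; the abstract does not describe the authors' actual chromosome encoding or function set, and no evolutionary search is performed here.

```python
import math
from typing import Callable, Dict, Sequence, Tuple, Union

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]
# A tree is either a leaf name or a tuple (operator, left subtree, right subtree).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x: Vector, y: Vector) -> float:
    return dot(x, y)


def poly_kernel(x: Vector, y: Vector) -> float:
    # Degree-2 polynomial kernel with offset 1 (hypothetical settings).
    return (dot(x, y) + 1.0) ** 2


def rbf_kernel(x: Vector, y: Vector) -> float:
    # Gaussian kernel with gamma = 0.5 (hypothetical setting).
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


LEAVES: Dict[str, Kernel] = {"linear": linear_kernel, "poly": poly_kernel, "rbf": rbf_kernel}


def convex_linear_mk(names: Sequence[str], weights: Sequence[float]) -> Kernel:
    """Linear multiple kernel: sum_i w_i * K_i with w_i >= 0 and sum_i w_i = 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return lambda x, y: sum(w * LEAVES[n](x, y) for n, w in zip(names, weights))


def evaluate_tree(tree: Tree, x: Vector, y: Vector) -> float:
    """Evaluate a kernel expression tree; sums and products of kernels are kernels."""
    if isinstance(tree, str):
        return LEAVES[tree](x, y)
    op, left, right = tree
    a, b = evaluate_tree(left, x, y), evaluate_tree(right, x, y)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")


x, y = (1.0, 0.0), (0.0, 1.0)
linear_mk_value = convex_linear_mk(["linear", "rbf"], [0.5, 0.5])(x, y)
# Encodes K_rbf * K_poly + K_linear, which no weighted sum of the leaves can express.
complex_mk_value = evaluate_tree(("+", ("*", "rbf", "poly"), "linear"), x, y)
```

Restricting the operators to sums and products keeps every tree a valid positive semidefinite kernel, which is why those two operators were chosen for this sketch.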


2014 ◽  
Vol 609-610 ◽  
pp. 1448-1452
Author(s):  
Kun Zhang ◽  
Min Rui Fei

Feature analysis is an important task that can significantly affect the performance of automatic bacterial colony picking. This paper presents a novel approach for adaptive colony segmentation by classifying the detected peaks of the intensity histograms of images. The relevance and importance of these features are determined by an improved support vector machine classifier using unascertained least square estimation. Experimental results show that the proposed unascertained support vector machine (USVM) has better recognition accuracy than other state-of-the-art techniques, and that its training takes less time than most of the traditional approaches presented in this paper.


2020 ◽  
Vol 8 (5) ◽  
pp. 1557-1560

The support vector machine (SVM) is a well-known and efficient supervised learning algorithm for classification problems. However, the classification accuracy of an SVM classifier depends on both its training parameters and the training data set. The main objective of this paper is to optimize the SVM's parameters and feature weighting simultaneously in order to improve its performance. An Imperialist Competitive Algorithm based Support Vector Machine (ICA-SVM) classifier is proposed for efficient weed detection. The enhanced ICA-SVM classifier selects appropriate input features and optimizes the SVM parameters, thereby improving classification accuracy. Experimental results show that the ICA-SVM classification algorithm greatly reduces computational complexity and improves classification accuracy.


2016 ◽  
Vol 2 (1) ◽  
pp. 1-22
Author(s):  
P. Then ◽  
Y.C. Wang

Digital watermark detection is treated as a classification problem in image processing. In an image classification task that searches for a butterfly, an image is assigned to the positive class if it shows a butterfly and to the negative class if it does not. Similarly, watermarked and unwatermarked images are treated as the positive and negative classes, respectively. Hence, a Support Vector Machine (SVM) is used as the classifier of watermarked and unwatermarked digital images because of its ability to separate both linearly and non-linearly separable data. The hyperplanes of various detectors are briefly described to show why the SVM's hyperplane is suitable for Stirmark-attacked watermarked images. Cox’s spread spectrum watermarking scheme is used to embed the watermark into digital images. The SVM is then trained with both watermarked and unwatermarked images. Training the SVM eliminates the need for the watermark during the detection process. Receiver Operating Characteristic (ROC) graphs are plotted to assess the false positive and false negative probabilities of both the correlation detector of the watermarking scheme and the SVM classifier. Both watermarked and unwatermarked images are later attacked with Stirmark and then tested on the correlation detector and the SVM classifier. Remedies are suggested for preprocessing the training data. In addition to preprocessing, the optimal setting of the SVM parameters is investigated and determined. The preprocessing and optimal parameter settings enable the trained SVM to achieve substantially better results than the correlation detector.
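
The ROC assessment mentioned above can be sketched as follows: sweep a decision threshold over the detector's output scores and record the false positive rate and true positive rate at each step (the false negative probability is one minus the true positive rate). The scores and labels below are made-up values for illustration, not data from the paper, and `roc_points` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for watermarked (positive), 0 for unwatermarked (negative).
    A sample is predicted positive when its score is >= the threshold.
    """
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Start at the threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical detector scores for two watermarked and two unwatermarked images.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
points = roc_points(scores, labels)
# False negative rate at each threshold is 1 - TPR.
false_negative_rates = [1.0 - tpr for _, tpr in points]
```

The same function can be applied to the scores of a correlation detector and of an SVM (for example, its signed distance to the hyperplane) to compare the two curves on equal terms.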

