Safe Sample Screening for Robust Support Vector Machine

2020 · Vol 34 (04) · pp. 6981-6988
Author(s): Zhou Zhai, Bin Gu, Xiang Li, Heng Huang

Robust support vector machine (RSVM) has been shown to perform remarkably well in improving the generalization performance of the support vector machine in noisy environments. Unfortunately, to handle the non-convexity induced by the ramp loss in RSVM, existing RSVM solvers often adopt the DC programming framework, which is computationally inefficient because it must run multiple outer loops. This hinders the application of RSVM to large-scale problems. Safe sample screening, which allows training samples to be excluded before or early in the training process, is an effective way to greatly reduce computational time. However, existing safe sample screening algorithms are limited to convex optimization problems, while RSVM is a non-convex problem. To address this challenge, in this paper we propose two safe sample screening rules for RSVM based on the framework of the concave-convex procedure (CCCP). Specifically, we provide one screening rule for the inner solver of CCCP and another for propagating screened samples between two successive inner solvers of CCCP. To the best of our knowledge, this is the first work to apply safe sample screening to a non-convex optimization problem. More importantly, we provide security guarantees for our sample screening rules for RSVM. Experimental results on a variety of benchmark datasets verify that our safe sample screening rules can significantly reduce computational time.
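As background (notation ours, not taken from the paper): the ramp loss admits a standard difference-of-convex decomposition into two hinge losses, which is exactly what makes CCCP applicable. A sketch in LaTeX:

```latex
% Ramp loss with cutoff s < 1, as a difference of two convex hinge losses
R_s(z) \;=\; \min\{\,1-s,\ \max(0,\,1-z)\,\}
       \;=\; \underbrace{\max(0,\,1-z)}_{H_1(z)\ (\text{convex})}
        \;-\; \underbrace{\max(0,\,s-z)}_{H_s(z)\ (\text{convex})}.

% The RSVM objective then splits into convex minus convex (a DC program),
% with f(x) = w^\top \phi(x) + b:
\min_{w,b}\ \underbrace{\tfrac{1}{2}\|w\|^2 + C\sum_i H_1\big(y_i f(x_i)\big)}_{g\ (\text{convex})}
 \;-\; \underbrace{C\sum_i H_s\big(y_i f(x_i)\big)}_{h\ (\text{convex})}.

% CCCP: at each outer iteration t, linearize h at the current iterate and
% solve the resulting convex (standard-SVM-like) inner problem:
(w^{(t+1)}, b^{(t+1)}) \in \arg\min_{w,b}\
  g(w,b) - \big\langle \nabla h\big(w^{(t)}, b^{(t)}\big),\,(w,b)\big\rangle .
```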

Author(s): Maryam Yalsavar, Paknoosh Karimaghaei, Akbar Sheikh-Akbari, Pancham Shukla, Peyman Setoodeh

The application of the support vector machine (SVM) classification algorithm to large-scale datasets is limited by its use of a large number of support vectors and by the dependence of its performance on its kernel parameter. In this paper, the SVM is recast as a control system, and the iterative learning control (ILC) method is used to optimize the SVM's kernel parameter. The ILC technique first defines an error equation and then iteratively updates the kernel function and its regularization parameter using the training error and the previous state of the system. The closed-loop structure of the proposed algorithm increases its robustness to uncertainty and improves its convergence speed. Experimental results were generated using nine standard benchmark datasets covering a wide range of applications. The results show that the proposed method achieves superior or highly competitive accuracy compared with classical and state-of-the-art SVM-based techniques while using a significantly smaller number of support vectors.
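The abstract does not state the paper's exact update law, so the following is only an illustrative sketch of a P-type ILC loop wrapped around an assumed RBF-kernel SVM; the gain L, the iteration count, and the choice of log(gamma) as the control input are placeholders, not the paper's design:

```python
# Illustrative sketch only: a generic P-type ILC rule u_{k+1} = u_k + L * e_k,
# where the "control input" u is log(gamma) of an assumed RBF kernel and the
# tracking error e is the training error rate. L and n_iters are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

log_gamma, L, n_iters = 0.0, 2.0, 20   # initial input, learning gain, iterations
prev_err = None
for k in range(n_iters):
    clf = SVC(kernel="rbf", gamma=np.exp(log_gamma)).fit(X, y)
    err = 1.0 - clf.score(X, y)        # tracking error: training error rate
    log_gamma += L * err               # P-type ILC update of the control input
    if prev_err is not None and abs(prev_err - err) < 1e-4:
        break                          # error has settled; stop iterating
    prev_err = err
print(f"final gamma = {np.exp(log_gamma):.4f}, training error = {err:.4f}")
```

A practical variant would track validation error rather than training error, since driving the training error alone toward zero pushes gamma toward overfitting.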


2011 · Vol 204-210 · pp. 879-882
Author(s): Kai Li, Xiao Xia Lu

By combining the fuzzy support vector machine (FSVM) with rough set theory, we propose a rough-margin-based fuzzy support vector machine (RFSVM). It inherits the characteristics of the FSVM method and takes into account the positions of training samples within the rough margin in order to reduce overfitting caused by noise or outliers. The proposed algorithm finds the optimal separating hyperplane that maximizes the rough margin, which consists of a lower margin and an upper margin. Meanwhile, points lying within the lower margin incur a larger penalty than those in the boundary region of the rough margin. Experiments on several benchmark datasets show that the RFSVM algorithm is effective and feasible compared with existing support vector machines.
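One way to make the penalty structure concrete is the schematic program below; this is an illustrative reconstruction, not the paper's exact formulation (tau > 1 is the heavier lower-margin penalty, s_i are the fuzzy memberships, and delta is an assumed width of the rough-margin band):

```latex
% Schematic rough-margin FSVM: xi_i is slack within the rough-margin band
% (between lower and upper margins), eta_i is additional slack for points
% falling inside the lower margin, which is penalized more heavily (tau > 1);
% s_i are fuzzy membership weights.
\min_{w,b,\xi,\eta}\ \tfrac{1}{2}\|w\|^2
    + C\sum_i s_i\,\xi_i + \tau\,C\sum_i s_i\,\eta_i
\quad \text{s.t.}\quad
 y_i\big(w^\top x_i + b\big) \ \ge\ 1 - \xi_i - \eta_i,\qquad
 0 \le \xi_i \le \delta,\ \ \eta_i \ge 0 .
```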


2012 · Vol 2012 · pp. 1-16
Author(s): Meihua Wang, Fengmin Xu, Chengxian Xu

The special importance of Difference of Convex (DC) functions programming has been recognized in recent studies of nonconvex optimization problems. In this work, a class of DC programs derived from portfolio selection problems is studied. The most popular method for solving such problems is the Branch-and-Bound (B&B) algorithm; however, the curse of dimensionality degrades its performance. The DC Algorithm (DCA) is an efficient method for obtaining a local optimal solution and has been applied to many practical problems, especially large-scale ones. A B&B-DCA algorithm is proposed by embedding DCA into the B&B framework; the new algorithm improves computational performance while still obtaining a global optimal solution. Computational results show that the proposed B&B-DCA algorithm outperforms the general B&B algorithm in terms of the number of branches and computational time. The attractive features of DCA (inexpensiveness, reliability, robustness, globality of computed solutions, etc.) provide crucial support to the combined B&B-DCA in accelerating the convergence of B&B.
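For reference, the standard DCA iteration for a DC objective f = g - h (g, h convex) linearizes the concave part and solves a convex subproblem at each step; within B&B, the resulting local solution can serve as an incumbent upper bound while convex relaxations supply lower bounds:

```latex
% Standard DCA for f = g - h with g, h convex: take a subgradient of h at the
% current point, then solve the convexified subproblem.
y^{k} \in \partial h\big(x^{k}\big), \qquad
x^{k+1} \in \arg\min_{x}\ \big\{\, g(x) - \langle y^{k},\, x \rangle \,\big\}.
```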


2019 · Vol 28 (07) · pp. 1950020
Author(s): Amine Besrour, Riadh Ksantini

The Support Vector Machine (SVM) is a highly competitive linear classifier based on a convex optimization problem, in which the support vectors fully describe the decision boundary. Hence, the SVM is sensitive to data spread: it does not take into account the existence of subclasses within classes, nor does it minimize data dispersion to improve classification performance. The Kernel subclass SVM (KSSVM) was therefore proposed to handle multimodal data and to minimize data dispersion. Nevertheless, KSSVM has difficulty classifying sequentially obtained data and handling large-scale datasets, since it is based on batch learning. For this reason, we propose a novel incremental KSSVM (iKSSVM) which handles dynamic and large data properly. The iKSSVM is still based on a convex optimization problem and incrementally minimizes data dispersion within and between data subclasses in order to improve discriminative power and classification performance. An extensive comparative evaluation of the iKSSVM against the batch KSSVM, as well as other contemporary incremental classifiers, on real-world datasets clearly shows its superiority in terms of classification accuracy.
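The iKSSVM itself is not sketched in the abstract; as a generic illustration of the incremental-learning pattern it adopts (in contrast to batch training), here is a minimal example using scikit-learn's SGDClassifier, a linear SVM trained by stochastic gradient descent, as a stand-in:

```python
# Stand-in sketch of incremental learning: data arrive in chunks and the model
# is updated via partial_fit without revisiting earlier data, unlike a batch
# QP solver that needs the whole dataset in memory at once.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
classes = np.unique(y)                  # all labels must be declared up front

clf = SGDClassifier(loss="hinge")       # hinge loss ~ linear SVM objective
for X_chunk, y_chunk in zip(np.array_split(X, 100), np.array_split(y, 100)):
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

print("accuracy on all seen data:", clf.score(X, y))
```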


Author(s): Jia-Bin Zhou, Yan-Qin Bai, Yan-Ru Guo, Hai-Xiang Lin

Abstract: In general, data contain noise originating from faulty instruments, flawed measurements, or faulty communication. Learning from data for classification or regression is inevitably affected by such noise. In order to remove or greatly reduce its impact, we introduce the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented. Moreover, we extend the linear IFLap-TSVM to the nonlinear case via kernel functions. The proposed IFLap-TSVM mitigates the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate classifier because it exploits the geometric distribution information of both labeled and unlabeled data through manifold regularization. Experiments on constructed artificial datasets, several UCI benchmark datasets, and the MNIST dataset show that the IFLap-TSVM achieves better classification accuracy than the state-of-the-art twin support vector machine (TSVM), the intuitionistic fuzzy twin support vector machine (IFTSVM), and the Lap-TSVM.
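For readers unfamiliar with the baseline being extended, the standard linear TSVM (Jayadeva et al.'s formulation) solves two smaller QPs, one per nonparallel hyperplane; per the abstract, IFLap-TSVM additionally weights the slacks by intuitionistic fuzzy memberships and adds a Laplacian manifold-regularization term:

```latex
% Standard linear TSVM: two nonparallel hyperplanes w_1^T x + b_1 = 0 and
% w_2^T x + b_2 = 0, each close to its own class and at least unit distance
% from the other class. A, B stack the samples of the two classes; e_1, e_2
% are all-ones vectors.
\min_{w_1, b_1, \xi}\ \tfrac{1}{2}\|A w_1 + e_1 b_1\|^2 + c_1\, e_2^\top \xi
\quad \text{s.t.}\quad -(B w_1 + e_2 b_1) + \xi \ \ge\ e_2,\ \ \xi \ge 0,

\min_{w_2, b_2, \eta}\ \tfrac{1}{2}\|B w_2 + e_2 b_2\|^2 + c_2\, e_1^\top \eta
\quad \text{s.t.}\quad \ \ (A w_2 + e_1 b_2) + \eta \ \ge\ e_1,\ \ \eta \ge 0 .
```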


2013 · Vol 475-476 · pp. 312-317
Author(s): Ping Zhou, Jin Lei Wang, Xian Kai Chen, Guan Jun Zhang

Since datasets usually contain noise, it is very helpful to detect and remove noisy samples in a preprocessing step. Fuzzy membership can measure a sample's weight: the weight should be smaller for noisy samples and larger for important ones, so assigning appropriate sample memberships is vital. This article proposes a novel approach, Membership Calculation based on Hierarchical Division (MCHD), to calculate the memberships of training samples. MCHD uses the concept of dimension similarity and develops a bottom-up clustering technique to calculate sample memberships iteratively. Experiments indicate that MCHD effectively detects noisy samples and removes them from the dataset. A fuzzy support vector machine based on MCHD outperforms most recently published approaches and retains better generalization ability in handling noise.
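MCHD's hierarchical-division procedure is not detailed in the abstract; for orientation, the classic distance-to-class-center membership that such schemes refine can be sketched as follows (a common baseline, not MCHD itself):

```python
# Classic distance-to-class-center fuzzy membership (Lin & Wang style):
# samples far from their class center (likely noise) receive small weights.
# 'delta' keeps memberships strictly positive.
import numpy as np

def class_center_membership(X: np.ndarray, y: np.ndarray, delta: float = 1e-6):
    s = np.empty(len(y))
    for c in np.unique(y):
        mask = (y == c)
        center = X[mask].mean(axis=0)                 # class centroid
        d = np.linalg.norm(X[mask] - center, axis=1)  # distances to centroid
        s[mask] = 1.0 - d / (d.max() + delta)         # 0 < s <= 1, small for outliers
    return s

# The memberships can then be passed to an SVM as per-sample weights, e.g.
# sklearn's SVC().fit(X, y, sample_weight=s).
```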


2021
Author(s): Mohammad Hassan Almaspoor, Ali Safaei, Afshin Salajegheh, Behrouz Minaei-Bidgoli

Abstract: Classification is one of the most important and widely used tasks in machine learning; its purpose is to create a rule, based on a set of training samples, for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has a high time complexity: increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to the use of large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data, and the paper proposes research areas for future studies. Considering its advantages, the SVM can be among the first options for adaptation to big data classification. For this purpose, appropriate techniques should be developed for data preprocessing in order to convert data into a form suitable for learning. Existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, enabling them to handle big data.


2020 · Vol 16 (10) · pp. 155014772096383
Author(s): Yan Qiao, Xinhong Cui, Peng Jin, Wu Zhang

This article addresses the problem of outlier detection for wireless sensor networks. As observational data become increasingly high-dimensional and large-scale, it is increasingly difficult for existing techniques to perform outlier detection accurately and efficiently. Although dimensionality-reduction tools (such as the deep belief network) have been used to compress high-dimensional data to support outlier detection, these methods may not achieve the desired performance due to the special distribution of the compressed data. Furthermore, because most existing classification methods must solve a quadratic optimization problem in their training stage, they do not perform well on large-scale datasets. In this article, we develop a new classification model called the "deep belief network online quarter-sphere support vector machine," which combines a deep belief network with an online quarter-sphere one-class support vector machine. Based on this model, we first propose a training method that learns the radius of the quarter-sphere by a sorting method. Then, an online testing method is proposed to perform online outlier detection without supervision. Finally, we compare the proposed method with the state of the art through extensive experiments. The experimental results show that our method not only reduces the computational cost by three orders of magnitude but also improves detection accuracy by 3%–5%.
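The "radius by sorting" idea can be sketched as follows; this is a minimal illustration assuming nonnegative features (as the quarter-sphere model requires), with random data standing in for the DBN-encoded features:

```python
# Sketch of radius-by-sorting for a quarter-sphere one-class SVM: the sphere
# centered at the origin encloses the (1 - nu) fraction of training points
# with the smallest norms, so the radius can be read off a sorted list of
# norms instead of solving a QP. Z stands in for DBN-encoded features.
import numpy as np

def fit_radius(Z: np.ndarray, nu: float = 0.05) -> float:
    norms = np.sort(np.linalg.norm(Z, axis=1))      # ascending sample norms
    k = int(np.ceil((1.0 - nu) * len(norms))) - 1   # index of the boundary point
    return norms[k]                                 # radius R of the quarter-sphere

def is_outlier(z_new: np.ndarray, R: float) -> bool:
    return np.linalg.norm(z_new) > R                # outside the quarter-sphere

Z = np.abs(np.random.default_rng(0).normal(size=(1000, 8)))  # nonnegative stand-in
R = fit_radius(Z, nu=0.05)
print(sum(is_outlier(z, R) for z in Z))             # ~5% of points flagged
```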

