DPWSS: Differentially Private Working Set Selection for Training Support Vector Machines

2021 · Vol. 7 · pp. e799
Author(s): Zhenlong Sun, Jing Yang, Xiaoye Li, Jianpei Zhang

Support vector machine (SVM) is a robust machine learning method widely used in classification. However, traditional SVM training methods may reveal personal privacy when the training data contain sensitive information. In the training of SVMs, working set selection is a vital step for sequential minimal optimization (SMO)-type decomposition methods. To avoid complex sensitivity analysis and the influence of high-dimensional data on the noise of existing privacy-preserving SVM classifiers, we propose a new differentially private working set selection algorithm (DPWSS), which uses the exponential mechanism to select working sets privately. We theoretically prove that the proposed algorithm satisfies differential privacy. Extensive experiments show that DPWSS achieves classification capability almost identical to that of the original non-private SVM under different parameters. The difference in optimized objective value between the two algorithms is generally less than two; meanwhile, DPWSS achieves higher execution efficiency than the original non-private SVM, as measured by comparing iteration counts on different datasets. To the best of our knowledge, DPWSS is the first working set selection algorithm based on differential privacy.
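
The exponential mechanism at the core of DPWSS can be sketched in a few lines. The snippet below is a generic illustration, not the authors' algorithm: the candidate utilities (here imagined as KKT-violation scores of candidate working-set variables) and the budget `epsilon` are placeholder assumptions.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity=1.0, rng=None):
    """Privately select an index with probability proportional to
    exp(epsilon * u / (2 * sensitivity)): the standard exponential mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    utilities = np.asarray(utilities, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    scores = epsilon * (utilities - utilities.max()) / (2.0 * sensitivity)
    probs = np.exp(scores)
    probs /= probs.sum()
    return rng.choice(len(utilities), p=probs)

# Hypothetical example: utilities might be KKT-violation magnitudes of
# candidate variables in one SMO-type decomposition step.
violations = [0.9, 0.1, 0.5, 0.7]
chosen = exponential_mechanism(violations, epsilon=0.5)
print("privately selected working-set index:", chosen)
```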

Author(s): Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, Nina Taft

The ubiquitous need for analyzing privacy-sensitive information (including health records, personal communications, product ratings, and social network data) is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner product uniformly approximates the desired feature-space inner product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility, meaning that the private classifier is pointwise close to the non-private classifier with high probability, is proven using the smoothness of regularized empirical risk minimization with respect to small perturbations of the feature mapping. Finally, we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
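
The finite-dimensional approximation referenced here is in the spirit of random Fourier features for translation-invariant kernels. Below is a minimal sketch for the RBF kernel, with the bandwidth `gamma` and feature count chosen arbitrarily for illustration:

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map X into a finite-dimensional space whose inner product
    approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform (a Gaussian).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5, rng=rng)

# The feature-space inner product should be close to the exact kernel value.
approx = Z[0] @ Z[1]
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))
print(f"approx={approx:.3f} exact={exact:.3f}")
```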


2008 · Vol. 20 (2) · pp. 374-382
Author(s): Tobias Glasmachers, Christian Igel

Iterative learning algorithms that approximate the solution of support vector machines (SVMs) have two potential advantages. First, they allow online and active learning. Second, for large data sets, computing the exact SVM solution may be too time-consuming, and an efficient approximation can be preferable. The powerful LASVM iteratively approaches the exact SVM solution using sequential minimal optimization (SMO). It allows efficient online and active learning. Here, this algorithm is considerably improved in speed and accuracy by replacing the working set selection in the SMO steps. A second-order working set selection strategy, which greedily aims at maximizing the progress in each single step, is incorporated.
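
As a rough illustration of second-order working set selection (a simplified sketch, not LASVM itself, and ignoring the box constraints a real solver must respect): the first index maximizes the gradient criterion, and the second maximizes an estimated gain of the form b²/a.

```python
import numpy as np

def second_order_working_set(grad, K):
    """Pick an SMO working pair (i, j): i maximizes the gradient criterion,
    j maximizes the second-order gain estimate b**2 / a, where
    b = grad[i] - grad[j] and a = K[i,i] + K[j,j] - 2*K[i,j].
    Box constraints on the dual variables are omitted for brevity."""
    i = int(np.argmax(grad))
    a = K[i, i] + np.diag(K) - 2.0 * K[i]   # a_ij for every candidate j
    b = grad[i] - grad                      # b_ij for every candidate j
    gain = np.where((b > 0) & (a > 1e-12),
                    b ** 2 / np.maximum(a, 1e-12), -np.inf)
    j = int(np.argmax(gain))
    return i, j

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K = X @ X.T                                 # linear kernel for the demo
grad = rng.normal(size=6)
print(second_order_working_set(grad, K))
```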


2020 · Vol. 30 (03) · pp. 2050009
Author(s): Mengfan Li, Fang Lin, Guizhi Xu

Traditional training methods need to collect a large amount of data from every subject to train a subject-specific classifier, which causes subject fatigue and a heavy training burden. This study proposes a novel training method, TrAdaBoost based on cross-validation and an adaptive threshold (CV-T-TAB), to reduce the amount of data required for training by selecting and combining multiple subjects’ classifiers that perform well on a new subject. This method adopts cross-validation to extend the amount of the new subject’s training data and sets an adaptive threshold to select the optimal combination of classifiers. Twenty-five subjects participated in N200- and P300-based brain–computer interface experiments. The study compares CV-T-TAB to five traditional training methods by testing them on the training of a support vector machine. Accuracy, information transfer rate, area under the curve, recall, and precision are used to evaluate performance under nine conditions with different amounts of data. CV-T-TAB outperforms the other methods and retains high accuracy even when the amount of data is reduced to one-third of the original amount. The results imply that CV-T-TAB effectively improves the performance of a subject-specific classifier with a small amount of data by adopting multiple subjects’ classifiers, which reduces the training cost.
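
The selection step can be sketched as follows. This is illustrative only: the mean-based threshold stands in for the paper's adaptive threshold, and the cross-validation extension of the new subject's data is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def select_source_classifiers(clfs, X_new, y_new):
    """Score each pre-trained source-subject classifier on the new
    subject's small data set, then keep those above a threshold.
    The mean score is a placeholder for the paper's adaptive threshold."""
    scores = np.array([clf.score(X_new, y_new) for clf in clfs])
    threshold = scores.mean()
    return [clf for clf, s in zip(clfs, scores) if s >= threshold]

rng = np.random.default_rng(0)
# Hypothetical source subjects' classifiers, each trained on its own data.
clfs = []
for _ in range(4):
    Xs = rng.normal(size=(40, 8))
    ys = (Xs[:, 0] > 0).astype(int)
    clfs.append(SVC().fit(Xs, ys))

X_new = rng.normal(size=(12, 8))
y_new = (X_new[:, 0] > 0).astype(int)
print(len(select_source_classifiers(clfs, X_new, y_new)), "classifiers kept")
```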


2021 · Vol. 2021 · pp. 1-13
Author(s): Yang Bai, Yu Li, Mingchuang Xie, Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the server's computation capability to train models, or users directly query models trained by MLaaS. Unfortunately, recent works reveal that both the curious server (which trains the model with users’ sensitive local data and is curious to learn information about individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) pose privacy risks. Adversarial methods, as one typical mitigation, have been studied by several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy can heavily decrease the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on adversarial methods to defend against both the curious server and the malicious MLaaS user. The framework can work with several adversarial algorithms to generate adversarial examples directly from data owners’ original data. By doing so, sensitive information about the original data is hidden. We then explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. Experimental results show that our defense framework with the AdvGAN method is effective against membership inference attacks (MIA), and our defense framework with the FGSM method can protect sensitive data from direct content-exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
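
As a sketch of how an FGSM-style perturbation can hide original inputs, the snippet below takes one signed-gradient step for a simple logistic-regression model; the model, loss, and `epsilon` are placeholder assumptions rather than the paper's full framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """One FGSM step for a logistic-regression model: move x by epsilon in
    the sign of the cross-entropy loss gradient with respect to the input."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w            # d(loss)/dx for the cross-entropy loss
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1      # hypothetical trained model parameters
x, y = rng.normal(size=5), 1.0      # one sensitive example and its label
x_adv = fgsm_perturb(x, y, w, b, epsilon=0.1)
print("original: ", np.round(x, 3))
print("perturbed:", np.round(x_adv, 3))
```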


2010 · Vol. 29-32 · pp. 947-951
Author(s): Li Yan Tian, Xiao Guang Hu

A fast support vector machine training method using parallel sequential minimal optimization is presented in this paper. To date, sequential minimal optimization (SMO) has been one of the major algorithms for training SVMs, but it still requires a large amount of computation time for large-sample problems. Unlike traditional SMO, parallel SMO first partitions the entire training data set into small subsets and then runs multiple CPU processors to deal with each partitioned data set. Experiments show that the new algorithm has a great advantage in terms of speed when applied to problems with large training sets and high-dimensional spaces, without reducing the generalization performance of the SVM.
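
The partition-and-parallel-train idea can be sketched with scikit-learn's SVC (whose solver is SMO-based) standing in for a dedicated SMO implementation; the partition count and single-pass structure are illustrative simplifications.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.svm import SVC

def train_partition(args):
    X_part, y_part = args
    # Each worker solves its sub-problem with libsvm's SMO solver.
    return SVC(kernel="rbf").fit(X_part, y_part)

def parallel_svm(X, y, n_parts=4):
    """Split the training set into n_parts subsets and train one SVM per
    subset in parallel. A real parallel SMO would iterate, merging the
    sub-problems' support vectors; this sketch stops after one pass."""
    rng = np.random.default_rng(0)
    idx_parts = np.array_split(rng.permutation(len(X)), n_parts)
    jobs = [(X[i], y[i]) for i in idx_parts]
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        return list(pool.map(train_partition, jobs))

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(400, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    models = parallel_svm(X, y)
    print(len(models), "sub-models trained")
```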


2021 · Vol. 2021 · pp. 1-14
Author(s): Lina Ni, Peng Huang, Yongshan Wei, Minglei Shu, Jinquan Zhang

With the proliferation of intelligent services and applications powered by artificial intelligence, the Internet of Things has penetrated many aspects of our daily lives, and the medical field is no exception. The medical Internet of Things (MIoT) can be applied to wearable devices, remote diagnosis, mobile medical treatment, and remote monitoring. There is a large amount of medical information in the databases of various medical institutions. Nevertheless, because medical data are closely tied to personal privacy, the data cannot be shared, resulting in data islands. Federated learning (FL), as a distributed collaborative artificial intelligence method, provides a solution. However, FL also involves multiple security and privacy issues. This paper proposes an adaptive Differential Privacy Federated Learning Medical IoT (DPFL-MIoT) model. Specifically, when the user updates the model locally, we propose a differentially private federated learning deep neural network with adaptive gradient descent (DPFLAGD-DNN) algorithm, which can adaptively add noise to the model parameters according to the characteristics and gradients of the training data. Since privacy leaks often occur in the downlink, we also present a differentially private federated learning (DP-FL) algorithm in which adaptive noise is added to the parameters when the server distributes them. Our method effectively reduces the addition of unnecessary noise while maintaining good model performance. Experimental results on real-world data show that our proposed algorithm can effectively protect data privacy.
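
The adaptive-noise idea behind DPFLAGD-DNN, as described, can be sketched as a single local update. Everything concrete here (the clipping bound and the rule tying the noise scale to the gradient norm) is an illustrative assumption, not the authors' exact mechanism.

```python
import numpy as np

def dp_local_update(params, grad, lr, clip_norm, base_sigma, rng):
    """One differentially private local step: clip the gradient to
    clip_norm, then add Gaussian noise whose scale adapts to the
    gradient magnitude before applying the update."""
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / (norm + 1e-12))   # gradient clipping
    sigma = base_sigma * min(norm, clip_norm)            # adaptive noise scale (illustrative)
    noisy_grad = grad + rng.normal(scale=sigma, size=grad.shape)
    return params - lr * noisy_grad

rng = np.random.default_rng(0)
params = np.zeros(4)
grad = rng.normal(size=4)
params = dp_local_update(params, grad, lr=0.1, clip_norm=1.0,
                         base_sigma=0.5, rng=rng)
print(params)
```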

