DPWSS: Differentially Private Working Set Selection for Training Support Vector Machines

2021 · Vol. 7 · pp. e799
Author(s): Zhenlong Sun, Jing Yang, Xiaoye Li, Jianpei Zhang

Support vector machine (SVM) is a robust machine learning method widely used in classification. However, traditional SVM training methods may reveal personal privacy when the training data contain sensitive information. In the training of SVMs, working set selection is a vital step for sequential minimal optimization (SMO)-type decomposition methods. To avoid complex sensitivity analysis and the influence of high-dimensional data on the noise of existing privacy-preserving SVM classifiers, we propose a new differentially private working set selection algorithm (DPWSS), which uses the exponential mechanism to select working sets privately. We theoretically prove that the proposed algorithm satisfies differential privacy. Extensive experiments show that DPWSS achieves classification capability almost identical to that of the original non-private SVM under different parameters. The difference in optimized objective value between the two algorithms is generally less than two; meanwhile, DPWSS achieves higher execution efficiency than the original non-private SVM, as measured by comparing iteration counts on different datasets. To the best of our knowledge, DPWSS is the first working set selection algorithm based on differential privacy.
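
The exponential mechanism at the core of DPWSS can be sketched in a few lines. The snippet below is a generic illustration, not the authors' algorithm: the candidate utilities (here imagined as KKT-violation scores of candidate working-set variables) and the budget `epsilon` are placeholder assumptions.

```python
import numpy as np

def exponential_mechanism(utilities, epsilon, sensitivity=1.0, rng=None):
    """Privately select an index with probability proportional to
    exp(epsilon * u / (2 * sensitivity)): the standard exponential mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    utilities = np.asarray(utilities, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    scores = epsilon * (utilities - utilities.max()) / (2.0 * sensitivity)
    probs = np.exp(scores)
    probs /= probs.sum()
    return rng.choice(len(utilities), p=probs)

# Hypothetical example: utilities might be KKT-violation magnitudes of
# candidate variables in one SMO-type decomposition step.
violations = [0.9, 0.1, 0.5, 0.7]
chosen = exponential_mechanism(violations, epsilon=0.5)
print("privately selected working-set index:", chosen)
```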

Author(s): Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, Nina Taft

The ubiquitous need for analyzing privacy-sensitive information (including health records, personal communications, product ratings, and social network data) is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner product uniformly approximates the desired feature-space inner product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility, meaning that the private classifier is pointwise close to the non-private classifier with high probability, is proven using the smoothness of regularized empirical risk minimization with respect to small perturbations of the feature mapping. Finally, we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
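
The finite-dimensional approximation referenced here is in the spirit of random Fourier features for translation-invariant kernels. Below is a minimal sketch for the RBF kernel, with the bandwidth `gamma` and feature count chosen arbitrarily for illustration:

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Map X into a finite-dimensional space whose inner product
    approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform (a Gaussian).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=2000, gamma=0.5, rng=rng)

# The feature-space inner product should be close to the exact kernel value.
approx = Z[0] @ Z[1]
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))
print(f"approx={approx:.3f} exact={exact:.3f}")
```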


2008 · Vol. 20 (2) · pp. 374-382
Author(s): Tobias Glasmachers, Christian Igel

Iterative learning algorithms that approximate the solution of support vector machines (SVMs) have two potential advantages. First, they allow online and active learning. Second, for large data sets, computing the exact SVM solution may be too time-consuming, and an efficient approximation can be preferable. The powerful LASVM iteratively approaches the exact SVM solution using sequential minimal optimization (SMO). It allows efficient online and active learning. Here, this algorithm is considerably improved in speed and accuracy by replacing the working set selection in the SMO steps. A second-order working set selection strategy, which greedily aims at maximizing the progress in each single step, is incorporated.
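
As a rough illustration of second-order working set selection (a simplified sketch, not LASVM itself, and ignoring the box constraints a real solver must respect): the first index maximizes the gradient criterion, and the second maximizes an estimated gain of the form b²/a.

```python
import numpy as np

def second_order_working_set(grad, K):
    """Pick an SMO working pair (i, j): i maximizes the gradient criterion,
    j maximizes the second-order gain estimate b**2 / a, where
    b = grad[i] - grad[j] and a = K[i,i] + K[j,j] - 2*K[i,j].
    Box constraints on the dual variables are omitted for brevity."""
    i = int(np.argmax(grad))
    a = K[i, i] + np.diag(K) - 2.0 * K[i]   # a_ij for every candidate j
    b = grad[i] - grad                      # b_ij for every candidate j
    gain = np.where((b > 0) & (a > 1e-12),
                    b ** 2 / np.maximum(a, 1e-12), -np.inf)
    j = int(np.argmax(gain))
    return i, j

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K = X @ X.T                                 # linear kernel for the demo
grad = rng.normal(size=6)
print(second_order_working_set(grad, K))
```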


2020 · Vol. 30 (03) · pp. 2050009
Author(s): Mengfan Li, Fang Lin, Guizhi Xu

Traditional training methods need to collect a large amount of data from every subject to train a subject-specific classifier, which causes subject fatigue and a heavy training burden. This study proposes a novel training method, TrAdaBoost based on cross-validation and an adaptive threshold (CV-T-TAB), to reduce the amount of data required for training by selecting and combining multiple subjects’ classifiers that perform well on a new subject. This method adopts cross-validation to extend the amount of the new subject’s training data and sets an adaptive threshold to select the optimal combination of classifiers. Twenty-five subjects participated in N200- and P300-based brain–computer interface experiments. The study compares CV-T-TAB to five traditional training methods by testing them on the training of a support vector machine. Accuracy, information transfer rate, area under the curve, recall, and precision are used to evaluate performance under nine conditions with different amounts of data. CV-T-TAB outperforms the other methods and retains high accuracy even when the amount of data is reduced to one-third of the original amount. The results imply that CV-T-TAB effectively improves the performance of a subject-specific classifier with a small amount of data by adopting multiple subjects’ classifiers, which reduces the training cost.
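
The selection step can be sketched as follows. This is illustrative only: the mean-based threshold stands in for the paper's adaptive threshold, and the cross-validation extension of the new subject's data is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def select_source_classifiers(clfs, X_new, y_new):
    """Score each pre-trained source-subject classifier on the new
    subject's small data set, then keep those above a threshold.
    The mean score is a placeholder for the paper's adaptive threshold."""
    scores = np.array([clf.score(X_new, y_new) for clf in clfs])
    threshold = scores.mean()
    return [clf for clf, s in zip(clfs, scores) if s >= threshold]

rng = np.random.default_rng(0)
# Hypothetical source subjects' classifiers, each trained on its own data.
clfs = []
for _ in range(4):
    Xs = rng.normal(size=(40, 8))
    ys = (Xs[:, 0] > 0).astype(int)
    clfs.append(SVC().fit(Xs, ys))

X_new = rng.normal(size=(12, 8))
y_new = (X_new[:, 0] > 0).astype(int)
print(len(select_source_classifiers(clfs, X_new, y_new)), "classifiers kept")
```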


2021 · Vol. 2021 · pp. 1-13
Author(s): Yang Bai, Yu Li, Mingchuang Xie, Mingyu Fan

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collected sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and utilize the server's computation capability to train models, or users directly query models trained by MLaaS. Unfortunately, recent works reveal that both the curious server (which trains the model with users’ sensitive local data and is curious to learn information about individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) pose privacy risks. Adversarial methods, as one typical mitigation, have been studied by several recent works. However, most of them focus on privacy preservation against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy can heavily decrease the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on adversarial methods to defend against both the curious server and the malicious MLaaS user. The framework can work with several adversarial algorithms to generate adversarial examples directly from data owners’ original data. By doing so, sensitive information about the original data is hidden. We then explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. Experimental results show that our defense framework with the AdvGAN method is effective against membership inference attacks (MIA), and our defense framework with the FGSM method can protect sensitive data from direct content-exposure attacks. In addition, our method achieves a better privacy-utility balance than the existing method.
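
As a sketch of how an FGSM-style perturbation can hide original inputs, the snippet below takes one signed-gradient step for a simple logistic-regression model; the model, loss, and `epsilon` are placeholder assumptions rather than the paper's full framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, epsilon):
    """One FGSM step for a logistic-regression model: move x by epsilon in
    the sign of the cross-entropy loss gradient with respect to the input."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w            # d(loss)/dx for the cross-entropy loss
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), 0.1      # hypothetical trained model parameters
x, y = rng.normal(size=5), 1.0      # one sensitive example and its label
x_adv = fgsm_perturb(x, y, w, b, epsilon=0.1)
print("original: ", np.round(x, 3))
print("perturbed:", np.round(x_adv, 3))
```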


2010 · Vol. 29-32 · pp. 947-951
Author(s): Li Yan Tian, Xiao Guang Hu

A fast support vector machine training method using parallel sequential minimal optimization is presented in this paper. To date, sequential minimal optimization (SMO) has been one of the major algorithms for training SVMs, but it still requires a large amount of computation time for large-sample problems. Unlike traditional SMO, parallel SMO first partitions the entire training data set into small subsets and then runs multiple CPU processors to deal with each partitioned data set. Experiments show that the new algorithm has a great advantage in terms of speed when applied to problems with large training sets and high-dimensional spaces, without reducing the generalization performance of the SVM.
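
The partition-and-parallel-train idea can be sketched with scikit-learn's SVC (whose solver is SMO-based) standing in for a dedicated SMO implementation; the partition count and single-pass structure are illustrative simplifications.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.svm import SVC

def train_partition(args):
    X_part, y_part = args
    # Each worker solves its sub-problem with libsvm's SMO solver.
    return SVC(kernel="rbf").fit(X_part, y_part)

def parallel_svm(X, y, n_parts=4):
    """Split the training set into n_parts subsets and train one SVM per
    subset in parallel. A real parallel SMO would iterate, merging the
    sub-problems' support vectors; this sketch stops after one pass."""
    rng = np.random.default_rng(0)
    idx_parts = np.array_split(rng.permutation(len(X)), n_parts)
    jobs = [(X[i], y[i]) for i in idx_parts]
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        return list(pool.map(train_partition, jobs))

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(400, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    models = parallel_svm(X, y)
    print(len(models), "sub-models trained")
```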


2021 · Vol. 2021 · pp. 1-14
Author(s): Lina Ni, Peng Huang, Yongshan Wei, Minglei Shu, Jinquan Zhang

With the proliferation of intelligent services and applications powered by artificial intelligence, the Internet of Things has penetrated many aspects of our daily lives, and the medical field is no exception. The medical Internet of Things (MIoT) can be applied to wearable devices, remote diagnosis, mobile medical treatment, and remote monitoring. There is a large amount of medical information in the databases of various medical institutions. Nevertheless, because medical data are closely tied to personal privacy, the data cannot be shared, resulting in data islands. Federated learning (FL), as a distributed collaborative artificial intelligence method, provides a solution. However, FL also involves multiple security and privacy issues. This paper proposes an adaptive Differential Privacy Federated Learning Medical IoT (DPFL-MIoT) model. Specifically, when the user updates the model locally, we propose a differentially private federated learning deep neural network with adaptive gradient descent (DPFLAGD-DNN) algorithm, which can adaptively add noise to the model parameters according to the characteristics and gradients of the training data. Since privacy leaks often occur in the downlink, we also present a differentially private federated learning (DP-FL) algorithm in which adaptive noise is added to the parameters when the server distributes them. Our method effectively reduces the addition of unnecessary noise while maintaining good model performance. Experimental results on real-world data show that our proposed algorithm can effectively protect data privacy.
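
The adaptive-noise idea behind DPFLAGD-DNN, as described, can be sketched as a single local update. Everything concrete here (the clipping bound and the rule tying the noise scale to the gradient norm) is an illustrative assumption, not the authors' exact mechanism.

```python
import numpy as np

def dp_local_update(params, grad, lr, clip_norm, base_sigma, rng):
    """One differentially private local step: clip the gradient to
    clip_norm, then add Gaussian noise whose scale adapts to the
    gradient magnitude before applying the update."""
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / (norm + 1e-12))   # gradient clipping
    sigma = base_sigma * min(norm, clip_norm)            # adaptive noise scale (illustrative)
    noisy_grad = grad + rng.normal(scale=sigma, size=grad.shape)
    return params - lr * noisy_grad

rng = np.random.default_rng(0)
params = np.zeros(4)
grad = rng.normal(size=4)
params = dp_local_update(params, grad, lr=0.1, clip_norm=1.0,
                         base_sigma=0.5, rng=rng)
print(params)
```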

