Differentially Private Kernel Support Vector Machines Based on the Exponential and Laplace Hybrid Mechanism

2021, Vol 2021, pp. 1-19
Author(s): Zhenlong Sun, Jing Yang, Xiaoye Li, Jianpei Zhang

Support vector machines (SVMs) are among the most robust and accurate of the well-known machine learning algorithms, especially for classification. SVMs train a classification model by solving an optimization problem that determines which instances in the training dataset are the support vectors (SVs). However, SVs are intact instances taken from the training dataset. Directly releasing the classification model of an SVM therefore carries a significant privacy risk for individuals when the training data contain sensitive information. In this paper, we study how to release the classification model of kernel SVMs while preventing privacy leakage of the SVs and satisfying the requirements of privacy protection. We propose a new differentially private algorithm for kernel SVMs, named DPKSVMEL, which is based on a hybrid of the exponential and Laplace mechanisms. Compared with existing private SVM algorithms, DPKSVMEL has two major advantages. First, it protects the privacy of the SVs by postprocessing, so the training process of the non-private kernel SVM is unchanged. Second, the scoring function values are derived directly from the symmetric kernel matrix generated during training, so they require neither additional storage space nor complex sensitivity analysis. In DPKSVMEL, we define a similarity parameter to denote the correlation or distance between each non-SV and every SV. Each non-SV is then assigned to the group of the SV with which its similarity is maximal. For a given similarity parameter value, if a group contains more than k non-SVs, we replace its SV with the mean of the top-k most similar non-SVs in the group, selected randomly by the exponential mechanism. Otherwise, we add random noise to the SV by the Laplace mechanism. We prove theoretically that DPKSVMEL satisfies differential privacy. Extensive experiments on real datasets demonstrate the effectiveness of DPKSVMEL for kernel SVMs. Moreover, it achieves higher classification accuracy than existing private SVM algorithms.
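The grouping and hybrid replacement steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- An RBF kernel stands in for the similarity, and it is recomputed here rather than read from the trained kernel matrix.
- The similarity-parameter threshold is omitted.
- The exponential mechanism draws k items without replacement, splitting epsilon evenly across the draws.
- The sensitivities are hypothetical placeholder values of 1.0.

The names `privatize_svs`, `exp_mech_top_k`, and the sensitivity parameters are illustrative only.

```python
import math
import random

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; values lie in (0, 1], used here as the similarity."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def exp_mech_top_k(candidates, scores, k, epsilon, sensitivity, rng):
    """Draw k distinct candidates, each draw using epsilon / k, with probability
    proportional to exp(eps_draw * score / (2 * sensitivity))."""
    pool = list(range(len(candidates)))
    eps_draw = epsilon / k
    chosen = []
    for _ in range(k):
        weights = [math.exp(eps_draw * scores[i] / (2 * sensitivity)) for i in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(candidates[pick])
        pool.remove(pick)
    return chosen

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize_svs(svs, non_svs, k, epsilon, gamma=1.0,
                  score_sensitivity=1.0, l1_sensitivity=1.0, seed=None):
    """Hypothetical sketch: return a privatized copy of the support vectors."""
    rng = random.Random(seed)
    # Assign each non-SV to the group of its most similar SV.
    groups = [[] for _ in svs]
    for x in non_svs:
        sims = [rbf_kernel(x, sv, gamma) for sv in svs]
        groups[max(range(len(svs)), key=sims.__getitem__)].append(x)
    released = []
    for sv, members in zip(svs, groups):
        if len(members) > k:
            # Exponential mechanism: favour the non-SVs most similar to this SV,
            # then release their coordinate-wise mean in place of the SV.
            scores = [rbf_kernel(x, sv, gamma) for x in members]
            picked = exp_mech_top_k(members, scores, k, epsilon, score_sensitivity, rng)
            released.append([sum(col) / k for col in zip(*picked)])
        else:
            # Too few non-SVs in the group: perturb the SV with Laplace noise
            # (assumed L1 sensitivity; the paper's analysis is not reproduced).
            scale = l1_sensitivity / epsilon
            released.append([v + laplace(scale, rng) for v in sv])
    return released
```

For example, `privatize_svs([[0.0, 0.0], [10.0, 10.0]], non_svs, k=2, epsilon=1.0, seed=7)` replaces the first SV with the mean of two of its nearby non-SVs whenever more than two non-SVs fall in its group. An SV whose group is smaller receives Laplace noise instead. The sketch also simplifies privacy-budget accounting: each SV is handled by exactly one of the two mechanisms with the full epsilon.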

2011, Vol 230-232, pp. 625-628
Author(s): Lei Shi, Xin Ming Ma, Xiao Hong Hu

E-business has grown rapidly in the last decade, generating massive amounts of data on customer purchases, browsing patterns, and preferences. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications in e-business. Support vector machines are popular and powerful machine learning techniques that offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies e-business data effectively.
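Rough-set feature selection is often done by greedily growing a reduct until it preserves the dependency degree of the full attribute set. The abstract does not say which reduction algorithm is used, so the greedy QuickReduct heuristic below is a swapped-in illustration. It assumes discrete attribute values. An SVM would then be trained on the selected attributes, which is not shown here.

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of rows whose indiscernibility
    class (rows equal on `attrs`) contains a single decision label."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    positive = sum(len(idx) for idx in classes.values()
                   if len({labels[i] for i in idx}) == 1)
    return positive / len(rows)

def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most,
    until the subset is as discriminative as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct
```

In a toy table where only attribute 1 determines the label, `quick_reduct` keeps just that attribute. The reduced feature vectors would then be passed to the SVM.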


2021, Vol 2021, pp. 1-9
Author(s): Yao Huimin

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in terms of both capacity and value, and machine learning has received unprecedented attention in recent years. Because traditional machine learning algorithms cannot be parallelized effectively, a parallelized support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: the Batch Layer, the Serving Layer, and the Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines considers "special points" in addition to the support vectors. These special points are the non-support vectors in one subset that violate the training results of the other subset. On this basis, a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed and implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine performs outstandingly in speed-up ratio, training time, and prediction accuracy.
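The merge step can be sketched as building the training set for the merged SVM. Both subsets contribute their support vectors, plus the "special points": non-support vectors from one subset that violate the model trained on the other. This sketch makes several assumptions:

- A linear decision function `w·x + b` represents each trained sub-model.
- A point "violates" a model when its functional margin `y·f(x)` is below 1 (the usual KKT condition for non-support vectors).
- The helper names are hypothetical.
- Retraining on the merged set, cross-validation, and the Spark distribution itself are not shown.

```python
def decision(model, x):
    """Linear decision function f(x) = w.x + b for a (w, b) model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def violators(points, labels, sv_mask, other_model):
    """Non-support vectors of one subset whose margin under the other
    subset's model is below 1, i.e. points that violate that model."""
    return [(x, y) for x, y, is_sv in zip(points, labels, sv_mask)
            if not is_sv and y * decision(other_model, x) < 1]

def merge_training_set(subset_a, subset_b, model_a, model_b):
    """Each subset is (points, labels, sv_mask). Returns (x, y) pairs on
    which the merged SVM would be retrained."""
    merged = []
    for points, labels, mask in (subset_a, subset_b):
        merged += [(x, y) for x, y, is_sv in zip(points, labels, mask) if is_sv]
    merged += violators(*subset_a, model_b)
    merged += violators(*subset_b, model_a)
    return merged
```

Keeping only support vectors plus violators keeps the merged problem small. Discarding the violators could drop points that the combined model misclassifies.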


2016, Vol 24 (1), pp. 24-42
Author(s): Claudia Ehrentraut, Markus Ekholm, Hideyuki Tanushi, Jörg Tiedemann, Hercules Dalianis

Hospital-acquired infections pose a significant risk to patient health, and their surveillance adds to the workload of hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections, thereby reducing the burden on hospital staff of checking patient records manually. This study focuses on applying text classification with support vector machines and gradient tree boosting to the problem. Neither method had previously been applied to detecting hospital-acquired infections in Swedish patient records, and in our experiments they led to encouraging results. The best result was obtained by gradient tree boosting with stemming: 93.7 percent recall, 79.7 percent precision, and an 85.7 percent F1 score. We show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for when screening patient records) with appropriate precision for this task.
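Screening favours recall, so one common way to tune a classifier for this goal is to lower the decision threshold until a target recall is reached. The abstract does not say how the parameters were tuned, so this threshold search is an illustrative assumption. Here `scores` are any classifier confidence values and `labels` are 1 for records with an infection.

```python
def threshold_for_recall(scores, labels, target_recall):
    """Return (threshold, recall, precision) for the highest threshold whose
    recall reaches target_recall, or None if there are no positives."""
    positives = sum(labels)
    if positives == 0:
        return None
    # Scan thresholds from strictest to loosest; the first hit keeps precision highest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        if tp / positives >= target_recall:
            flagged = sum(1 for s in scores if s >= t)
            return t, tp / positives, tp / flagged
    return None
```

For instance, `threshold_for_recall([0.9, 0.8, 0.7, 0.4, 0.3, 0.2], [1, 0, 1, 1, 0, 0], 0.9)` returns `(0.4, 1.0, 0.75)`. All positives are flagged at the cost of one false alarm.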


2020, Vol 2020, pp. 1-7
Author(s): Nalindren Naicker, Timothy Adeliyi, Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends heavily on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as predictors of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine which algorithm would improve the prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alphanumeric student records. Benchmarked against ten categorical machine learning algorithms, linear support vector machines showed superior performance in predicting student performance. The results also showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.
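A linear SVM can be trained by stochastic subgradient descent on the regularized hinge loss. The study does not state which solver it used, so the Pegasos-style sketch below is a swapped-in illustration. It assumes numeric features. It appends a constant feature so that the last weight acts as a (regularized) bias term.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD for a linear SVM; labels must be +1 or -1."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            x, label = data[i]
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink toward zero: gradient step on the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: step toward classifying x correctly.
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear score, with the constant bias feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

On a small linearly separable toy set, `[predict(w, x) for x in X]` reproduces the training labels. Real student records would first need their alphanumeric fields encoded as numbers.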


2020, Vol 24 (5), pp. 1141-1160
Author(s): Tomás Alegre Sepúlveda, Brian Keith Norambuena

In this paper, we apply sentiment analysis methods to the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate and to contrast it with the results of classical methods (e.g., polls and surveys). The data were collected from Twitter because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG), and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it with traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. Support vector machines yielded the best model for our case, with a classification accuracy of 76.45%. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared with a range of 34–44% in traditional polls and 36% in the actual elections. For AG, we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% in traditional polls and 20.27% in the elections. For BS, we obtained an estimate of 27.77%, compared with a range of 8.50% to 11.00% in traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the omission of some candidates who nevertheless received a significant number of votes.


Information, 2020, Vol 11 (8), pp. 383
Author(s): Francis Effirim Botchey, Zhen Qin, Kwesi Hughes-Lartey

The onset of COVID-19 has re-emphasized the importance of FinTech, especially in developing countries, as the major powers of the world already enjoy the advantages that come with its adoption. Handling physical cash has been established as a means of transmitting the novel coronavirus. Moreover, research has established that being unbanked raises the risk of sinking into abject poverty. Over the years, developing countries have piloted various forms of FinTech, but the one that has come to stay is Mobile Money Transactions (MMT). As mobile money transactions attempt to gain a foothold, they face several problems, the most important of which is mobile money fraud. This paper seeks to address this problem using three kinds of machine learning algorithms: support vector machines (kernel-based), gradient boosted decision trees (tree-based), and Naïve Bayes (probabilistic). The analysis takes into account the imbalanced nature of the dataset. Our experiments showed that gradient boosted decision trees hold great potential for combating mobile money fraud, as they produced near-perfect results.
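Fraud data are typically dominated by legitimate transactions. One common remedy is to reweight classes inversely to their frequency. The abstract does not say how the imbalance was handled, so this is an illustrative assumption rather than the paper's method. The formula `n_samples / (n_classes * class_count)` is the one used by scikit-learn's `class_weight="balanced"` option. It gives every class the same total weight.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so the
    minority class contributes as much total weight as the majority class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

With 98 legitimate and 2 fraudulent transactions, each fraud example receives weight 25.0 and each legitimate one about 0.51. These weights would then scale the per-sample loss of the SVM or boosted-tree learner.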


2010, Vol 07 (01), pp. 59-80
Author(s): D. Cheng, S. Q. Xie, E. Hämmerle

Local descriptor matching is the most overlooked of the three stages of the local descriptor process, and this paper proposes a new method for matching local descriptors based on support vector machines. Experimental results show that the developed method is more robust in matching local descriptors for all image transformations considered. The method can be integrated with different local descriptor methods and with different machine learning algorithms, which shows that the approach is sufficiently robust and versatile.
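One way to cast descriptor matching as SVM classification is to score each candidate pair with a trained linear classifier over pair features. The query then takes the best-scoring candidate if that pair is classified as a match. The abstract does not specify the pair representation. The element-wise absolute difference used here is an assumption, and `w`, `b` are hypothetical weights standing in for a trained linear SVM.

```python
def pair_features(a, b):
    """Element-wise absolute difference between two descriptors."""
    return [abs(x - y) for x, y in zip(a, b)]

def match_descriptors(queries, candidates, w, b):
    """For each query, return the index of the candidate whose pair score
    w . |q - c| + b is highest, or None if even that score is not positive."""
    matches = []
    for q in queries:
        scores = [sum(wi * fi for wi, fi in zip(w, pair_features(q, c))) + b
                  for c in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        matches.append(best if scores[best] > 0 else None)
    return matches
```

For example, with `w = [-1, -1]` and `b = 0.5`, the query `[0, 0]` matches the candidate `[0.1, 0.1]`. The query `[5, 5]` matches nothing. Because the classifier sits behind the pair features, the same loop works with any descriptor type or learned scorer.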


2020, Vol 10 (2)
Author(s): Mahmood Umar, Nor Bahiah Ahmad, Anazida Zainal

This study investigates the performance of machine learning algorithms for sentiment analysis of students' opinions on programming assessment. Previous research shows that Support Vector Machines (SVM) perform best among sentiment analysis techniques, followed by Naïve Bayes (NB). This study proposes a framework for classifying sentiments as positive or negative using the NB algorithm and a lexicon-based approach on a small dataset. The performance of the NB algorithm was evaluated against SVM. NB and SVM outperform the lexicon-based approach in accuracy within the specific domain for which they are trained. The lexicon-based technique, on the other hand, avoids the difficult steps needed to train a classifier. Data were collected from 75 first-year undergraduate students taking a programming subject in the School of Computing, Universiti Teknologi Malaysia. The students' sentiments were gathered from their opinions on the zero-score policy for programs that fail to compile during the skill-based test. The results reveal that students tend to have negative sentiments about programming assessment because they find it frightening. Applying the NB algorithm yields a prediction accuracy of 85%, which outperforms both SVM (70%) and the lexicon-based approach (60%). The results show that NB works better than SVM and the lexicon-based approach on a small dataset.
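A multinomial Naïve Bayes sentiment classifier with add-alpha (Laplace) smoothing can be sketched in a few lines. The study's preprocessing and feature set are not described, so the whitespace tokenization, lowercasing, and helper names below are assumptions. Words unseen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha

def predict_nb(model, doc):
    """Return the class with the highest smoothed log posterior."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for cls in class_counts:
        total = sum(word_counts[cls].values())
        lp = math.log(class_counts[cls] / n_docs)
        for tok in doc.lower().split():
            if tok in vocab:  # out-of-vocabulary words are skipped
                lp += math.log((word_counts[cls][tok] + alpha)
                               / (total + alpha * len(vocab)))
        if lp > best_lp:
            best, best_lp = cls, lp
    return best
```

Because NB only needs word counts, it has few parameters to estimate. This is one plausible reason it can do well on a dataset as small as 75 responses.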


2021
Author(s): Igor Miranda, Gildeberto Cardoso, Madhurananda Pahar, Gabriel Oliveira, Thomas Niesler

Predicting the need for hospitalization due to COVID-19 may help patients seek timely treatment and help health professionals monitor cases and allocate resources. We investigate the use of machine learning algorithms to predict the risk of hospitalization due to COVID-19 using the patient's medical history and self-reported symptoms, regardless of the period in which they occurred. Three datasets containing information on 217,580 patients from three different states in Brazil were used. Decision trees, neural networks, and support vector machines were evaluated, achieving accuracies between 79.1% and 84.7%. Our analysis shows that better performance is achieved in Brazilian states ranked more highly on the official human development index (HDI), suggesting that health facilities with better infrastructure generate less noisy data. One of the models developed in this study has been incorporated into a mobile app that is available for public use.

