Differentially Private Kernel Support Vector Machines Based on the Exponential and Laplace Hybrid Mechanism

Support vector machines (SVMs) are among the most robust and accurate of the well-known machine learning algorithms, especially for classification. An SVM trains a classification model by solving an optimization problem that decides which instances in the training dataset are the support vectors (SVs). However, SVs are intact instances taken from the training dataset, so directly releasing the classification model carries a significant risk to the privacy of individuals when the training dataset contains sensitive information. In this paper, we study how to release the classification model of kernel SVMs while preventing privacy leakage of the SVs and satisfying the requirement of privacy protection. We propose a new differentially private algorithm for kernel SVMs, named DPKSVMEL, based on a hybrid of the exponential and Laplace mechanisms. The DPKSVMEL algorithm has two major advantages over existing private SVM algorithms. First, it protects the privacy of the SVs by postprocessing, so the training process of the non-private kernel SVM does not change. Second, the scoring function values are derived directly from the symmetric kernel matrix generated during training, so they require neither additional storage space nor complex sensitivity analysis. In the DPKSVMEL algorithm, we define a similarity parameter to denote the correlation or distance between the non-SVs and every SV. Every non-SV is then assigned to a group with the SV to which it has maximal similarity. For a given similarity parameter value, if the number of non-SVs is greater than k, we replace every SV with the mean of the top-k most similar non-SVs within its group, selected at random by the exponential mechanism. Otherwise, we add random noise to the SVs by the Laplace mechanism. We theoretically prove that the DPKSVMEL algorithm satisfies differential privacy. Extensive experiments on real datasets show the effectiveness of the DPKSVMEL algorithm for kernel SVMs; it also achieves higher classification accuracy than existing private SVM algorithms.
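
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the DPKSVMEL algorithm itself: the helper name `privatize_sv`, the even split of the privacy budget across the k selections, and the `sensitivity` values are assumptions made for illustration only. The exponential mechanism picks an item with probability proportional to exp(epsilon * score / (2 * sensitivity)), and the Laplace mechanism adds noise with scale sensitivity / epsilon.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Return an index chosen with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores)  # shifting by the max avoids overflow, probabilities unchanged
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def privatize_sv(sv, group, similarities, k, epsilon, sensitivity, rng):
    """Hypothetical helper: replace one support vector.

    group        -- non-SVs assigned to this SV (lists of floats)
    similarities -- one similarity score per non-SV in the group
    """
    if len(group) > k:
        # Assumption: the budget is split evenly over k draws without replacement.
        remaining = list(range(len(group)))
        chosen = []
        for _ in range(k):
            scores = [similarities[i] for i in remaining]
            pick = exponential_mechanism(scores, epsilon / k, sensitivity, rng)
            chosen.append(remaining.pop(pick))
        return [sum(group[i][d] for i in chosen) / k for d in range(len(sv))]
    # Too few non-SVs in the group: perturb the SV itself with Laplace noise.
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in sv]
```

In this sketch the similarity scores play the role of the scoring function; the abstract states that the actual scores come from the kernel matrix produced during training.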

Combination with Machine Learning Algorithms for the Classification in E-Bussiness

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.230-232.625 ◽ 2011 ◽ Vol 230-232 ◽ pp. 625-628

Author(s): Lei Shi ◽ Xin Ming Ma ◽ Xiao Hong Hu

Keyword(s): Machine Learning ◽ Support Vector Machines ◽ Set Theory ◽ Rough Set ◽ Rough Set Theory ◽ Machine Learning Algorithms ◽ Classification Model ◽ Support Vector ◽ Mathematical Tool ◽ Vector Machines

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns and preferences has been generated. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications of E-business. Support Vector Machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽ 10.1155/2021/7998417 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Yao Huimin

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Big Data ◽ Support Vector Machines ◽ Cross Validation ◽ Machine Learning Algorithms ◽ Support Vector ◽ Lambda Architecture ◽ Vector Machines ◽ Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in terms of capacity and value, and machine learning has received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines considers “special points” other than support vectors, that is, the non-support vectors in one subset that violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and its parallelization is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. The experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
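
The idea of "special points" can be sketched in a few lines of Python. The abstract does not state the exact violation criterion, so this sketch assumes a linear model and the standard margin condition y * f(x) < 1; the function names and the merge rule are hypothetical and are not taken from the paper.

```python
def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def find_violators(points, labels, w, b):
    """Indices of points that violate the margin of a model trained elsewhere.

    Assumption: a point violates the other subset's result when y * f(x) < 1.
    """
    return [
        i
        for i, (x, y) in enumerate(zip(points, labels))
        if y * decision_value(w, b, x) < 1.0
    ]


def merge_indices(sv_indices, violator_indices):
    """Hypothetical merge rule: keep a subset's support vectors plus its violators."""
    return sorted(set(sv_indices) | set(violator_indices))


# Non-SVs of subset A checked against a model (w, b) trained on subset B.
points_a = [[2.0, 0.0], [0.2, 0.0], [-3.0, 0.0]]
labels_a = [1, 1, -1]
violators = find_violators(points_a, labels_a, w=[1.0, 0.0], b=0.0)
```

Under these assumptions only the second point of subset A falls inside the margin of the model from subset B, so it would be carried into the merged training set alongside the support vectors.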

Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting

Health Informatics Journal ◽ 10.1177/1460458216656471 ◽ 2016 ◽ Vol 24 (1) ◽ pp. 24-42 ◽ Cited By ~ 11

Author(s): Claudia Ehrentraut ◽ Markus Ekholm ◽ Hideyuki Tanushi ◽ Jörg Tiedemann ◽ Hercules Dalianis

Keyword(s): Support Vector Machines ◽ Significant Risk ◽ Parameter Tuning ◽ Hospital Staff ◽ Support Vector ◽ Patient Records ◽ Hospital Acquired Infections ◽ Vector Machines ◽ Patient Health ◽ Hospital Acquired

Hospital-acquired infections pose a significant risk to patient health, while their surveillance is an additional workload for hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections. This is to reduce the burden of having the hospital staff manually check patient records. This study focuses on the application of text classification using support vector machines and gradient tree boosting to the problem. Support vector machines and gradient tree boosting have never been applied to the problem of detecting hospital-acquired infections in Swedish patient records, and according to our experiments, they lead to encouraging results. The best result is yielded by gradient tree boosting, at 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We can show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for in screening patient records) with appropriate precision for this task.
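
For readers less familiar with the reported metrics, the following Python sketch shows how recall, precision and F1 score are computed from confusion counts. The counts used here are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positives, false positives
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical screening result: 90 infected records flagged correctly,
# 10 records flagged wrongly, 30 infected records missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

In a screening setting such as the one described, recall is the priority because a missed record (a false negative) is costlier than a record that staff must check unnecessarily.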

Linear Support Vector Machines for Prediction of Student Performance in School-Based Education

Mathematical Problems in Engineering ◽ 10.1155/2020/4761468 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-7

Author(s): Nalindren Naicker ◽ Timothy Adeliyi ◽ Jeanette Wing

Keyword(s): Machine Learning ◽ Support Vector Machines ◽ Student Performance ◽ State Of The Art ◽ Learning Algorithms ◽ The State ◽ Machine Learning Algorithms ◽ Superior Performance ◽ Support Vector ◽ Vector Machines

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, which gives practitioners useful insights for developing appropriate intervention strategies to improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers depends heavily on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as a predictor of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine the algorithm that would improve prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alpha-numeric student records. Linear support vector machines, benchmarked against ten categorical machine learning algorithms, showed superior performance in predicting student performance. The results showed that features such as race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽ 10.3233/ida-194768 ◽ 2020 ◽ Vol 24 (5) ◽ pp. 1141-1160

Author(s): Tomás Alegre Sepúlveda ◽ Brian Keith Norambuena

Keyword(s): Machine Learning ◽ Support Vector Machines ◽ Sentiment Analysis ◽ Classification Model ◽ Machine Learning Techniques ◽ Support Vector ◽ Traditional Methods ◽ Actual Result ◽ Learning Techniques ◽ Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the fact that some candidates were omitted, even though they held a significant number of votes.

Mobile Money Fraud Prediction—A Cross-Case Analysis on the Efficiency of Support Vector Machines, Gradient Boosted Decision Trees, and Naïve Bayes Algorithms

Information ◽ 10.3390/info11080383 ◽ 2020 ◽ Vol 11 (8) ◽ pp. 383

Author(s): Francis Effirim Botchey ◽ Zhen Qin ◽ Kwesi Hughes-Lartey

Keyword(s): Developing Countries ◽ Support Vector Machines ◽ Decision Tree ◽ Naive Bayes ◽ Naïve Bayes ◽ Machine Learning Algorithms ◽ Support Vector ◽ Mobile Money ◽ Vector Machines ◽ Boosted Decision Tree

The onset of COVID-19 has re-emphasized the importance of FinTech, especially in developing countries, as the major powers of the world are already enjoying the advantages that come with its adoption. Handling of physical cash has been established as a means of transmitting the novel coronavirus. Research has also established that being unbanked raises the risk of sinking into abject poverty. Over the years, developing countries have been piloting various forms of FinTech, but the one that has come to stay is Mobile Money Transactions (MMT). As mobile money transactions attempt to gain a foothold, they face several problems, the most important of which is mobile money fraud. This paper seeks to provide a solution to this problem by looking at machine learning algorithms based on support vector machines (kernel-based), gradient boosted decision trees (tree-based) and Naïve Bayes (probabilistic-based), taking into consideration the imbalanced nature of the dataset. Our experiments showed that gradient boosted decision trees hold great potential in combating mobile money fraud, as they were able to produce near-perfect results.

LOCAL DESCRIPTOR MATCHING WITH SUPPORT VECTOR MACHINES

International Journal of Information Acquisition ◽ 10.1142/s0219878910002051 ◽ 2010 ◽ Vol 07 (01) ◽ pp. 59-80

Author(s): D. CHENG ◽ S. Q. XIE ◽ E. HÄMMERLE

Keyword(s): Machine Learning ◽ Support Vector Machines ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Support Vector ◽ Local Descriptor ◽ Local Descriptors ◽ Vector Machines ◽ Three Stages ◽ Image Transformations

Local descriptor matching is the most overlooked of the three stages of the local descriptor process, and this paper proposes a new method for matching local descriptors based on support vector machines. Experimental results show that the developed method is more robust for matching local descriptors for all image transformations considered. The method can be integrated with different local descriptor methods and with different machine learning algorithms, which shows that the approach is sufficiently robust and versatile.

Classification model for product form design using fuzzy support vector machines

Computers & Industrial Engineering ◽ 10.1016/j.cie.2007.12.007 ◽ 2008 ◽ Vol 55 (1) ◽ pp. 150-164 ◽ Cited By ~ 32

Author(s): Meng-Dar Shieh ◽ Chih-Chieh Yang

Keyword(s): Support Vector Machines ◽ Product Form ◽ Classification Model ◽ Support Vector ◽ Vector Machines ◽ Form Design ◽ Fuzzy Support Vector Machines ◽ Product Form Design

Sentiment Analysis of Student’s Opinion on Programming Assessment: Evaluation of Naïve Bayes over Support Vector Machines

International Journal of Innovative Computing ◽ 10.11113/ijic.v10n2.278 ◽ 2020 ◽ Vol 10 (2)

Author(s): Mahmood Umar ◽ Nor Bahiah Ahmad ◽ Anazida Zainal

Keyword(s): Support Vector Machines ◽ Sentiment Analysis ◽ Naive Bayes ◽ Naïve Bayes ◽ Machine Learning Algorithms ◽ Experimental Result ◽ Support Vector ◽ Small Data ◽ Data Set ◽ Vector Machines

This study investigates the performance of machine learning algorithms for sentiment analysis of students’ opinions on programming assessment. Previous research shows that Support Vector Machines (SVM) perform best among all techniques in sentiment analysis, followed by Naïve Bayes (NB). This study proposes a framework for classifying sentiments as positive or negative using the NB algorithm and a Lexicon-based approach on a small data set. The performance of the NB algorithm was evaluated against SVM. NB and SVM outperform the Lexicon-based opinion lexicon technique in terms of accuracy in the specific area for which they are trained. The Lexicon-based technique, on the other hand, avoids the difficult steps needed to train a classifier. Data from 75 first-year undergraduate students taking a programming subject in the School of Computing, Universiti Teknologi Malaysia, were analyzed. The students’ sentiments were gathered from their opinions on the zero-score policy for unsuccessful compilation of a program during a skill-based test. The results reveal that the students tend to have negative sentiments about programming assessment because it frightens them. The NB algorithm yields a prediction accuracy of 85%, which outperforms both SVM, at 70%, and the Lexicon-based approach, at 60%. The results show that NB works better than SVM and the Lexicon-based approach on a small dataset.

Machine Learning Prediction of Hospitalization due to COVID-19 based on Self-Reported Symptoms: A Study for Brazil

10.36227/techrxiv.13736698 ◽ 2021

Author(s): Igor Miranda ◽ Gildeberto Cardoso ◽ Madhurananda Pahar ◽ Gabriel Oliveira ◽ Thomas Niesler

Keyword(s): Machine Learning ◽ Neural Networks ◽ Support Vector Machines ◽ Health Professionals ◽ Human Development Index ◽ Mobile App ◽ Machine Learning Algorithms ◽ Support Vector ◽ Vector Machines ◽ Timely Treatment

Predicting the need for hospitalization due to COVID-19 may help patients to seek timely treatment and assist health professionals in monitoring cases and allocating resources. We investigate the use of machine learning algorithms to predict the risk of hospitalization due to COVID-19 using the patient's medical history and self-reported symptoms, regardless of the period in which they occurred. Three datasets containing information on 217,580 patients from three different states in Brazil were used. Decision trees, neural networks, and support vector machines were evaluated, achieving accuracies between 79.1% and 84.7%. Our analysis shows that better performance is achieved in Brazilian states ranked more highly in terms of the official human development index (HDI), suggesting that health facilities with better infrastructure generate data that is less noisy. One of the models developed in this study has been incorporated into a mobile app that is available for public use.
