Tutorial on Support Vector Machines

Author(s):  
Raj Bridgelall

Abstract: The aim of this tutorial is to help students grasp the theory and applicability of support vector machines (SVMs). The contribution is an intuitive-style tutorial that has helped students gain insights into SVMs from a unique perspective. An internet search will reveal many videos and articles on SVMs, but free peer-reviewed tutorials are generally unavailable or incomplete. Instructional materials that provide simplified explanations of SVMs leave gaps in the derivations that beginning students cannot fill. Most of the free tutorials also lack guidance on practical applications and considerations. Software wrappers in many modern Python and R libraries hide the operational complexities. Such tools often use default parameters that ignore domain knowledge or leave knowledge gaps about the important effects of SVM hyperparameters, resulting in misuse and subpar outcomes. The author uses this tutorial as a course reference for students studying artificial intelligence and machine learning. The tutorial derives the classic SVM classifier from first principles and then derives the practical form that a computer uses to train a classification model. An intuitive explanation of confusion matrices, the F1 score, and the AUC metric extends insights into the inherent tradeoff between sensitivity and specificity. A discussion of cross-validation provides a basic understanding of how to select and tune the hyperparameters to maximize generalization by balancing underfitting and overfitting. Even seasoned self-learners with advanced statistical backgrounds have gained insights from this tutorial style of intuitive explanations, with all related considerations for tuning and performance evaluation in one place.
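
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It computes sensitivity, specificity, precision, F1, and accuracy from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they do not come from the tutorial.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return common metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero; a real implementation
    would need to handle empty classes.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Illustrative counts only (not results from any study).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Moving the decision threshold of a classifier typically trades false negatives for false positives, which is why sensitivity and specificity tend to move in opposite directions.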

2011 ◽  
Vol 230-232 ◽  
pp. 625-628
Author(s):  
Lei Shi ◽  
Xin Ming Ma ◽  
Xiao Hong Hu

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns, and preferences has been generated. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications in E-business. Support Vector Machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.


2003 ◽  
Vol 15 (7) ◽  
pp. 1667-1689 ◽  
Author(s):  
S. Sathiya Keerthi ◽  
Chih-Jen Lin

Support vector machines (SVMs) with the Gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width σ. This letter analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results help in understanding the hyperparameter space, which leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the Gaussian kernel has been conducted, there is no need to consider linear SVM.
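
To give a feel for what extreme kernel widths do, here is a minimal sketch that evaluates the Gaussian kernel at a very small and a very large σ. It assumes the common parameterization exp(-||x - y||² / (2σ²)); the sample points and function names are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """Kernel value for every pair of points."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


# Three made-up 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

narrow = gram_matrix(points, sigma=0.01)   # approaches the identity matrix
wide = gram_matrix(points, sigma=1000.0)   # approaches a matrix of ones

for row in narrow:
    print([round(v, 6) for v in row])
for row in wide:
    print([round(v, 6) for v in row])
```

With a very small σ each point is similar only to itself, and with a very large σ all points look alike to the kernel. This is one way to see why both extremes are poor regions of the hyperparameter space.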


2020 ◽  
Vol 24 (5) ◽  
pp. 1141-1160
Author(s):  
Tomás Alegre Sepúlveda ◽  
Brian Keith Norambuena

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the omission of some candidates who nonetheless received a significant number of votes.


2012 ◽  
Vol 2012 ◽  
pp. 1-7 ◽  
Author(s):  
Hao Jiang ◽  
Wai-Ki Ching

High-dimensional bioinformatics data sets provide an excellent and challenging research problem in the machine learning area. In particular, gene expression data generated by DNA microarrays are high-dimensional with a significant level of noise. Supervised kernel learning with an SVM classifier has been successfully applied in biomedical diagnosis, such as discriminating between different kinds of tumor tissues. The correlation kernel has recently been applied to classification problems with Support Vector Machines (SVMs). In this paper, we develop a novel and parsimonious positive semidefinite kernel. The proposed kernel is shown experimentally to perform better than the usual correlation kernel. In addition, we propose a new kernel based on the correlation matrix that incorporates techniques for dealing with indefinite kernels. The resulting kernel is shown to be positive semidefinite, and it exhibits superior performance to the two kernels mentioned above. We then apply the proposed method to some cancer data to discriminate between different tumor tissues, providing information for the diagnosis of diseases. Numerical experiments indicate that our method outperforms existing methods such as the decision tree method and the KNN method.


2013 ◽  
Vol 333-335 ◽  
pp. 1080-1084
Author(s):  
Zhang Fei ◽  
Ye Xi

In this paper, we propose a novel classification method for high-resolution SAR using local autocorrelation and a Support Vector Machine (SVM) classifier. The commonly applied spatial autocorrelation indexes, namely Moran's Index, Geary's Index, and Getis's Index, are used to describe the features of the land cover. Then, an SVM based on these indexes is applied as the high-resolution SAR classifier. A Cosmo-SkyMed scene of Chengdu city, China, is used for our experiment. It is shown that the proposed method can lead to good classification accuracy.
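
As a rough illustration of what a spatial autocorrelation index measures, the sketch below computes the global Moran's I on a small grid using binary rook (4-neighbour) weights. This is a simplification: the paper uses local indexes, and its weighting scheme is not stated in the abstract, so the weights, function name, and toy grids here are assumptions.

```python
def morans_i(grid):
    """Global Moran's I for a 2-D grid with binary rook-adjacency weights.

    Assumes the grid is not constant (otherwise the variance term is zero).
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)

    cross_sum = 0.0   # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_sum = 0    # sum of all w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross_sum += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (cross_sum / variance_sum)


# Toy grids: a checkerboard (neighbours always differ) and two clustered halves.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
clustered = [[0, 0, 1, 1] for _ in range(4)]

print(morans_i(checker))    # strongly negative autocorrelation
print(morans_i(clustered))  # positive autocorrelation
```

Values near +1 indicate that similar pixel values cluster together, while values near -1 indicate that neighbouring values alternate. Computed over local windows, such values could serve as texture features for a classifier.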


2015 ◽  
Vol 24 (03) ◽  
pp. 1550010 ◽  
Author(s):  
Yassine Ben Ayed

In this paper, we propose an alternative keyword spotting method relying on confidence measures and support vector machines. Confidence measures are computed from phone information provided by a Hidden Markov Model based speech recognizer. We use three kinds of techniques, i.e., arithmetic, geometric, and harmonic means, to compute a confidence measure for each word. The acceptance/rejection decision for a word is based on the confidence vector processed by the SVM classifier, for which we propose a new Beta kernel. The performance of the proposed SVM classifier is compared with spotting methods based on some confidence means. Experimental results presented in this paper show that the proposed SVM classifier method improves the performance of the keyword spotting system.
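
The three means mentioned above can be sketched with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). The phone-level scores are made up, and stacking the three means into one vector per word is an assumption about how the confidence vector is formed, not a detail confirmed by the abstract.

```python
from statistics import geometric_mean, harmonic_mean, mean


def confidence_vector(phone_scores):
    """Combine positive phone-level scores into three word-level measures."""
    return (
        mean(phone_scores),            # arithmetic mean
        geometric_mean(phone_scores),  # penalizes low scores more
        harmonic_mean(phone_scores),   # penalizes low scores the most
    )


# Hypothetical phone confidences for one word.
scores = [0.9, 0.8, 0.4]
vec = confidence_vector(scores)
print(vec)
```

For positive inputs the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, so the three values react differently to a single poorly recognized phone.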


2020 ◽  
Vol 4 (5) ◽  
pp. 915-922
Author(s):  
Helena Nurramdhani Irmanda ◽  
Ria Astriratma

This study aims to create a model for categorizing pantun types and to analyze the accuracy of support vector machines (SVM). The first stage is collecting pantun that have been labeled with a pantun category. The categories consist of pantun for children, pantun for young people, and pantun for elders. After data collection, the next stage is pre-processing, which prepares the data for the feature extraction stage. The pre-processing stage consists of text segmentation, case folding, tokenization, stop word removal, and stemming. The feature extraction stage is intended to analyze potential information and represent terms as a vector. The data must be separated into training data and testing data before the classification process. The classification is then performed using a multiclass SVM. The classification results are evaluated to obtain the accuracy and to analyze whether the classification model is suitable for use. The results showed that SVM classified the types of pantun with an accuracy of 81.91%.
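
A minimal sketch of part of the pre-processing pipeline described above covers case folding, tokenization, stop word removal, and a simple term-count representation. Stemming is omitted because it needs a language-specific stemmer. The stop word list and the sample sentence are invented for illustration and are not from the study's data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"yang", "di", "ke", "dan"}


def preprocess(text, stop_words=STOP_WORDS):
    """Case-fold, tokenize on ASCII letters, and drop stop words."""
    lowered = text.lower()                    # case folding
    tokens = re.findall(r"[a-z]+", lowered)   # tokenization
    return [t for t in tokens if t not in stop_words]


# Made-up sample sentence, not an actual pantun from the data set.
tokens = preprocess("Pergi ke pasar membeli ikan, Ikan dibeli di hari Senin.")
term_counts = Counter(tokens)   # simple bag-of-words term frequencies

print(tokens)
print(term_counts)
```

The term counts are one simple way to represent a text as a vector. The abstract does not say which weighting the study used, so this choice is only illustrative.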


Author(s):  
Manal Tantawi ◽  
Aya Naser ◽  
Howida Shedeed ◽  
Mohammed Fahmy Tolba

Electroencephalogram (EEG) signals are a valuable source of information for detecting epileptic seizures. However, monitoring EEG for long periods is exhausting and time-consuming, so detecting epilepsy in EEG signals automatically is highly desirable. In this study, three classes are considered: normal, interictal (outside seizure time), and ictal (during a seizure). Moreover, a comparative study of efficient features from the literature is provided, resulting in a suggested combination of only three discriminative features: Rényi entropy, line length, and energy. These features are calculated from each of the EEG sub-bands. Finally, this study introduces a support vector machine (SVM) classifier optimized using the BAT algorithm (BAT-SVM) for discriminating between the three classes. Experiments were conducted using the Andrzejak database. The experiments and comparisons in this study emphasize the superiority of the proposed BAT-SVM, along with the suggested feature set, in achieving the best results.
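
The three features can be sketched using their usual textbook definitions. The abstract does not give the exact formulas, the entropy order, or how probabilities are estimated from the signal, so the definitions, the default order of 2, and the toy inputs below are assumptions.

```python
import math


def line_length(signal):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


def renyi_entropy(probabilities, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log.

    Expects a probability distribution, e.g. a normalized amplitude histogram.
    """
    power_sum = sum(p ** alpha for p in probabilities if p > 0)
    return math.log(power_sum) / (1.0 - alpha)


# Toy segment and a uniform 4-bin distribution, for illustration only.
segment = [0.0, 1.0, -1.0, 2.0]
print(line_length(segment))          # 1 + 2 + 3
print(energy(segment))               # 0 + 1 + 1 + 4
print(renyi_entropy([0.25] * 4))     # equals log(4) for a uniform distribution
```

Computing these three values per sub-band would give one short feature vector per EEG segment, which a classifier such as an SVM could then consume.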

