Tutorial on Support Vector Machines

Mapping Intimacies ◽

10.21203/rs.3.rs-1200362/v1 ◽

2022 ◽

Author(s):

Raj Bridgelall

Keyword(s):

Support Vector Machines ◽

Domain Knowledge ◽

Instructional Materials ◽

Classification Model ◽

Support Vector ◽

Svm Classifier ◽

Practical Applications ◽

Vector Machines ◽

And Performance ◽

Beginning Students

The aim of this tutorial is to help students grasp the theory and applicability of support vector machines (SVMs). The contribution is an intuitive-style tutorial that has helped students gain insights into SVM from a unique perspective. An internet search will reveal many videos and articles on SVM, but free peer-reviewed tutorials are generally not available or are incomplete. Instructional materials that provide simplified explanations of SVM leave gaps in the derivations that beginning students cannot fill. Most of the free tutorials also lack guidance on practical applications and considerations. The software wrappers in many modern programming libraries of Python and R currently hide the operational complexities. Such software tools often use default parameters that ignore domain knowledge or leave knowledge gaps about the important effects of SVM hyperparameters, resulting in misuse and subpar outcomes. The author uses this tutorial as a course reference for students studying artificial intelligence and machine learning. The tutorial derives the classic SVM classifier from first principles and then derives the practical form that a computer uses to train a classification model. An intuitive explanation of confusion matrices, the F1 score, and the AUC metric extends insights into the inherent tradeoff between sensitivity and specificity. A discussion of cross-validation provides a basic understanding of how to select and tune the hyperparameters to maximize generalization by balancing underfitting and overfitting. Even seasoned self-learners with advanced statistical backgrounds have gained insights from this tutorial style of intuitive explanations, with all related considerations for tuning and performance evaluations in one place.
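
The tradeoff between sensitivity and specificity mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the tutorial: the decision scores, labels, and helper names below are hypothetical, and the scores only stand in for the signed output an SVM classifier might produce. Moving the decision threshold changes the confusion matrix and, with it, sensitivity, specificity, and the F1 score.

```python
# Hypothetical signed decision scores for ten samples (not from the tutorial),
# with their true labels (1 = positive class, 0 = negative class).
scores = [2.1, 1.3, 0.4, -0.2, 0.6, -0.5, -1.1, -1.4, -2.0, -2.3]
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def predict_at(scores, threshold):
    """Label a sample positive when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute sensitivity, specificity, precision, and F1 from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# A stricter threshold favors specificity; a looser one favors sensitivity.
for threshold in (1.0, 0.0, -0.6):
    print(threshold, summarize(y_true, predict_at(scores, threshold)))
```

Sweeping the threshold over all score values and plotting sensitivity against one minus specificity gives the ROC curve whose area is the AUC metric.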


Combination with Machine Learning Algorithms for the Classification in E-Bussiness

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.230-232.625 ◽

2011 ◽

Vol 230-232 ◽

pp. 625-628

Author(s):

Lei Shi ◽

Xin Ming Ma ◽

Xiao Hong Hu

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Mathematical Tool ◽

Vector Machines

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns, and preferences has been generated. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications in E-business. Support vector machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.
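
As a rough illustration of how rough set theory can rank features before a classifier is trained, the sketch below computes the standard dependency degree, that is, the fraction of records whose decision is fully determined by a chosen attribute subset. The customer table, attribute names, and function name are hypothetical, and the paper's actual selection procedure is not described in the abstract.

```python
from collections import defaultdict


def dependency_degree(rows, decisions, attrs):
    """Fraction of records lying in the positive region of `attrs`.

    Records that agree on every attribute in `attrs` are indiscernible.
    A block of indiscernible records is consistent when all of its
    records share the same decision label.
    """
    blocks = defaultdict(list)
    for row, label in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        blocks[key].append(label)
    consistent = sum(
        len(labels) for labels in blocks.values() if len(set(labels)) == 1
    )
    return consistent / len(rows)


# Hypothetical customer records (illustrative only, not from the paper).
rows = [
    {"visits": "high", "cart": "yes", "region": "north"},
    {"visits": "high", "cart": "no", "region": "south"},
    {"visits": "low", "cart": "yes", "region": "north"},
    {"visits": "low", "cart": "no", "region": "south"},
    {"visits": "high", "cart": "yes", "region": "south"},
]
decisions = ["buy", "skip", "buy", "skip", "buy"]

for attrs in (["visits"], ["region"], ["cart"]):
    print(attrs, dependency_degree(rows, decisions, attrs))
```

An attribute subset whose dependency degree matches that of the full attribute set is a candidate reduced feature set to pass to the SVM.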


Asymptotic Behaviors of Support Vector Machines with Gaussian Kernel

Neural Computation ◽

10.1162/089976603321891855 ◽

2003 ◽

Vol 15 (7) ◽

pp. 1667-1689 ◽

Cited By ~ 979

Author(s):

S. Sathiya Keerthi ◽

Chih-Jen Lin

Keyword(s):

Support Vector Machines ◽

Model Selection ◽

Heuristic Method ◽

Gaussian Kernel ◽

Support Vector ◽

Svm Classifier ◽

Vector Machines ◽

Rbf Kernel ◽

Linear Svm ◽

Generalization Errors

Support vector machines (SVMs) with the Gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width σ. This letter analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results help in understanding the hyperparameter space, which leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the Gaussian kernel has been conducted, there is no need to consider linear SVM.
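
To give a feel for why extreme kernel widths matter, the following sketch evaluates the Gaussian kernel matrix on three hypothetical points. It assumes the common parameterization K(x, z) = exp(-||x - z||² / (2σ²)), which the abstract does not spell out. For very small σ the matrix approaches the identity (every point looks similar only to itself), and for very large σ it approaches a matrix of ones (all points look alike).

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian kernel value, assuming K = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """Kernel matrix for all pairs of points."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical 2-D points, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

for sigma in (0.05, 1.0, 100.0):
    print("sigma =", sigma)
    for row in gram_matrix(points, sigma):
        print("  ", [round(v, 4) for v in row])
```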


Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast this with the results from classical methods (e.g., polls and surveys). The data are collected from Twitter, because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the omission of candidates who nevertheless received a significant number of votes.


Correlation Kernels for Support Vector Machines Classification with Applications in Cancer Data

Computational and Mathematical Methods in Medicine ◽

10.1155/2012/205025 ◽

2012 ◽

Vol 2012 ◽

pp. 1-7 ◽

Cited By ~ 9

Author(s):

Hao Jiang ◽

Wai-Ki Ching

Keyword(s):

Support Vector Machines ◽

Positive Semidefinite ◽

Superior Performance ◽

Support Vector ◽

Svm Classifier ◽

Classification Problems ◽

Correlation Kernel ◽

Cancer Data ◽

Tumor Tissues ◽

Vector Machines

High-dimensional bioinformatics data sets provide an excellent and challenging research problem in the machine learning area. In particular, gene expression data generated by DNA microarrays are of high dimension with a significant level of noise. Supervised kernel learning with an SVM classifier has been successfully applied in biomedical diagnosis, such as discriminating different kinds of tumor tissues. The correlation kernel has recently been applied to classification problems with support vector machines (SVMs). In this paper, we develop a novel and parsimonious positive semidefinite kernel. The proposed kernel is shown experimentally to perform better than the usual correlation kernel. In addition, we propose a new kernel based on the correlation matrix, incorporating techniques for dealing with indefinite kernels. The resulting kernel is shown to be positive semidefinite, and it exhibits superior performance to the two kernels mentioned above. We then apply the proposed method to some cancer data to discriminate different tumor tissues, providing information for the diagnosis of diseases. Numerical experiments indicate that our method outperforms existing methods such as the decision tree method and the KNN method.


Classification model for product form design using fuzzy support vector machines

Computers & Industrial Engineering ◽

10.1016/j.cie.2007.12.007 ◽

2008 ◽

Vol 55 (1) ◽

pp. 150-164 ◽

Cited By ~ 32

Author(s):

Meng-Dar Shieh ◽

Chih-Chieh Yang

Keyword(s):

Support Vector Machines ◽

Product Form ◽

Classification Model ◽

Support Vector ◽

Vector Machines ◽

Form Design ◽

Fuzzy Support Vector Machines ◽

Product Form Design


Classification of High Resolution Sar Imagery Using Local Indicators of Spatial Association

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.333-335.1080 ◽

2013 ◽

Vol 333-335 ◽

pp. 1080-1084

Author(s):

Zhang Fei ◽

Ye Xi

Keyword(s):

Support Vector Machines ◽

High Resolution ◽

Classification Accuracy ◽

Spatial Association ◽

Support Vector ◽

Svm Classifier ◽

Vector Machines ◽

Good Classification ◽

Sar Imagery

In this paper, we propose a novel classification method for high-resolution SAR imagery using local autocorrelation and a support vector machine (SVM) classifier. Three commonly applied spatial autocorrelation indexes, namely Moran's Index, Geary's Index, and Getis's Index, are used to depict the features of the land cover. Then, an SVM based on these indexes is applied as the high-resolution SAR classifier. A Cosmo-SkyMed scene of Chengdu, China is used for our experiment. It is shown that the proposed method can lead to good classification accuracy.
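
For readers unfamiliar with local indicators of spatial association, the sketch below computes the local Moran's index on a small grid of pixel values. It assumes binary rook (4-neighbor) weights and the usual definition I_i = (x_i - mean) / m2 × Σ_j w_ij (x_j - mean), with m2 the mean squared deviation. The grids and function names are hypothetical, and the paper's own window size and weighting scheme are not given in the abstract.

```python
def rook_neighbors(rows, cols, r, c):
    """Yield the up/down/left/right neighbors that fall inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def local_morans_i(grid):
    """Local Moran's I per cell, assuming binary rook-neighbor weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant grid: Moran's I is undefined")
    local = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lag = sum(
                grid[rr][cc] - mean
                for rr, cc in rook_neighbors(rows, cols, r, c)
            )
            local[r][c] = (grid[r][c] - mean) / m2 * lag
    return local


def global_morans_i(grid):
    """Global Moran's I as the sum of local values over the total weight."""
    rows, cols = len(grid), len(grid[0])
    total_weight = sum(
        1
        for r in range(rows)
        for c in range(cols)
        for _ in rook_neighbors(rows, cols, r, c)
    )
    return sum(sum(row) for row in local_morans_i(grid)) / total_weight


# Hypothetical intensity patches: one spatially clustered, one alternating.
clustered = [[1, 1, 0, 0]] * 4
checkerboard = [[1, 0], [0, 1]]
print(global_morans_i(clustered), global_morans_i(checkerboard))
```

In a classification setting, the per-pixel local values would form one feature channel alongside the other indexes before being passed to the SVM.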


A New SVM Kernel for Keyword Spotting Using Confidence Measures

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500104 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550010 ◽

Cited By ~ 1

Author(s):

Yassine Ben Ayed

Keyword(s):

Support Vector Machines ◽

Hidden Markov ◽

Support Vector ◽

Svm Classifier ◽

Keyword Spotting ◽

Confidence Measure ◽

Confidence Measures ◽

Vector Machines ◽

Harmonic Means ◽

Speech Recognizer

In this paper, we propose an alternative keyword spotting method relying on confidence measures and support vector machines. Confidence measures are computed from phone information provided by a speech recognizer based on hidden Markov models. We use three techniques, namely the arithmetic, geometric, and harmonic means, to compute a confidence measure for each word. The decision to accept or reject a word is based on the confidence vector processed by the SVM classifier, for which we propose a new Beta kernel. The performance of the proposed SVM classifier is compared with spotting methods based on individual confidence means. Experimental results presented in this paper show that the proposed SVM classifier method improves the performance of the keyword spotting system.
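
A word-level confidence vector of the kind described above might be assembled as in the following sketch. The phone scores are hypothetical values in (0, 1], and the function name is illustrative; how the paper derives phone-level scores from the recognizer is not stated in the abstract.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def word_confidence_vector(phone_scores):
    """Return (arithmetic, geometric, harmonic) means of phone scores.

    Scores are assumed to be strictly positive, since the geometric
    mean is undefined otherwise.
    """
    if not phone_scores or any(s <= 0 for s in phone_scores):
        raise ValueError("phone scores must be a non-empty list of positives")
    return (
        fmean(phone_scores),
        geometric_mean(phone_scores),
        harmonic_mean(phone_scores),
    )


# Hypothetical phone-level confidence scores for one hypothesized word.
vector = word_confidence_vector([0.9, 0.8, 0.4])
print(vector)
```

The harmonic mean is pulled down most strongly by a single poorly matched phone, so the three components react differently to the same word, which is one plausible reason to feed all of them to a classifier rather than thresholding a single mean.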


Fast Training of Support Vector Machines and Performance Comparison with Fuzzy Classifiers

Transactions of the Institute of Systems Control and Information Engineers ◽

10.5687/iscie.15.25 ◽

2002 ◽

Vol 15 (1) ◽

pp. 25-33

Author(s):

Takuya INOUE ◽

Takahiro UEOKA ◽

Hisashi TAMAKI ◽

Shigeo ABE

Keyword(s):

Support Vector Machines ◽

Performance Comparison ◽

Support Vector ◽

Fast Training ◽

Vector Machines ◽

Fuzzy Classifiers ◽

And Performance


Klasifikasi Jenis Pantun Dengan Metode Support Vector Machines (SVM)

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i5.2313 ◽

2020 ◽

Vol 4 (5) ◽

pp. 915-922

Author(s):

Helena Nurramdhani Irmanda ◽

Ria Astriratma

Keyword(s):

Support Vector Machines ◽

Training Data ◽

Classification Model ◽

Support Vector ◽

Processing Stage ◽

Stop Word ◽

Vector Machines ◽

Testing Data ◽

Extraction Stage ◽

Multiclass Svm

This study aims to create a model for categorizing pantun types and to analyze the accuracy of support vector machines (SVM). The first stage is collecting pantun that have been labeled with a pantun category. The categories consist of pantun for children, pantun for young people, and pantun for elders. After data collection, the next stage is pre-processing, which prepares the data for the extraction stage. The pre-processing stage consists of text segmentation, case folding, tokenization, stop word removal, and stemming. The feature extraction stage is intended to analyze potential information and represent terms as a vector. The data must be separated into training data and testing data before the classification process. The classification is then done using multiclass SVM. The classification results are evaluated to obtain the accuracy and to analyze whether the classification model is suitable for use. The results showed that SVM classified the types of pantun with an accuracy of 81.91%.
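
The pre-processing and feature extraction stages listed above can be sketched as follows. This is only an illustration under stated assumptions: the stop word list and sample lines are hypothetical, stemming is omitted because it needs a language-specific stemmer, and raw term counts stand in for whatever term weighting the study actually used.

```python
import re
from collections import Counter

# Hypothetical stop word list; a real list would be far longer.
STOP_WORDS = {"yang", "di", "ke", "dan", "itu"}


def preprocess(text):
    """Case folding, tokenization, and stop word removal (no stemming)."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct term to a column index, in sorted order."""
    terms = sorted({t for doc in documents for t in preprocess(doc)})
    return {term: index for index, term in enumerate(terms)}


def vectorize(text, vocabulary):
    """Represent a text as a vector of term counts over the vocabulary."""
    counts = Counter(preprocess(text))
    vector = [0] * len(vocabulary)
    for term, count in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = count
    return vector


# Sample lines used only for illustration.
documents = [
    "Burung kakak tua hinggap di jendela",
    "Nenek sudah tua giginya tinggal dua",
]
vocabulary = build_vocabulary(documents)
print([vectorize(doc, vocabulary) for doc in documents])
```

Vectors built this way from the training split define the vocabulary; test documents are then vectorized with the same vocabulary so that unseen terms are simply ignored.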


Classifying Electroencephalogram (EEG) Signals Using BAT-SVM Classifier for Detecting Epilepsy

International Journal of Service Science Management Engineering and Technology ◽

10.4018/ijssmet.2021050106 ◽

2021 ◽

Vol 12 (3) ◽

pp. 96-115

Author(s):

Manal Tantawi ◽

Aya Naser ◽

Howida Shedeed ◽

Mohammed Fahmy Tolba

Keyword(s):

Support Vector Machines ◽

Epileptic Seizures ◽

Bat Algorithm ◽

Line Length ◽

Support Vector ◽

Svm Classifier ◽

Eeg Signals ◽

Vector Machines ◽

Electroencephalogram Eeg ◽

Source Of Information

Electroencephalogram (EEG) signals are a valuable source of information for detecting epileptic seizures. However, monitoring EEG for long periods of time is exhausting and time consuming, so detecting epilepsy in EEG signals automatically is highly desirable. In this study, three classes are considered: normal, interictal (outside seizure time), and ictal (during seizure). Moreover, a comparative study of the efficient features in the literature results in a suggested combination of only three discriminative features: Rényi entropy, line length, and energy. These features are calculated from each of the EEG sub-bands. Finally, a support vector machine (SVM) classifier optimized using the BAT algorithm (BAT-SVM) is introduced for discriminating between the three classes. Experiments were conducted using the Andrzejak database. The experiments and comparisons in this study emphasize the superiority of the proposed BAT-SVM, together with the suggested feature set, in achieving the best results.
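
The three features named above can be computed from a sampled signal as in the sketch below. The definitions of line length (sum of absolute sample-to-sample differences) and energy (sum of squared samples) are standard. For Rényi entropy, the order alpha = 2 and the amplitude histogram with a fixed number of bins are assumptions, since the abstract does not state how the probability distribution is estimated. The sample segment is hypothetical.

```python
import math


def line_length(signal):
    """Sum of absolute differences between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


def renyi_entropy(signal, alpha=2.0, bins=8):
    """Rényi entropy of order alpha from an amplitude histogram.

    H_alpha = log(sum(p_i ** alpha)) / (1 - alpha), for alpha != 1.
    The histogram estimate of p_i is an assumption of this sketch.
    """
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use alpha != 1")
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return 0.0  # all samples fall in one bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in signal:
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    n = len(signal)
    return math.log(sum((c / n) ** alpha for c in counts if c)) / (1.0 - alpha)


# Hypothetical sub-band segment, for illustration only.
segment = [0, 1, 0, -1, 0]
features = (line_length(segment), energy(segment), renyi_entropy(segment))
print(features)
```

Computing this triple for each sub-band and concatenating the results would give one feature vector per EEG segment.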
