WR-SVM Model Based on the Margin Radius Approach for Solving the Minimum Enclosing Ball Problem in Support Vector Machine Classification

The generalization error of a conventional support vector machine (SVM) depends on the ratio of two factors: radius and margin. The traditional SVM aims to maximize the margin but ignores minimization of the radius, which decreases the overall performance of the SVM classifier. Different approaches have been developed to achieve a trade-off between the margin and the radius, but their computational cost is high because they require matrix transformations. Furthermore, a conventional SVM tries to set the best hyperplane between classes, and thanks to robust kernel tricks, SVMs are used in many non-linear and complex problems. Configuring the best hyperplane between classes is not effective; therefore, it is necessary to bound a class within a limited area to enhance the performance of the SVM classifier. The area enclosed by a class is called its Minimum Enclosing Ball (MEB), and it is one of the emerging problems of SVM. A robust solution is therefore needed to overcome these issues and improve the performance of the conventional SVM. In this research study, a novel weighted radius SVM (WR-SVM) is proposed to determine tighter bounds of the MEB. The proposed solution uses a weighted mean to find tighter bounds on the radius, which decreases the size of the MEB. Experiments are conducted on nine benchmark datasets and one synthetic dataset to demonstrate the effectiveness of the proposed model. The experimental results reveal that the proposed WR-SVM performed significantly better than the conventional SVM classifier. Furthermore, the experimental results are compared with F-SVM and the traditional SVM in terms of classification accuracy to demonstrate the significance of the proposed WR-SVM.
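As a rough illustration of how the choice of center affects the radius, the sketch below computes the radius of a ball that contains every point of one class when it is centered at a plain or a weighted mean. The function name `enclosing_radius` and the weights are hypothetical: the abstract does not say how WR-SVM assigns its weights, and a ball centered at a mean is only an upper bound on the true minimum enclosing ball.

```python
import math

def enclosing_radius(points, weights=None):
    """Return (center, radius) of a ball centered at the (weighted) mean
    that contains every point. The weights are an illustrative assumption."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    dim = len(points[0])
    # Weighted mean of the points, one coordinate at a time.
    center = tuple(
        sum(w * p[d] for w, p in zip(weights, points)) / total
        for d in range(dim)
    )
    # The radius must reach the point farthest from the center.
    radius = max(math.dist(center, p) for p in points)
    return center, radius

points = [(0.0, 0.0), (2.0, 0.0)]
print(enclosing_radius(points))                      # plain mean as center
print(enclosing_radius(points, weights=[3.0, 1.0]))  # hypothetical weights
```

Whether a given weighting shrinks or enlarges the radius depends entirely on the weights, which is why the weighting scheme is the substantive part of any such method.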


Predicting Apoptosis Protein Subcellular Locations based on the Protein Overlapping Property Matrix and Tri-Gram Encoding

International Journal of Molecular Sciences ◽

10.3390/ijms20092344 ◽

2019 ◽

Vol 20 (9) ◽

pp. 2344

Author(s):

Yang Yang ◽

Huiwen Zheng ◽

Chunhua Wang ◽

Wanyue Xiao ◽

Taigang Liu

Keyword(s):

Support Vector Machine ◽

Subcellular Location ◽

Recursive Feature Elimination ◽

Support Vector ◽

Svm Classifier ◽

Protein Subcellular Location ◽

Promising Tool ◽

Apoptosis Protein ◽

Benchmark Datasets ◽

Apoptosis Proteins

To reveal the working pattern of programmed cell death, knowledge of the subcellular location of apoptosis proteins is essential. Besides the costly and time-consuming method of experimental determination, research into computational locating schemes, focusing mainly on the innovation of representation techniques on protein sequences and the selection of classification algorithms, has become popular in recent decades. In this study, a novel tri-gram encoding model is proposed, which is based on using the protein overlapping property matrix (POPM) for predicting apoptosis protein subcellular location. Next, a 1000-dimensional feature vector is built to represent a protein. Finally, with the help of support vector machine-recursive feature elimination (SVM-RFE), we select the optimal features and put them into a support vector machine (SVM) classifier for predictions. The results of jackknife tests on two benchmark datasets demonstrate that our proposed method can achieve satisfactory prediction performance level with less computing capacity required and could work as a promising tool to predict the subcellular locations of apoptosis proteins.
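To make the idea of tri-gram features concrete, here is a minimal sketch that counts overlapping tri-grams directly on a raw sequence. It is not the POPM-based encoding of the paper, whose details the abstract does not give; the function name `trigram_counts` and the example sequence are made up for illustration.

```python
from collections import Counter

def trigram_counts(sequence):
    """Count overlapping tri-grams (windows of three residues, step one)."""
    return Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))

counts = trigram_counts("MKVLAMKV")  # hypothetical sequence
print(counts["MKV"])                 # the tri-gram MKV occurs twice
print(sum(counts.values()))          # a sequence of length 8 has 6 windows
```

A feature-selection step such as SVM-RFE would then operate on fixed-length vectors built from counts like these.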


Support vector machine and its difficulties from control field of view

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331220977436 ◽

2021 ◽

pp. 014233122097743

Author(s):

Maryam Yalsavar ◽

Paknoosh Karimaghaei ◽

Akbar Sheikh-Akbari ◽

Pancham Shukla ◽

Peyman Setoodeh

Keyword(s):

Support Vector Machine ◽

Large Scale ◽

Regularization Parameter ◽

Experimental Results ◽

Support Vector ◽

Support Vectors ◽

Kernel Parameter ◽

Wide Range ◽

Benchmark Datasets ◽

Training Error

The application of the support vector machine (SVM) classification algorithm to large-scale datasets is limited by its use of a large number of support vectors and the dependency of its performance on its kernel parameter. In this paper, SVM is redefined as a control system and an iterative learning control (ILC) method is used to optimize the SVM's kernel parameter. The ILC technique first defines an error equation and then iteratively updates the kernel function and its regularization parameter using the training error and the previous state of the system. The closed-loop structure of the proposed algorithm increases the robustness of the technique to uncertainty and improves its convergence speed. Experimental results were generated using nine standard benchmark datasets covering a wide range of applications. They show that the proposed method achieves superior or very competitive accuracy compared with classical and state-of-the-art SVM-based techniques while using a significantly smaller number of support vectors.


A NEW TECHNIQUE FOR SELECTING FEATURES FROM PROTEIN SEQUENCES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800140600465x ◽

2006 ◽

Vol 20 (02) ◽

pp. 271-283 ◽

Cited By ~ 4

Author(s):

XING-MING ZHAO ◽

JI-XIANG DU ◽

HONG-QIANG WANG ◽

YUNPING ZHU ◽

YIXUE LI

Keyword(s):

Support Vector Machine ◽

Relative Entropy ◽

Protein Sequences ◽

Entropy Method ◽

Experimental Results ◽

New Method ◽

Support Vector ◽

Svm Classifier ◽

A New Technique ◽

Relative Entropy Method

A new method for selecting features from protein sequences is proposed in this paper. First, the protein sequences are converted into fixed-dimensional feature vectors. Then, a subset of features is selected using the relative entropy method and used as the input to a Support Vector Machine (SVM). Finally, the trained SVM classifier is used to classify protein sequences into known protein families. Experimental results on proteins obtained from the PIR database and on GPCRs show that the proposed approach is effective and efficient in selecting features from protein sequences.
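One generic way to rank features by relative entropy is sketched below: each feature is scored by the Kullback-Leibler divergence between its value distributions in two classes, and features are sorted by that score. The abstract does not describe the paper's exact scoring, so the helper names `relative_entropy` and `rank_features` and the toy histograms are assumptions.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rank_features(class_a, class_b):
    """Rank feature indices by per-feature divergence between the two
    classes' histograms, most discriminative first."""
    scores = [relative_entropy(pa, pb) for pa, pb in zip(class_a, class_b)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical histograms: feature 0 looks the same in both classes,
# feature 1 differs strongly.
class_a = [[0.5, 0.5], [0.9, 0.1]]
class_b = [[0.5, 0.5], [0.1, 0.9]]
print(rank_features(class_a, class_b))
```

Only the top-ranked features would then be passed to the SVM.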


The effects of globalisation techniques on feature selection for text classification

Journal of Information Science ◽

10.1177/0165551520930897 ◽

2020 ◽

pp. 016555152093089

Author(s):

Bekir Parlak ◽

Alper Kursat Uysal

Keyword(s):

Feature Selection ◽

Text Classification ◽

High Volume ◽

Experimental Results ◽

Support Vector ◽

Svm Classifier ◽

Chi Square ◽

Discriminative Feature ◽

Benchmark Datasets ◽

Different Characteristics

Text classification (TC) is a very important and critical task in the 21st century, as there is a high volume of electronic data on the Internet. In TC, textual data are characterised by a huge number of highly sparse features/terms. A typical TC pipeline consists of many steps, and one of the most important is undoubtedly feature selection (FS). In this study, we have comprehensively investigated the effects of various globalisation techniques on local feature selection (LFS) methods using datasets with different characteristics such as multi-class unbalanced (MCU), multi-class balanced (MCB), binary-class unbalanced (BCU) and binary-class balanced (BCB). The globalisation techniques used in this study are summation (SUM), weighted-sum (AVG), and maximum (MAX). To investigate the effect of globalisation techniques, we used three LFS methods, namely Discriminative Feature Selection (DFSS), odds ratio (OR) and chi-square (CHI2). In the experiments, we have utilised four different benchmark datasets, namely Reuters-21578, 20Newsgroup, Enron1, and Polarity, together with Support Vector Machine (SVM) and Decision Tree (DT) classifiers. According to the experimental results, the most successful globalisation technique is AVG when all situations are taken into account. The experimental results indicate that the DFSS method is more successful than the OR and CHI2 methods on datasets with MCU and MCB characteristics. However, the CHI2 method seems more accurate than the OR and DFSS methods on datasets with BCU and BCB characteristics. Also, the SVM classifier performed better than the DT classifier in most cases.
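The three globalisation techniques can be sketched as follows for a single term that has one local score per class. Treating AVG as a sum weighted by class priors is an assumption, since the abstract only calls it a weighted sum; the function name `globalise` and the numbers are illustrative.

```python
def globalise(local_scores, class_priors):
    """Combine one term's per-class scores into global scores.
    AVG is taken here as the class-prior-weighted sum (an assumption)."""
    return {
        "SUM": sum(local_scores),
        "AVG": sum(p * s for p, s in zip(class_priors, local_scores)),
        "MAX": max(local_scores),
    }

# Hypothetical local scores of one term for two equally likely classes.
print(globalise([0.25, 0.5], [0.5, 0.5]))
# → {'SUM': 0.75, 'AVG': 0.375, 'MAX': 0.5}
```

Terms would then be ranked by the chosen global score and the top ones kept as features.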


Efficiency of SVM classifier with Word2Vec and Doc2Vec models

Proceedings of the International Conference on Applied Statistics ◽

10.2478/icas-2019-0043 ◽

2019 ◽

Vol 1 (1) ◽

pp. 496-503 ◽

Cited By ~ 1

Author(s):

Maria Mihaela Truşcă

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Computational Cost ◽

Data Representation ◽

Training Data ◽

Support Vector ◽

Svm Classifier ◽

Machine Model ◽

Text Data ◽

Numerical Attributes

The Support Vector Machine model has been one of the most intensively used text data classifiers ever since its development. However, its performance depends not only on its features but also on data preprocessing and model tuning. The main purpose of this paper is to compare the efficiency of several Support Vector Machine models using both the TF-IDF approach and Word2Vec and Doc2Vec neural networks for text data representation. Besides the data vectorization process, I try to enhance the models’ efficiency by identifying which kind of kernel fits the data better or whether it is better to opt for the linear case. My results show that for the “Reuters 21578” dataset, the nonlinear Support Vector Machine is more efficient when the conversion of text data into numerical attributes is realized using Word2Vec models instead of TF-IDF and Doc2Vec representations. When the data are considered to meet linear separability requirements, the TF-IDF representation outperforms all other options. Surprisingly, Doc2Vec models have the lowest performance and provide satisfactory results only in terms of computational cost. This paper shows that while Word2Vec models are truly efficient for text data representation, Doc2Vec neural networks are unable to exceed even the TF-IDF index representation. This evidence contradicts the common idea that Doc2Vec models should provide a better insight into the training data domain than Word2Vec models, and certainly than the TF-IDF index.
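For readers unfamiliar with the TF-IDF baseline, a minimal sketch of one common variant follows: raw term frequency multiplied by the logarithm of the number of documents over the document frequency. The paper's exact weighting is not stated in the abstract, and the function name `tfidf` and the toy documents are assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, using
    raw term frequency times log(N / document frequency)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]

docs = [["oil", "price", "oil"], ["price", "rise"]]  # hypothetical corpus
for vector in tfidf(docs):
    print(vector)
```

A term that appears in every document, such as "price" here, gets weight zero, which is the property that makes TF-IDF favor discriminative terms.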


A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria, whereas with the use of antibiotics the problem of drug resistance becomes more serious. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections, and cell lytic enzymes are a good choice for this purpose. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of breaking the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful. However, detecting the type of a cell lytic enzyme by experimental methods is time consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, the classifier is trained on the labeled vectors using a support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can reach 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06 - 92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
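The percentages quoted in this abstract can be checked with a few lines of arithmetic. Reading the two reported accuracies as 66 and 64 correct predictions out of the 68 proteins in the data set is an inference from the data set size, not something the abstract states; the helper name `percent` is hypothetical.

```python
def percent(numerator, denominator):
    """Return numerator / denominator as a percentage, rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)

# Relative improvement over Ding's method, as computed in the abstract.
relative_gain = percent(97.06 - 92.9, 92.9)
# Percentage-point gap between the tripeptide set and the PseAAC method.
absolute_gain = round(94.12 - 76.2, 2)
# 27 endolysins + 41 autolysins in the data set.
total = 27 + 41

print(relative_gain)       # about 4.5, matching "nearly 4.5%"
print(absolute_gain)       # about 18 percentage points
print(percent(66, total))  # 66 of 68 correct would give the 97.06% figure
print(percent(64, total))  # 64 of 68 correct would give the 94.12% figure
```

Note that the first gain is relative (a ratio of accuracies) while the second is an absolute difference in percentage points, so the two figures are not directly comparable.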


Analog Circuit Fault Diagnosis Based on Support Vector Machine Classifier and Fuzzy Feature Selection

Electronics ◽

10.3390/electronics10121496 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1496

Author(s):

Hao Liang ◽

Yiman Zhu ◽

Dongyang Zhang ◽

Le Chang ◽

Yuming Lu ◽

...

Keyword(s):

Support Vector Machine ◽

Fault Diagnosis ◽

Mutual Information ◽

Analog Circuit ◽

Fault Classification ◽

Support Vector ◽

Svm Classifier ◽

Fault Parameters ◽

Diagnosis Method ◽

Circuit Fault Diagnosis

In analog circuits, the component parameters have tolerances and the faulty component parameters present a wide distribution, which poses an obstacle to classification-based diagnosis. To tackle this problem, this article proposes a soft fault diagnosis method combining the improved barnacles mating optimizer (BMO) algorithm with the support vector machine (SVM) classifier, which can achieve minimum redundancy and maximum relevance for feature dimension reduction with fuzzy mutual information. Specifically, first, the improved barnacles mating optimizer algorithm is used to optimize the parameters for learning and classification. We adopt six test functions that are on three data sets from the University of California, Irvine (UCI) machine learning repository to test the performance of the SVM classifier with five different optimization algorithms. The results show that the SVM classifier combined with the improved barnacles mating optimizer algorithm achieves high classification accuracy. Second, fuzzy mutual information and the enhanced minimum redundancy and maximum relevance principle are applied to reduce the dimension of the feature vector. Finally, a circuit experiment is carried out to verify that the proposed method can achieve fault classification effectively both when the fault parameters are fixed and when they are distributed. The accuracy of the proposed fault diagnosis method is 92.9% when the fault parameters are distributed, which is 1.8% higher than other classifiers on average. When the fault parameters are fixed, the accuracy rate is 99.07%, which is 0.7% higher than other classifiers on average.


Intuitionistic Fuzzy Laplacian Twin Support Vector Machine for Semi-supervised Classification

Journal of the Operations Research Society of China ◽

10.1007/s40305-021-00354-9 ◽

2021 ◽

Author(s):

Jia-Bin Zhou ◽

Yan-Qin Bai ◽

Yan-Ru Guo ◽

Hai-Xiang Lin

Keyword(s):

Support Vector Machine ◽

Negative Impact ◽

Twin Support Vector Machine ◽

Fuzzy Membership ◽

Support Vector ◽

Membership Functions ◽

Fuzzy Membership Functions ◽

Intuitionistic Fuzzy ◽

Benchmark Datasets ◽

The Impact

In general, data contain noise that comes from faulty instruments, flawed measurements or faulty communication. Learning with data in the context of classification or regression is inevitably affected by noise in the data. In order to remove or greatly reduce the impact of noise, we introduce the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented. Moreover, we extend the linear IFLap-TSVM to the nonlinear case by means of a kernel function. The proposed IFLap-TSVM resolves the negative impact of noise and outliers by using fuzzy membership functions, and is a more accurate and reasonable classifier because it uses the geometric distribution information of labeled and unlabeled data based on manifold regularization. Experiments with constructed artificial datasets, several UCI benchmark datasets and the MNIST dataset show that the IFLap-TSVM has better classification accuracy than the state-of-the-art twin support vector machine (TSVM), intuitionistic fuzzy twin support vector machine (IFTSVM) and Lap-TSVM.


Detection and Recognition of RF Devices Using Support Vector Machine

International Journal of Interdisciplinary Telecommunications and Networking ◽

10.4018/ijitn.2013100102 ◽

2013 ◽

Vol 5 (4) ◽

pp. 13-20

Author(s):

Shikhar P. Acharya ◽

Ivan G. Guardiola

Keyword(s):

Support Vector Machine ◽

Radio Frequency ◽

Experimental Results ◽

Support Vector ◽

Noise Band ◽

Detection And Identification ◽

Electromagnetic Emissions ◽

Rf Devices ◽

Unintended Electromagnetic Emissions ◽

Detection And Recognition

Radio Frequency (RF) devices produce some amount of Unintended Electromagnetic Emissions (UEEs). UEEs are generally unique to a device and can be used as a signature for the purpose of detection and identification. The problem with UEEs is that they are very low in power and are often buried deep inside the noise band. The research herein applies the Support Vector Machine (SVM) to the detection and identification of RF devices using their UEEs. Experimental results show that SVM can detect RF devices within the noise band and can also identify RF devices using their UEEs.


A Roller Bearing Fault Diagnosis Method Based on LCD Energy Entropy and ACROA-SVM

Shock and Vibration ◽

10.1155/2014/825825 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 25

Author(s):

HungLinh Ao ◽

Junsheng Cheng ◽

Kenli Li ◽

Tung Khac Truong

Keyword(s):

Support Vector Machine ◽

Fault Diagnosis ◽

Roller Bearing ◽

Local Characteristic ◽

Support Vector ◽

Svm Classifier ◽

Outer Race ◽

Bearing Fault ◽

Bearing Fault Diagnosis ◽

Energy Entropy

This study investigates a novel method for roller bearing fault diagnosis based on local characteristic-scale decomposition (LCD) energy entropy, together with a support vector machine designed using an Artificial Chemical Reaction Optimisation Algorithm, referred to as an ACROA-SVM. First, the original acceleration vibration signals are decomposed into intrinsic scale components (ISCs). Second, the concept of LCD energy entropy is introduced. Third, the energy features extracted from a number of ISCs that contain the most dominant fault information serve as input vectors for the support vector machine classifier. Finally, the ACROA-SVM classifier is proposed to recognize the faulty roller bearing pattern. The analysis of roller bearing signals with inner-race and outer-race faults shows that the diagnostic approach based on the ACROA-SVM, using LCD to extract the energy levels of the various frequency bands as features, can identify roller bearing fault patterns accurately and effectively. The proposed method is superior to approaches based on the Empirical Mode Decomposition method and requires less time.
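The energy entropy step can be sketched as the Shannon entropy of the normalized energy distribution across components. The LCD decomposition itself is not implemented here; the components are passed in as plain lists of samples, and the assumption is that the paper's definition matches this standard form. The function name `energy_entropy` and the toy signals are illustrative.

```python
import math

def energy_entropy(components):
    """Shannon entropy (in nats) of the normalized energy distribution.
    Each component is a sequence of samples; its energy is the sum of
    squared samples. Assumes the total energy is non-zero."""
    energies = [sum(x * x for x in comp) for comp in components]
    total = sum(energies)
    # Components with zero energy contribute nothing to the entropy.
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical components standing in for ISCs of a vibration signal.
evenly_spread = [[1.0, 0.0], [0.0, 1.0]]
concentrated = [[1.0, 2.0], [0.0, 0.0]]
print(energy_entropy(evenly_spread))  # energy split equally: higher entropy
print(energy_entropy(concentrated))   # energy in one component: zero entropy
```

The intuition is that a fault concentrates energy in particular frequency bands, which changes the entropy relative to a healthy bearing.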
