Two-Stage Hybrid Data Classifiers Based on SVM and kNN Algorithms

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 615
Author(s):  
Liliya A. Demidova

The paper considers the problem of developing two-stage hybrid SVM-kNN classifiers that aim to increase data classification quality by refining classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed, with the training dataset formed from the initial dataset. Either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the kNN classifier: one that uses all objects from the original training dataset located inside the strip dividing the classes, and one that uses only those objects from the original training dataset located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using this new training dataset. The parameter values of the kNN classifier are determined during training to maximize data classification quality. The classification quality of the two-stage hybrid SVM-kNN classifier was assessed on the test dataset using various indicators. If the kNN classifier improves classification quality near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers in the data classification problem.
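The two-stage idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: the first stage is a hand-written linear SVM trained by sub-gradient descent (the paper uses a binary or one-class SVM with default parameters), the strip is taken to be the region where the absolute decision score is below 1, and `k` is fixed at 3 rather than tuned. All function names and the toy data are hypothetical.

```python
import math
import random

def decision(w, b, x):
    """Signed score of x relative to the separating hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Stage 1 stand-in: linear SVM fitted by sub-gradient descent on hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if violated:  # hinge-loss sub-gradient step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def knn_predict(train, x, k):
    """Majority vote over the k nearest neighbours; labels are +1 / -1."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return 1 if sum(label for _, label in nearest) >= 0 else -1

def hybrid_predict(w, b, strip_train, x, k=3, width=1.0):
    """Use kNN inside the strip |score| < width, the SVM sign elsewhere."""
    score = decision(w, b, x)
    if abs(score) < width and len(strip_train) >= k:
        return knn_predict(strip_train, x, k)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): two classes on either side of x1 + x2 = 0
X = [(2, 2), (3, 2), (2, 3), (1, 1.5), (0.5, 0.2),
     (-2, -2), (-3, -2), (-2, -3), (-1, -1.5), (-0.5, -0.2)]
y = [1] * 5 + [-1] * 5

w, b = train_linear_svm(X, y)
# Stage 2 training set: original training objects that fall inside the strip
strip_train = [(x, lab) for x, lab in zip(X, y) if abs(decision(w, b, x)) < 1.0]
far_pos = hybrid_predict(w, b, strip_train, (5, 5))
far_neg = hybrid_predict(w, b, strip_train, (-5, -5))
```

Objects far from the boundary keep the SVM decision, while objects inside the strip are re-decided by kNN when enough strip objects are available. The second dataset variant from the abstract (only the area containing misclassified strip objects) would narrow `strip_train` further and is not shown here.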

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Qiang Wang

Imbalanced datasets are frequently found in many real applications. Resampling is an effective solution because it generates a relatively balanced class distribution. In this paper, a hybrid sampling SVM approach is proposed that combines an oversampling technique and an undersampling technique to address the imbalanced data classification problem. The proposed approach first uses an undersampling technique to delete majority-class samples that carry less classification information and then applies an oversampling technique to gradually create new positive samples. Thus, a balanced training dataset is generated to replace the original imbalanced training dataset. Finally, experimental results on real-world datasets show that the proposed approach is able to identify informative samples and deal with the imbalanced data classification problem.
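The undersample-then-oversample order described above can be illustrated as follows. This is a rough sketch with substitutions that are assumptions, not the paper's method: the majority class is undersampled at random (the paper removes samples with less classification information), and new minority samples are created by SMOTE-style interpolation between two existing minority points. The function name `hybrid_resample` and the toy data are hypothetical.

```python
import random

def hybrid_resample(majority, minority, target, seed=0):
    """Return (majority, minority) sample lists, each of size `target`."""
    rng = random.Random(seed)
    # Step 1: undersample the majority class (random choice as a stand-in)
    kept = rng.sample(majority, min(target, len(majority)))
    # Step 2: oversample the minority class by interpolating between
    # two randomly chosen minority points (requires at least two points)
    grown = list(minority)
    while len(grown) < target:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        grown.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return kept, grown

# Toy imbalanced data (hypothetical): 20 majority points, 4 minority points
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(0.0, 10.0), (1.0, 11.0), (2.0, 10.5), (1.5, 12.0)]
balanced_major, balanced_minor = hybrid_resample(majority, minority, target=10)
```

Both returned lists have the same length, so an SVM trained on their union sees a balanced class distribution.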


2018 ◽  
Vol 18 ◽  
pp. 04001 ◽  
Author(s):  
Liliya Demidova ◽  
Maksim Egin

In this paper, a data classification technique based on the consecutive application of the SVM and Parzen classifiers is suggested. The Parzen classifier is applied to data that may be either correctly or erroneously classified by the SVM classifier and that are located in experimentally defined subareas near the hyperplane separating the classes. Here, the SVM classifier is used with default parameter values, and the optimal parameter values of the Parzen classifier are determined using a genetic algorithm. Experimental results confirming the effectiveness of the proposed hybrid intelligent data classification technology are presented.
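For readers unfamiliar with the Parzen classifier, the following minimal sketch shows the basic decision rule: each class gets a kernel density estimate at the query point, and the class with the larger estimate wins. The Gaussian kernel, the fixed bandwidth `h`, and the function name are assumptions made for illustration; the abstract states only that the parameters are tuned with a genetic algorithm, which is not reproduced here.

```python
import math

def parzen_predict(train, x, h=1.0):
    """Pick the class whose Gaussian-kernel density estimate at x is largest."""
    kernel_sum = {}
    count = {}
    for point, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(point, x))
        kernel_sum[label] = kernel_sum.get(label, 0.0) + math.exp(-d2 / (2 * h * h))
        count[label] = count.get(label, 0) + 1
    # Average kernel response per class (shared normalising constants cancel)
    return max(kernel_sum, key=lambda c: kernel_sum[c] / count[c])

# Toy training set (hypothetical), labels +1 / -1
train = [((0, 0), -1), ((0, 1), -1), ((4, 4), 1), ((5, 4), 1)]
near_negative = parzen_predict(train, (0.2, 0.3))
near_positive = parzen_predict(train, (4.5, 4.0))
```

In the hybrid scheme of the abstract, such a rule would be applied only to objects in the subareas near the SVM hyperplane.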


Author(s):  
Tomasz Kajdanowicz ◽  
Przemysław Kazienko

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to recover from the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier consume the same computation time irrespective of the coding utilized; (iv) in general, they are not suitable for ECOCs because they are not able to benefit from the error-correcting abilities of ECOCs; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
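The noisy-channel view can be made concrete with a small decoding example. The sketch below shows generic nearest-codeword (Hamming distance) decoding, which is the basic mechanism by which an ECOC recovers from a wrong bit predicted by one base classifier. The codebook, labels, and function names are hypothetical and far simpler than the BCH or all-pairs codes studied in the article.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(codebook, received):
    """Return the label whose codeword is closest to the received bits."""
    return min(codebook, key=lambda label: hamming(codebook[label], received))

# Hypothetical codebook with minimum pairwise Hamming distance 3,
# so any single flipped bit can be corrected
codebook = {
    "A": "00000",
    "B": "11100",
    "C": "00111",
    "D": "11011",
}

# Five base classifiers predict one bit each; the third bit of "B" is wrong
decoded = ecoc_decode(codebook, "11000")
```

Here the received word differs from the codeword of "B" in one position and from every other codeword in at least two, so the single classifier error is corrected.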


2021 ◽  
Vol 36 (1) ◽  
pp. 657-664
Author(s):  
M.S. Madhu ◽  
Dr. Kirupa Ganapathy

Aim: Machine learning techniques are increasingly used in medical research due to their impressive results in the diagnosis and prediction of diseases. The objective of this study is to evaluate the performance of the SVM classifier in identifying liver disorders by comparing it with the Naive Bayes algorithm. Methods and Materials: A total of 31619 samples were collected from three liver disease datasets available on Kaggle. These samples were divided into a training dataset (n = 22133 [70%]) and a test dataset (n = 9486 [30%]). Accuracy, precision, specificity and sensitivity values were calculated to quantify the performance of the SVM algorithm. Results: SVM achieved accuracy, precision, sensitivity and specificity of 73.64%, 97.82%, 97.56% and 69.77%, respectively, compared with 57.31%, 41.39%, 94.87% and 37.20% for the Naive Bayes algorithm. Conclusion: In this study, the RBF SVM algorithm performed better than the Naive Bayes algorithm in liver disorder detection on the datasets considered.
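The four indicators reported above follow directly from a binary confusion matrix, and the 70/30 split sizes can be checked by simple arithmetic. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and are not taken from the study, and the rounding rule used for the split is an assumption that happens to reproduce the stated sizes.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard indicators computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts, chosen only to illustrate the formulas
metrics = binary_metrics(tp=40, fp=10, tn=30, fn=20)

# Split sizes stated in the abstract: 70% training, remainder for testing
total = 31619
n_train = round(0.7 * total)
n_test = total - n_train
```

With these hypothetical counts the accuracy is 0.70, the precision 0.80, the sensitivity about 0.667, and the specificity 0.75.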


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
WenXia Wang

To improve the accuracy and efficiency of classifying network ideological and political resources and to promote the efficiency of ideological education, a classification method for network ideological and political resources based on an improved SVM algorithm is proposed. We analyze the characteristics and current situation of network ideological and political resources and conclude that the method elements are open and technical, the ontology elements are rich and shared, and the behavioral elements are autonomous and interactive. Three types of network ideological and political resources are proposed: the main resource, the content resource, and the means resource. The particle swarm algorithm is used to improve the SVM algorithm. In constructing the SVM classifier, a fuzzy membership function is introduced, the classification problem of network ideological and political resources is converted into a quadratic programming problem, and accurate classification of network ideological and political resources is finally achieved. Simulation results show that using the improved algorithm to classify network ideological and political resources can improve the accuracy and efficiency of network abnormal data classification.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Ji-Yong An ◽  
Fan-Rong Meng ◽  
Zhu-Hong You ◽  
Yu-Hong Fang ◽  
Yu-Jun Zhao ◽  
...  

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We perform 5-fold cross-validation experiments on the Yeast and Human datasets and achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method performs clearly better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.
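The 5-fold cross-validation protocol mentioned above can be sketched as an index-splitting routine. This is a generic illustration: the interleaved fold assignment and the function name are assumptions, and the abstract does not say how the authors formed their folds.

```python
def k_fold_indices(n, k=5):
    """Yield (train_indices, test_indices) for each of k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Ten samples split into five folds; every sample is tested exactly once
splits = list(k_fold_indices(10, k=5))
```

A model would be trained on each `train` list and scored on the matching `test` list, and the reported accuracy would be the average over the five folds.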


2021 ◽  
Vol 12 (5) ◽  
pp. 1-25
Author(s):  
Shengwei Ji ◽  
Chenyang Bu ◽  
Lei Li ◽  
Xindong Wu

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
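One common way to quantify how many vertices an edge partitioning cuts is the replication factor: the average number of partitions in which each vertex appears. The abstract does not name this metric, so its use here is an assumption, as are the function name and the toy graph.

```python
def replication_factor(partitions):
    """Average number of partitions that hold a copy of each vertex."""
    copies = {}
    for pid, edges in enumerate(partitions):
        for u, v in edges:
            copies.setdefault(u, set()).add(pid)
            copies.setdefault(v, set()).add(pid)
    return sum(len(p) for p in copies.values()) / len(copies)

# A 4-cycle split into two balanced edge partitions (hypothetical example):
# vertices 1 and 3 appear in both partitions, vertices 2 and 4 in one each
partitions = [[(1, 2), (2, 3)], [(3, 4), (4, 1)]]
rf = replication_factor(partitions)
```

A value of 1.0 would mean no vertex is cut; here two of the four vertices are replicated, giving 1.5.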

