Two-Stage Hybrid Data Classifiers Based on SVM and kNN Algorithms

The paper addresses the development of two-stage hybrid SVM-kNN classifiers, aiming to improve data classification quality by refining the classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed, with the training dataset formed from the initial dataset. Either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the kNN classifier: one that uses all objects from the original training dataset located inside the strip dividing the classes, and one that uses only those objects from the original training dataset located inside the area containing all misclassified objects from that strip. In the second stage, the kNN classifier is developed using this new training dataset. The parameter values of the kNN classifier are determined during training to maximize the data classification quality. The classification quality of the two-stage hybrid SVM-kNN classifier is assessed on the test dataset using various indicators. If the kNN classifier improves the classification quality near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers in the data classification problem.
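The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear scorer `decision_value` (with hand-picked `w` and `b`) stands in for a trained binary SVM, the strip is assumed to be the region where the absolute score is below `half_width`, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def decision_value(w, b, x):
    """Signed score w.x + b; a stand-in for a trained SVM decision function."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def in_strip(w, b, x, half_width=1.0):
    """True if x lies inside the (assumed) strip |f(x)| < half_width."""
    return abs(decision_value(w, b, x)) < half_width

def knn_predict(train, x, k):
    """Majority vote among the k training objects nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def hybrid_predict(w, b, strip_train, x, k=3, half_width=1.0):
    """Stage 1 decides far from the boundary; stage 2 (kNN) decides inside the strip."""
    if strip_train and in_strip(w, b, x, half_width):
        return knn_predict(strip_train, x, k)
    return 1 if decision_value(w, b, x) >= 0 else -1

# Toy data: (point, label). The object (0.3, 1.0) lies on the wrong side
# of the linear boundary x0 = 0, so the first stage alone misclassifies it.
train = [((-2.0, 0.0), -1), ((-0.5, 1.0), -1), ((0.3, 1.0), -1),
         ((0.4, -1.0), 1), ((0.6, 0.0), 1), ((2.0, 0.0), 1)]
w, b = (1.0, 0.0), 0.0

# First variant from the abstract: all training objects inside the strip.
strip_train = [(p, y) for p, y in train if in_strip(w, b, p)]

print(hybrid_predict(w, b, strip_train, (0.2, 0.9)))  # decided by kNN
print(hybrid_predict(w, b, strip_train, (3.0, 0.0)))  # decided by the linear scorer
```

In this sketch `k` and `half_width` are fixed by hand; the abstract states that the kNN parameter values are chosen during training to maximize classification quality.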

A Hybrid Sampling SVM Approach to Imbalanced Data Classification

Abstract and Applied Analysis ◽

10.1155/2014/972786 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Qiang Wang

Keyword(s):

Real World ◽

Imbalanced Data ◽

Data Classification ◽

Classification Problem ◽

Experimental Results ◽

Training Dataset ◽

Imbalanced Data Classification ◽

Real World Datasets ◽

Classification Information ◽

Hybrid Sampling

Imbalanced datasets are common in real applications. Resampling is an effective remedy because it produces a relatively balanced class distribution. In this paper, a hybrid sampling SVM approach is proposed that combines an oversampling technique with an undersampling technique to address the imbalanced data classification problem. The proposed approach first uses undersampling to delete majority-class samples that carry less classification information and then applies oversampling to gradually create new positive samples. A balanced training dataset is thus generated to replace the original imbalanced one. Finally, experimental results on real-world datasets show that the proposed approach can identify informative samples and deal with the imbalanced data classification problem.
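A rough Python sketch of the undersample-then-oversample pipeline follows. The abstract does not say how "classification information" is measured or how new positive samples are created, so two assumptions are made here: majority samples far from the minority class are treated as less informative, and new minority samples are produced by linear interpolation between random pairs of existing ones. The function names are hypothetical.

```python
import math
import random

def undersample(majority, minority, keep):
    """Keep the `keep` majority samples closest to the minority class.

    Assumption: distance to the nearest minority sample is used as a
    proxy for how much classification information a sample carries.
    """
    def dist_to_minority(x):
        return min(math.dist(x, m) for m in minority)
    return sorted(majority, key=dist_to_minority)[:keep]

def oversample(minority, target, rng):
    """Add interpolated minority samples until `target` samples exist.

    Requires at least two minority samples.
    """
    samples = list(minority)
    while len(samples) < target:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        samples.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return samples

rng = random.Random(0)
majority = [(float(i), 0.0) for i in range(10)]  # 10 negative samples
minority = [(0.0, 1.0), (1.0, 1.0)]              # 2 positive samples

kept = undersample(majority, minority, keep=4)
grown = oversample(minority, target=4, rng=rng)
print(len(kept), len(grown))  # balanced: 4 negatives, 4 positives
```

The balanced set `kept + grown` would then replace the original training data when fitting the SVM.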

Data classification based on the hybrid intellectual technology

ITM Web of Conferences ◽

10.1051/itmconf/20181804001 ◽

2018 ◽

Vol 18 ◽

pp. 04001 ◽

Cited By ~ 1

Author(s):

Liliya Demidova ◽

Maksim Egin

Keyword(s):

Genetic Algorithm ◽

Data Classification ◽

Experimental Results ◽

Svm Classifier ◽

Optimal Parameters ◽

Classification Technique ◽

Consistent Application

In this paper, a data classification technique based on the sequential application of the SVM and Parzen classifiers is suggested. The Parzen classifier is applied to data that may be either correctly or erroneously classified by the SVM classifier and that are located in experimentally defined subareas near the hyperplane separating the classes. The SVM classifier is used with default parameter values, and the optimal parameter values of the Parzen classifier are determined using a genetic algorithm. Experimental results confirming the effectiveness of the proposed hybrid intellectual data classification technology are presented.
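For readers unfamiliar with the second-stage classifier, here is a minimal Python sketch of a Parzen-window classifier with a Gaussian kernel. The kernel choice, the single bandwidth `h`, and the function names are assumptions for illustration; the abstract does not specify them. The genetic algorithm is not shown, and `h` is the kind of parameter such a search would tune.

```python
import math

def parzen_score(x, samples, h):
    """Average Gaussian kernel value between x and one class's samples.

    The normalizing constant is omitted because it is identical for all
    classes when they share the same bandwidth h and dimension.
    """
    total = sum(math.exp(-math.dist(x, s) ** 2 / (2 * h * h)) for s in samples)
    return total / len(samples)

def parzen_predict(x, class_samples, h):
    """Return the label whose samples give x the highest kernel score."""
    return max(class_samples, key=lambda label: parzen_score(x, class_samples[label], h))

# Toy training objects near a separating hyperplane, grouped by label.
class_samples = {
    -1: [(-1.0, 0.0), (-2.0, 0.0)],
    1: [(1.0, 0.0), (2.0, 0.0)],
}

print(parzen_predict((0.5, 0.0), class_samples, h=1.0))
print(parzen_predict((-0.5, 0.0), class_samples, h=1.0))
```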

A mixed similarity measure based on rough sets theory (MSM-R) and some experimental results for data classification problem.

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/28/2/2497 ◽

2012 ◽

Vol 28 (2) ◽

Author(s):

Nguyễn Trung Tuấn

Keyword(s):

Similarity Measure ◽

Rough Sets ◽

Data Classification ◽

Classification Problem ◽

Experimental Results ◽

Rough Sets Theory ◽

Sets Theory

Multi-label classification using error correcting output codes

International Journal of Applied Mathematics and Computer Science ◽

10.2478/v10006-012-0061-2 ◽

2012 ◽

Vol 22 (4) ◽

pp. 829-840 ◽

Cited By ~ 9

Author(s):

Tomasz Kajdanowicz ◽

Przemysław Kazienko

Keyword(s):

Classification Problem ◽

Experimental Results ◽

Bch Code ◽

Classification Algorithms ◽

Noisy Channel ◽

Coding Scheme ◽

Binary Relevance ◽

Power Set ◽

Error Correcting Output Codes ◽

Classification Quality

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to correct the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier consume the same computation time irrespective of the coding utilized; (iv) in general, they are not suitable for ECOCs because they cannot benefit from the error-correcting abilities of ECOCs; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
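The noisy-channel view can be sketched in Python as follows. A 4-bit label vector is encoded into a 7-bit codeword, one binary classifier would be trained per codeword bit, and the predicted bits are decoded to the nearest codeword. The small Hamming(7,4) code used here is chosen only for brevity and is not claimed to be one of the codes evaluated in the article; the function names are hypothetical.

```python
from itertools import product

def encode(bits):
    """Map 4 label bits to a 7-bit codeword (4 data bits + 3 parity bits)."""
    d1, d2, d3, d4 = bits
    return (d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4)

# All 16 valid codewords, mapped back to the label vectors they encode.
CODEBOOK = {encode(bits): bits for bits in product((0, 1), repeat=4)}

def hamming(a, b):
    """Number of positions in which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(received):
    """Return the label vector of the codeword nearest to `received`."""
    nearest = min(CODEBOOK, key=lambda cw: hamming(cw, received))
    return CODEBOOK[nearest]

labels = (1, 0, 1, 1)          # true multi-label assignment
codeword = encode(labels)      # targets for seven binary classifiers

noisy = list(codeword)
noisy[2] ^= 1                  # simulate one base classifier making an error
print(decode(tuple(noisy)) == labels)
```

Because any two codewords of this code differ in at least three positions, a single wrong bit prediction is always corrected by nearest-codeword decoding.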

Detection of Liver Disorder Using RBF SVM in Comparison with Naïve Bayes to Measure the Accuracy, Precision, Sensitivity and Specificity

Alinteri Journal of Agricultural Sciences ◽

10.47059/alinteri/v36i1/ajas21093 ◽

2021 ◽

Vol 36 (1) ◽

pp. 657-664

Author(s):

M.S. Madhu ◽

Dr. Kirupa Ganapathy

Keyword(s):

Sensitivity And Specificity ◽

Naive Bayes ◽

Liver Disorder ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Training Dataset ◽

Svm Classifier ◽

Specificity And Sensitivity ◽

Svm Algorithm ◽

Bayes Algorithm

Aim: Machine learning techniques are increasingly used in medical research because of their impressive results in the diagnosis and prediction of diseases. The objective of this study is to evaluate the performance of the SVM classifier in identifying liver disorder by comparing it with the Naive Bayes algorithm. Methods and Materials: A total of 31619 samples were collected from three liver disease datasets available on Kaggle. These samples were divided into a training dataset (n = 22133 [70%]) and a test dataset (n = 9486 [30%]). Accuracy, precision, specificity and sensitivity values were calculated to quantify the performance of the SVM algorithm. Results: SVM achieved accuracy, precision, sensitivity and specificity of 73.64%, 97.82%, 97.56% and 69.77%, respectively, compared with 57.31%, 41.39%, 94.87% and 37.20% for the Naive Bayes algorithm. Conclusion: The RBF SVM algorithm performed better than the Naive Bayes algorithm in liver disorder detection on the datasets considered.
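The four reported measures follow from the confusion matrix of a binary classifier. The Python sketch below uses the standard definitions; the counts passed in are made up for illustration and are not taken from the study, and the function name is hypothetical. The last lines only check that the reported 70/30 split sizes are consistent with the total.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
    }

# Illustrative counts only (not the study's data).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")

# Consistency check of the split sizes quoted in the abstract.
total = 31619
n_train = round(total * 0.7)
n_test = total - n_train
print(n_train, n_test)
```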

A Classification Method of Network Ideological and Political Resources Using Improved SVM Algorithm

Security and Communication Networks ◽

10.1155/2021/2133042 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

WenXia Wang

Keyword(s):

Classification Problem ◽

Fuzzy Membership ◽

Fuzzy Membership Function ◽

Svm Classifier ◽

Particle Swarm Algorithm ◽

Planning Problem ◽

Political Resources ◽

Svm Algorithm ◽

Simulation Results

To improve the accuracy and efficiency of classifying network ideological and political resources and to promote the efficiency of ideological education, a classification method for these resources based on an improved SVM algorithm is proposed. We analyze the characteristics and current situation of network ideological and political resources and conclude that the method elements are open and technical, the ontology elements are rich and shared, and the behavioral elements are autonomous and interactive. Three types of network ideological and political resources are proposed: the main resource, content resource, and means resource. The particle swarm algorithm is used to improve the SVM algorithm. In constructing the SVM classifier, a fuzzy membership function is introduced, the classification problem of network ideological and political resources is converted into a quadratic programming problem, and accurate classification of network ideological and political resources is finally achieved. Simulation results show that using the improved algorithm to classify network ideological and political resources can improve the accuracy and efficiency of network abnormal data classification.

TWO-STAGE DATA CLASSIFICATION METHOD BASED ON SVM-ALGORITHM AND THE k NEAREST NEIGHBORS ALGORITHM

Vestnik of Ryazan State Radio Engineering University ◽

10.21667/1995-4565-2017-62-4-119-132 ◽

2017 ◽

Vol 62 ◽

pp. 119-132

Author(s):

L. A. Demidova ◽

Yu. S. Sokolova

Keyword(s):

Data Classification ◽

Nearest Neighbors ◽

Classification Method ◽

Two Stage ◽

K Nearest Neighbors ◽

Svm Algorithm

Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

BioMed Research International ◽

10.1155/2016/4783801 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 13

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zhu-Hong You ◽

Yu-Hong Fang ◽

Yu-Jun Zhao ◽

...

Keyword(s):

Protein Sequences ◽

Relevance Vector Machine ◽

Experimental Results ◽

Computational Method ◽

Support Vector ◽

Svm Classifier ◽

Local Phase ◽

Local Phase Quantization ◽

Phase Quantization ◽

Better Than

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.

Remote Sensing Data Classification Using A Hybrid Pre-Trained VGG16 CNN- SVM Classifier

2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus) ◽

10.1109/elconrus51938.2021.9396706 ◽

2021 ◽

Author(s):

Nyan Linn Tun ◽

Alexander Gavrilov ◽

Naing Min Tun ◽

Do Minh Trieu ◽

Htet Aung

Keyword(s):

Remote Sensing ◽

Remote Sensing Data ◽

Data Classification ◽

Svm Classifier ◽

Sensing Data

Local Graph Edge Partitioning

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3466685 ◽

2021 ◽

Vol 12 (5) ◽

pp. 1-25

Author(s):

Shengwei Ji ◽

Chenyang Bu ◽

Lei Li ◽

Xindong Wu

Keyword(s):

Real World ◽

Graph Partitioning ◽

Large Scale ◽

Complete Information ◽

Local Information ◽

Experimental Results ◽

Two Stage ◽

Graph Computation ◽

Local Graph ◽

Edge Partitioning

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
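The objective described above (balanced edge partitions with few cut vertices) can be made concrete with a small Python sketch that evaluates a given edge-to-partition assignment. It does not implement the Two-stage or Adaptive Local Partitioning algorithms; the function names and the toy graph are hypothetical.

```python
from collections import Counter, defaultdict

def partitions_per_vertex(edge_to_part):
    """Map each vertex to the set of partitions that hold one of its edges."""
    parts_of = defaultdict(set)
    for (u, v), part in edge_to_part.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
    return parts_of

def cut_vertices(edge_to_part):
    """Vertices whose edges are spread over more than one partition."""
    parts_of = partitions_per_vertex(edge_to_part)
    return {v for v, parts in parts_of.items() if len(parts) > 1}

def partition_sizes(edge_to_part):
    """Number of edges assigned to each partition (the balance criterion)."""
    return Counter(edge_to_part.values())

# A 4-cycle a-b-c-d-a split into two partitions of two edges each.
assignment = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 1, ("d", "a"): 1}

print(sorted(cut_vertices(assignment)))
print(dict(partition_sizes(assignment)))
```

A partitioner in the sense of the abstract would search for an assignment that keeps `partition_sizes` within the size limit while making `cut_vertices` as small as possible.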
