A HYBRID SVM BASED ON NEAREST NEIGHBOR RULE

Author(s):  
JIE JI ◽  
QIANGFU ZHAO

This paper proposes a hybrid learning method to speed up the classification procedure of Support Vector Machines (SVM). Unlike most algorithms, which try to reduce the number of support vectors in an SVM classifier, we focus on reducing the number of data points that require SVM-based classification, as well as the number of support vectors involved in each SVM decision. The system uses a Nearest Neighbor Classifier (NNC) to triage data points. In the training phase, the NNC selects data near parts of the decision boundary and then trains a sub-SVM for each Voronoi pair. For classification, most non-boundary data points are classified by the NNC directly, while the remaining boundary data points are passed to a corresponding local expert SVM. We also propose a data selection method for training reliable expert SVMs. Experimental results on several generated and public machine learning data sets show that the proposed method significantly accelerates the testing speed.
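A minimal sketch of the routing logic described above, assuming scikit-learn; the class name, the choice of prototypes, and the RBF kernel are illustrative placeholders, not the authors' implementation. A query's two nearest prototypes define its Voronoi pair: if both prototypes carry the same label, the NNC answers directly; otherwise a local expert SVM trained on that pair's region decides.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

class HybridNNSVM:
    """NNC front end with local expert SVMs on label-mixed Voronoi pairs."""

    def __init__(self, proto_X, proto_y):
        self.proto_y = proto_y
        self.nn = NearestNeighbors(n_neighbors=2).fit(proto_X)
        self.experts = {}                    # (i, j) Voronoi pair -> local SVM

    def fit(self, X, y):
        # Each point's two nearest prototypes define its Voronoi pair;
        # pairs whose prototypes disagree straddle the decision boundary.
        _, idx = self.nn.kneighbors(X)
        pairs = [tuple(sorted(row)) for row in idx]
        for pair in set(pairs):
            i, j = pair
            if self.proto_y[i] == self.proto_y[j]:
                continue                     # homogeneous region: NNC suffices
            mask = np.array([p == pair for p in pairs])
            if mask.sum() >= 2 and len(set(y[mask])) == 2:
                self.experts[pair] = SVC(kernel="rbf").fit(X[mask], y[mask])
        return self

    def predict(self, X):
        _, idx = self.nn.kneighbors(X)
        out = np.empty(len(X), dtype=self.proto_y.dtype)
        for k, row in enumerate(idx):
            expert = self.experts.get(tuple(sorted(row)))
            if expert is None:               # non-boundary: 1-NN answers directly
                out[k] = self.proto_y[row[0]]
            else:                            # boundary: local expert SVM decides
                out[k] = expert.predict(X[k : k + 1])[0]
        return out
```

In the simplest case the prototypes can be a condensed subset of the training data, so that most queries are settled by the cheap nearest-neighbor lookup and never touch an SVM.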

2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N²). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
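A hedged sketch of the landmark idea, assuming scikit-learn. The curvature proxy below (residual variance off a local PCA plane) stands in for the paper's local-curvature-variation sampler, and the nearest-landmark placement is a crude substitute for the manifold-skeleton extension; the point is the memory saving, since the eigen-analysis runs on an m×m problem for m ≪ N landmarks instead of the full N×N matrix.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import NearestNeighbors

def curvature_proxy(X, k=10):
    # Energy off the local 2-D tangent plane: the manifold bends more
    # where the trailing singular values of a neighborhood are larger.
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(X)
    scores = np.empty(len(X))
    for i, nbrs in enumerate(idx):
        local = X[nbrs] - X[nbrs].mean(axis=0)
        sv2 = np.linalg.svd(local, compute_uv=False) ** 2
        scores[i] = sv2[2:].sum() / sv2.sum()
    return scores

def landmark_embed(X, m=200, n_components=2):
    # Embed only the m landmarks (an m x m eigenproblem instead of N x N),
    # then place every remaining point at its nearest landmark's coordinates.
    landmarks = np.argsort(curvature_proxy(X))[-m:]
    Z_land = Isomap(n_components=n_components).fit_transform(X[landmarks])
    nn = NearestNeighbors(n_neighbors=1).fit(X[landmarks])
    _, nearest = nn.kneighbors(X)
    return Z_land[nearest[:, 0]]
```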


2015 ◽  
Vol 11 (1) ◽  
pp. 25 ◽  
Author(s):  
Padmavathi Janardhanan ◽  
Heena L. ◽  
Fathima Sabika

The idea of medical data mining is to extract hidden knowledge in the medical field using data mining techniques. One of its positive aspects is the discovery of important patterns. It is possible to identify patterns even when we do not fully understand the causal mechanisms behind them; in such cases, data mining provides capabilities for research and discovery that might not otherwise have been evident. This paper analyzes the effectiveness of SVM, one of the most popular classification techniques, in classifying medical datasets, comparing the performance of the Naïve Bayes classifier, the RBF network, and the SVM classifier. The performance of each predictive model in predicting diseases is recorded and compared across different medical datasets. The datasets are of binary class, and each has a different number of attributes; they include heart, cancer, and diabetes datasets. It is observed that the SVM classifier produces a better percentage of accuracy in classification. The work has been implemented in the WEKA environment, and the obtained results show that SVM is the most robust and effective classifier for medical data sets.
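A rough scikit-learn analogue of the comparison described (the paper itself works in WEKA, where the classifiers would be NaiveBayes, RBFNetwork, and SMO; scikit-learn has no direct RBF-network counterpart, so only two of the three are sketched, with load_breast_cancer standing in for the binary medical datasets):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary medical dataset: 569 tumor samples, 30 attributes.
X, y = load_breast_cancer(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    # 10-fold cross-validated accuracy, as is typical in WEKA comparisons.
    acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {acc.mean():.3f} +/- {acc.std():.3f}")
```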


Author(s):  
Diana Benavides-Prado ◽  
Yun Sing Koh ◽  
Patricia Riddle

In our research, we consider transfer learning scenarios where a target learner does not have access to the source data, but instead to hypotheses or models induced from it. This is called the Hypothesis Transfer Learning (HTL) problem. Previous approaches concentrated on transferring source hypotheses as a whole. We introduce a novel method for selectively transferring elements of previous hypotheses learned with Support Vector Machines. Representing an SVM hypothesis as a set of support vectors allows us to treat this information as privileged knowledge that aids learning in a new task. Given a possibly large number of source hypotheses, our approach selects the source support vectors that most closely resemble the target data and transfers their learned coefficients as constraints on the coefficients to be learned. This strategy increases the importance of relevant target data points based on their similarity to source support vectors, while still learning from the target data. Our method shows important improvements in the convergence rate on three classification datasets of varying sizes, decreasing the number of iterations by up to 56% on average compared to learning with no transfer and by up to 92% compared to regular HTL, while maintaining similar accuracy levels.
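A hedged sketch of the selection step, assuming scikit-learn; the function and parameter names are illustrative. The paper constrains the dual coefficients directly, whereas this simplification merely up-weights target points by their similarity to the selected source support vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def transfer_fit(source_svms, X_tgt, y_tgt, top_k=50, gamma=0.1):
    # Pool the support vectors from all fitted source hypotheses.
    sv_pool = np.vstack([svm.support_vectors_ for svm in source_svms])
    # Keep the source SVs that most closely resemble the target sample.
    sim_to_target = rbf_kernel(sv_pool, X_tgt, gamma=gamma).mean(axis=1)
    selected = sv_pool[np.argsort(sim_to_target)[-top_k:]]
    # Up-weight target points similar to the selected source SVs,
    # then learn on the target data as usual.
    weights = 1.0 + rbf_kernel(X_tgt, selected, gamma=gamma).mean(axis=1)
    return SVC(kernel="rbf", gamma=gamma).fit(X_tgt, y_tgt, sample_weight=weights)
```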


Author(s):  
Tiffany Elsten ◽  
Mark de Rooij

Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it does not give information about the importance of single features or pairs of features. In stacking, a set of base learners is combined into one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers that are by themselves easily interpretable. Furthermore, we combine these classifiers using a Lasso method, which results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and on benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods, that SUBiNN is well capable of identifying noise features, but that Random Forests is often, though not always, the best classifier.
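A minimal sketch of the stacking recipe, assuming scikit-learn and binary 0/1 labels: each univariate and bivariate feature subset gets its own k-NN base learner, out-of-fold probabilities form the meta-features, and a nonnegative Lasso selects a sparse ensemble. The CV scheme and hyperparameters are placeholders, not necessarily the authors' exact setup.

```python
from itertools import combinations
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

def subinn_fit(X, y, k=5, alpha=0.01):
    # One base learner per single feature and per feature pair.
    subsets = [[j] for j in range(X.shape[1])]
    subsets += [list(c) for c in combinations(range(X.shape[1]), 2)]
    # Meta-features: out-of-fold P(y = 1) from each k-NN base learner.
    Z = np.column_stack([
        cross_val_predict(KNeighborsClassifier(n_neighbors=k),
                          X[:, s], y, cv=5, method="predict_proba")[:, 1]
        for s in subsets
    ])
    # Nonnegative Lasso meta-learner yields a sparse ensemble; base
    # learners built on noise features should receive zero weight.
    meta = Lasso(alpha=alpha, positive=True).fit(Z, y)
    kept = [s for s, w in zip(subsets, meta.coef_) if w > 0]
    return meta, kept      # 'kept' names the informative features and pairs
```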


2017 ◽  
Vol 28 (02) ◽  
pp. 1750015 ◽  
Author(s):  
M. Andrecut

The least-squares support vector machine (LS-SVM) is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the LS-SVM classifier. The proposed methods are based on randomized block kernel matrices, and we show that they provide good accuracy and reliable scaling for multi-class classification problems with relatively large data sets. Also, we present several numerical experiments that illustrate the practical applicability of the proposed methods.
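A hedged sketch of the flavor of approximation involved, using a randomly subsampled (Nyström-style) kernel block as a stand-in for the paper's randomized block construction: the regularized least-squares system is solved over m ≪ N basis points instead of the full N×N kernel matrix. Labels are assumed to be ±1.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def lssvm_block_fit(X, y, m=200, gamma=0.1, lam=1e-3, seed=0):
    # Random basis block: an N x m slice of the kernel matrix.
    rng = np.random.default_rng(seed)
    basis = rng.choice(len(X), size=m, replace=False)
    K_nm = rbf_kernel(X, X[basis], gamma=gamma)          # N x m
    K_mm = rbf_kernel(X[basis], X[basis], gamma=gamma)   # m x m
    # Regularized least-squares system over the m basis points only.
    A = K_nm.T @ K_nm + lam * K_mm + 1e-8 * np.eye(m)
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return X[basis], alpha

def lssvm_block_predict(X_new, basis_X, alpha, gamma=0.1):
    # Decision function evaluated against the m basis points.
    return np.sign(rbf_kernel(X_new, basis_X, gamma=gamma) @ alpha)
```

For multi-class problems of the kind the paper targets, one such system can be solved per class in a one-vs-rest scheme, reusing the same random basis.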


2020 ◽  
Author(s):  
Lewis Mervin ◽  
Avid M. Afzal ◽  
Ola Engkvist ◽  
Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction models based on the Naïve Bayes, Support Vector Machine and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million data points (compound-target pairs) across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
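Two of the three calibration methods can be sketched directly with scikit-learn's CalibratedClassifierCV ('sigmoid' is Platt Scaling); Venn-ABERS has no built-in scikit-learn implementation and is omitted here. The synthetic data stands in for the proprietary AstraZeneca bioactivity set.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Placeholder binary task: compound-target pairs labeled active/inactive.
X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                                 method=method, cv=3).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    # Lower Brier score means better-calibrated probabilities.
    print(method, "Brier:", round(brier_score_loss(y_te, p), 4))
```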


2012 ◽  
Vol 24 (4) ◽  
pp. 1047-1084 ◽  
Author(s):  
Xiao-Tong Yuan ◽  
Shuicheng Yan

We investigate Newton-type optimization methods for solving piecewise linear systems (PLSs) with a nondegenerate coefficient matrix. Such systems arise, for example, from the numerical solution of the linear complementarity problem, which is useful for modeling several learning and optimization problems. In this letter, we propose an effective damped Newton method, PLS-DN, to find the exact (up to machine precision) solution of nondegenerate PLSs. PLS-DN exhibits a provable semi-iterative property; that is, the algorithm converges globally to the exact solution in a finite number of iterations. The rate of convergence is shown to be at least linear before termination. We emphasize the applications of our method in modeling, from the novel perspective of PLSs, some statistical learning problems such as box-constrained least squares, elitist Lasso (Kowalski & Torrésani, 2008), and support vector machines (Cortes & Vapnik, 1995). Numerical results on synthetic and benchmark data sets are presented to demonstrate the effectiveness and efficiency of PLS-DN on these problems.
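A hedged sketch of a damped semismooth Newton iteration for the LCP-style reformulation F(x) = min(x, Ax + b) = 0; it illustrates the family of piecewise-linear solvers discussed, not the exact PLS-DN algorithm. Nondegeneracy of the generalized Jacobian is assumed.

```python
import numpy as np

def damped_newton_pls(A, b, tol=1e-12, max_iter=100, beta=0.5):
    def F(x):
        # Residual of the piecewise linear system.
        return np.minimum(x, A @ x + b)

    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        r = F(x)
        if np.linalg.norm(r) < tol:
            break
        # Generalized Jacobian: row i of I where x_i <= (A x + b)_i,
        # otherwise row i of A.
        active = x <= A @ x + b
        J = np.where(active[:, None], np.eye(n), A)
        d = np.linalg.solve(J, -r)
        # Damping: backtrack until the residual norm decreases.
        t = 1.0
        while np.linalg.norm(F(x + t * d)) >= np.linalg.norm(r) and t > 1e-10:
            t *= beta
        x = x + t * d
    return x
```

Because F is piecewise linear, once the iterate lands in the correct linear piece a full Newton step solves the system exactly, which is the intuition behind finite termination.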

