SUBiNN: a stacked uni- and bivariate kNN sparse ensemble

Author(s):  
Tiffany Elsten ◽  
Mark de Rooij

Nearest Neighbor classification is an intuitive distance-based classification method. It has, however, two drawbacks: (1) it is sensitive to the number of features, and (2) it gives no information about the importance of single features or pairs of features. In stacking, a set of base learners is combined into one overall ensemble classifier by means of a meta-learner. In this manuscript we combine univariate and bivariate nearest neighbor classifiers that are by themselves easily interpretable. Furthermore, we combine these classifiers with a Lasso method, which results in a sparse ensemble of nonlinear main and pairwise interaction effects. We christened the new method SUBiNN: Stacked Uni- and Bivariate Nearest Neighbors. SUBiNN overcomes the two drawbacks of simple nearest neighbor methods. In extensive simulations and on benchmark data sets, we evaluate the predictive performance of SUBiNN and compare it to other nearest neighbor ensemble methods as well as Random Forests and Support Vector Machines. Results indicate that SUBiNN often outperforms other nearest neighbor methods, that SUBiNN is well capable of identifying noise features, and that Random Forests is often, but not always, the best classifier.
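As a concrete illustration of the stacking scheme described above, the following minimal Python sketch fits one kNN per single feature and per feature pair and lets a Lasso meta-learner select a sparse subset. It assumes scikit-learn and binary labels; the choices of k, fold count, and a nonnegative Lasso are illustrative, not the authors' exact setup.

# Minimal sketch of a SUBiNN-style stack, assuming scikit-learn and a
# binary task; k=5, 10 folds, and the nonnegative Lasso are assumptions.
from itertools import combinations
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import Lasso
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Base learners: one kNN per single feature and per feature pair.
subsets = [(j,) for j in range(X.shape[1])] + \
          list(combinations(range(X.shape[1]), 2))

# Out-of-fold class-1 probabilities form the meta-level features.
Z = np.column_stack([
    cross_val_predict(KNeighborsClassifier(n_neighbors=5),
                      X[:, list(s)], y, cv=10,
                      method="predict_proba")[:, 1]
    for s in subsets
])

# Sparse meta-learner: the Lasso zeroes out noise features and pairs.
meta = Lasso(alpha=0.01, positive=True).fit(Z, y)
selected = [s for s, w in zip(subsets, meta.coef_) if w != 0]
print("retained feature subsets:", selected)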

Author(s):  
Cagatay Catal ◽  
Serkan Tugul ◽  
Basar Akpinar

Software repositories consist of thousands of applications, and manually categorizing these applications into domain categories is expensive and time-consuming. In this study, we investigate an ensemble-of-classifiers approach to automatic software categorization when the source code is not available. To this end, we used three data sets (package level/class level/method level) belonging to 745 closed-source Java applications from the Sharejar repository. We applied the Vote algorithm, AdaBoost, and Bagging ensemble methods, with Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests as base classifiers. The best performance was achieved with the Vote algorithm, whose base classifiers were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest. We showed that the Vote approach with method-level attributes provides the best performance for automatic software categorization; these results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
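The winning configuration can be sketched in scikit-learn terms as a soft-voting ensemble. Here DecisionTreeClassifier stands in for Weka's J48 and VotingClassifier for Weka's Vote meta-scheme, so this is an analogy to, not a reproduction of, the authors' Weka pipeline.

# Sketch of the best-performing Vote configuration, assuming scikit-learn;
# DecisionTreeClassifier is a stand-in for Weka's J48.
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.tree import DecisionTreeClassifier

vote = VotingClassifier(
    estimators=[
        # ("estimator" is "base_estimator" in scikit-learn < 1.2)
        ("ada_j48", AdaBoostClassifier(estimator=DecisionTreeClassifier())),
        ("ada_rf", AdaBoostClassifier(estimator=RandomForestClassifier())),
        ("rf", RandomForestClassifier()),
    ],
    voting="soft",  # average the predicted class probabilities
)
# Usage: vote.fit(X_train, y_train); vote.predict(X_test)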


Author(s):  
H. Benjamin Fredrick David ◽  
A. Suruliandi ◽  
S. P. Raja

Ensemble methods construct a sequence of classifiers for classifying fresh instances by taking a weighted vote of their individual predictions. Reducing error and increasing accuracy is a long-standing problem in ensemble classification. This paper presents a novel generic object-oriented voting- and weighting-adapted stacking framework for utilizing an ensemble of classifiers for prediction. This universal framework operates on the weighted average of the probabilities of any suite of base learners, and the final prediction is the aggregate of their respective votes. For illustrative purposes, three familiar heterogeneous classifiers, the Support Vector Machine, k-Nearest Neighbor, and Naïve Bayes, are used as candidates for ensemble classification within the proposed stacked framework. Further, the ensemble classifier built upon the framework is compared with others and evaluated using various cross-validation levels and percentage splits on a range of benchmark datasets. The outcome distinguishes the framework from the competition. The proposed framework is also used to predict the crime propensity of prisoners, reaching 99.9901% accuracy.
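The core mechanism, a weighted average of base-learner class probabilities followed by an aggregate vote, can be sketched as follows. The three learners match the abstract, but the data set and the per-learner weights are illustrative assumptions, not values learned by the paper's framework.

# Sketch of weighted probability averaging over SVM, kNN, and Naive
# Bayes, assuming scikit-learn; weights and data set are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

learners = [SVC(probability=True), KNeighborsClassifier(), GaussianNB()]
weights = np.array([0.5, 0.3, 0.2])          # assumed per-learner weights

probas = np.stack([m.fit(Xtr, ytr).predict_proba(Xte) for m in learners])
avg = np.tensordot(weights, probas, axes=1)  # weighted average of probabilities
pred = avg.argmax(axis=1)                    # final vote: highest average probability
print("test accuracy:", (pred == yte).mean())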


Author(s):  
JIE JI ◽  
QIANGFU ZHAO

This paper proposes a hybrid learning method to speed up the classification procedure of Support Vector Machines (SVM). In contrast to most algorithms, which try to decrease the number of support vectors in an SVM classifier, we focus on reducing the number of data points that need an SVM for classification, and on reducing the number of support vectors used in each SVM classification. The system uses a Nearest Neighbor Classifier (NNC) to triage data points. In the training phase, the NNC selects data near the decision boundary and then trains a sub-SVM for each Voronoi pair. At classification time, most non-boundary data points are classified by the NNC directly, while the remaining boundary data points are passed to the corresponding local expert SVM. We also propose a data selection method for training reliable expert SVMs. Experimental results on several generated and public machine learning data sets show that the proposed method significantly accelerates testing.
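A simplified sketch of the gating idea follows: test points whose nearest training neighbors all agree are labeled by the NNC alone, and only the remaining boundary points are passed to an SVM. This illustrates the principle only; it omits the paper's Voronoi-pair construction of local experts and uses a single global SVM as the expert.

# Sketch of NNC-gated SVM classification, assuming scikit-learn;
# a single SVM stands in for the paper's per-Voronoi-pair experts.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC
from sklearn.datasets import make_moons

Xtr, ytr = make_moons(n_samples=400, noise=0.25, random_state=0)
Xte, yte = make_moons(n_samples=200, noise=0.25, random_state=1)

nn = NearestNeighbors(n_neighbors=7).fit(Xtr)
_, idx = nn.kneighbors(Xte)
votes = ytr[idx]

pred = np.empty(len(Xte), dtype=int)
unanimous = votes.min(axis=1) == votes.max(axis=1)   # all neighbors agree
pred[unanimous] = votes[unanimous, 0]                # fast NNC path

svm = SVC().fit(Xtr, ytr)                            # expert for boundary points
pred[~unanimous] = svm.predict(Xte[~unanimous])
print("boundary fraction:", (~unanimous).mean(),
      "accuracy:", (pred == yte).mean())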


2004 ◽  
Vol 3 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Mark R Segal ◽  
Jason D Barbour ◽  
Robert M Grant

The problem of relating genotype (as represented by amino acid sequence) to phenotypes is distinguished from standard regression problems by the nature of sequence data. Here we investigate an instance of such a problem where the phenotype of interest is HIV-1 replication capacity and contiguous segments of protease and reverse transcriptase sequence constitute genotype. A variety of data analytic methods have been proposed in this context. Shortcomings of select techniques are contrasted with the advantages afforded by tree-structured methods. However, tree-structured methods, in turn, have been criticized for enjoying only modest predictive performance. A number of ensemble approaches (bagging, boosting, random forests) have recently emerged, devised to overcome this deficiency. We evaluate random forests as applied in this setting, and detail why prediction gains obtained in other situations are not realized. Other approaches, including logic regression, support vector machines, and neural networks, are also applied. We interpret results in terms of HIV-1 reverse transcriptase structure and function.
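For readers unfamiliar with this kind of analysis, a minimal sketch of regressing a phenotype on sequence data with a random forest is shown below. The one-hot (dummy) encoding of amino-acid positions is a standard choice, and the sequences and response values are synthetic stand-ins, not the HIV-1 data.

# Sketch of sequence-to-phenotype regression with a random forest,
# assuming scikit-learn; all data below are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
amino = np.array(list("ACDEFGHIKLMNPQRSTVWY"))
seqs = rng.choice(amino, size=(200, 30))   # 200 sequences, 30 positions
y = rng.normal(size=200)                   # stand-in replication capacity

# One column per (position, residue) pair.
# ("sparse_output" is "sparse" in scikit-learn < 1.2)
X = OneHotEncoder(sparse_output=False).fit_transform(seqs)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Feature importances indicate which residues drive the phenotype.
print(rf.feature_importances_[:10])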


2020 ◽  
Vol 8 (4) ◽  
pp. 297-303
Author(s):  
Tamunopriye Ene Dagogo-George ◽  
Hammed Adeleye Mojeed ◽  
Abdulateef Oluwagbemiga Balogun ◽  
Modinat Abolore Mabayoje ◽  
Shakirat Aderonke Salihu

Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes and causes severe damage to the eyes. Early diagnosis of this disease is imperative, as late diagnosis may be fatal. Existing studies employed machine learning approaches, with Support Vector Machines (SVM) achieving the highest performance in most analyses and Decision Trees (DT) the lowest. However, SVM is known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogeneous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using the Python scikit-learn libraries on DR data sets extracted from the UCI Machine Learning Repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT performed best with accuracy 65.38%, f-score 0.664, and AUC 0.731, followed by Boosted DT with accuracy 65.42%, f-score 0.655, and AUC 0.724, compared to SVM with accuracy 65.16%, f-score 0.652, and AUC 0.721. These results indicate that DT's predictive performance can be optimized by employing homogeneous ensemble methods to outperform SVM in predicting DR.
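A minimal scikit-learn sketch of the bagged and boosted DT setup follows. The local file name "dr.csv" and the use of 10-fold cross-validated AUC are assumptions for illustration, not the study's exact protocol.

# Sketch of bagged/boosted decision trees on a DR-style data set,
# assuming scikit-learn; "dr.csv" is a hypothetical local export of
# the UCI data with the class label in the final column.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("dr.csv")
X, y = df.iloc[:, :-1], df.iloc[:, -1]

for name, clf in [
    # ("estimator" is "base_estimator" in scikit-learn < 1.2)
    ("Bagged DT", BaggingClassifier(estimator=DecisionTreeClassifier())),
    ("Boosted DT", AdaBoostClassifier(estimator=DecisionTreeClassifier())),
]:
    scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
    print(name, "AUC:", scores.mean().round(3))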


2012 ◽  
Vol 24 (4) ◽  
pp. 1047-1084 ◽  
Author(s):  
Xiao-Tong Yuan ◽  
Shuicheng Yan

We investigate Newton-type optimization methods for solving piecewise linear systems (PLSs) with a nondegenerate coefficient matrix. Such systems arise, for example, from the numerical solution of the linear complementarity problem, which is useful for modeling several learning and optimization problems. In this letter, we propose an effective damped Newton method, PLS-DN, to find the exact (up to machine precision) solution of nondegenerate PLSs. PLS-DN exhibits a provable semi-iterative property; that is, the algorithm converges globally to the exact solution in a finite number of iterations. The rate of convergence is shown to be at least linear before termination. We emphasize the applications of our method in modeling, from the novel perspective of PLSs, statistical learning problems such as box-constrained least squares, elitist Lasso (Kowalski & Torrésani, 2008), and support vector machines (Cortes & Vapnik, 1995). Numerical results on synthetic and benchmark data sets demonstrate the effectiveness and efficiency of PLS-DN on these problems.
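To make the setting concrete, the sketch below runs a damped semismooth Newton iteration on the piecewise linear system F(x) = min(x, Mx + q) = 0, which encodes a linear complementarity problem. It illustrates the class of solvers discussed, not the exact PLS-DN algorithm; the problem instance and damping rule are assumptions.

# Damped Newton sketch for the PLS F(x) = min(x, Mx + q) = 0 (an LCP),
# using numpy only; positive definite M makes the system nondegenerate.
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(20, 20))
M = B @ B.T + np.eye(20)             # positive definite coefficient matrix
q = rng.normal(size=20)

x = np.zeros(20)
for it in range(50):
    F = np.minimum(x, M @ x + q)     # PLS residual
    if np.linalg.norm(F, np.inf) < 1e-12:
        break
    # Generalized Jacobian: identity rows where x attains the min,
    # rows of M where Mx + q attains it.
    active = x <= M @ x + q
    J = np.where(active[:, None], np.eye(20), M)
    d = np.linalg.solve(J, -F)
    t = 1.0                          # simple backtracking damping
    while np.linalg.norm(np.minimum(x + t * d, M @ (x + t * d) + q)) > \
            (1 - 1e-4 * t) * np.linalg.norm(F) and t > 1e-10:
        t /= 2
    x = x + t * d
print("iterations:", it,
      "residual:", np.linalg.norm(np.minimum(x, M @ x + q), np.inf))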


2021 ◽  
Author(s):  
Isabella Södergren ◽  
Maryam Pahlavan Nodeh ◽  
Prakash Chandra Chhipa ◽  
Konstantina Nikolaidou ◽  
György Kovács

2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix of size N×N, where N is the number of data points, so the memory complexity of the analysis is no less than O(N²). In this article we present an incremental manifold learning approach for handling large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained from the training data set. A local curvature variation algorithm is used to sample a subset of data points as landmarks, and a manifold skeleton is then identified from the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second-best performance with a support vector machine.
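A rough sketch of the landmark idea follows: score each point by how much its local neighborhood deviates from a flat plane, keep the highest-variation points as landmarks, and embed only those. The PCA-residual score stands in for the paper's local curvature variation algorithm, and a Swiss-roll data set stands in for hyperspectral pixels.

# Landmark-based manifold skeleton sketch, assuming scikit-learn;
# the curvature score and data set are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.neighbors import NearestNeighbors

X, _ = make_swiss_roll(n_samples=2000, random_state=0)

_, idx = NearestNeighbors(n_neighbors=10).fit(X).kneighbors(X)
def residual(nbhd):                    # variance unexplained by a local 2D plane
    return 1.0 - PCA(n_components=2).fit(nbhd).explained_variance_ratio_.sum()
scores = np.array([residual(X[i]) for i in idx])

landmarks = np.argsort(scores)[-200:]  # keep the 200 highest-variation points
skeleton = Isomap(n_neighbors=10, n_components=2).fit(X[landmarks])
print("skeleton embedding shape:", skeleton.embedding_.shape)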

