EVOLVING EDITED k-NEAREST NEIGHBOR CLASSIFIERS

2008 ◽  
Vol 18 (06) ◽  
pp. 459-467 ◽  
Author(s):  
ROBERTO GIL-PITA ◽  
XIN YAO

The k-nearest neighbor method is a classifier based on the evaluation of the distances to each pattern in the training set. The edited version of this method applies this classifier to a subset of the complete training set from which some of the training patterns are excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean-square-error-based fitness function, a novel clustered crossover technique, and a fast smart mutation scheme. To evaluate the performance of the proposed method, results using the breast cancer database, the diabetes database and the letter recognition database from the UCI machine learning benchmark repository are included. Both error rate and computational cost are considered in the analysis. The results obtained show the improvement achieved by the proposed editing method.
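The editing idea can be illustrated with a minimal Python/NumPy sketch. It is not the authors' algorithm: their MSE-based fitness, clustered crossover and smart mutation are replaced here by a plain leave-one-out error rate, uniform crossover and bit-flip mutation, and every function name, parameter value and data set below (`evolve_edited_subset`, `pop_size`, `p_mut`, the two Gaussian blobs) is hypothetical. Each chromosome is a binary mask over the training patterns; a bit set to 1 keeps that pattern in the edited subset.

```python
import numpy as np

def knn_predict(X_ref, y_ref, X_query, k=3):
    """Plain k-NN: majority vote among the k nearest reference patterns (Euclidean)."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :min(k, len(X_ref))]
    return np.array([np.bincount(row).argmax() for row in y_ref[idx]])

def edited_error(mask, D, y, k=3):
    """Leave-one-out error on the FULL training set using only the patterns kept by `mask`.

    D is the training distance matrix with an infinite diagonal, so a kept pattern
    is never used as its own neighbour.
    """
    cols = np.flatnonzero(mask)
    if len(cols) <= k:                       # too few prototypes for a k-NN vote
        return 1.0
    nn = cols[np.argsort(D[:, cols], axis=1)[:, :k]]
    pred = np.array([np.bincount(row).argmax() for row in y[nn]])
    return float(np.mean(pred != y))

def evolve_edited_subset(X, y, k=3, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Generic GA over binary masks (hypothetical operators, not the paper's)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    pop = rng.random((pop_size, n)) < 0.8    # random masks keeping ~80% of patterns
    pop[0] = True                            # seed with the unedited training set
    fit = np.array([edited_error(m, D, y, k) for m in pop])

    def tournament():
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if fit[a] <= fit[b] else pop[b]

    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            child = np.where(rng.random(n) < 0.5, p1, p2)   # uniform crossover
            child ^= rng.random(n) < p_mut                   # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_fit = np.array([edited_error(m, D, y, k) for m in children])
        # Elitism: keep the pop_size best of parents + children (ties favour parents).
        merged, merged_fit = np.vstack([pop, children]), np.concatenate([fit, child_fit])
        order = np.argsort(merged_fit, kind="stable")[:pop_size]
        pop, fit = merged[order], merged_fit[order]
    return pop[0], float(fit[0])

# Usage on hypothetical synthetic data: two overlapping 2-D Gaussian classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(2.5, 1.0, (30, 2))])
y = np.repeat([0, 1], 30)
mask, err = evolve_edited_subset(X, y, k=3)
print(f"kept {mask.sum()} of {len(y)} patterns, leave-one-out error {err:.3f}")
labels = knn_predict(X[mask], y[mask], np.array([[0.0, 0.0], [2.5, 2.5]]), k=3)
```

Because the unedited training set is seeded into the first population and elitism is used, the returned mask never has a higher leave-one-out error than the full set. A fitness measured on the same data it edits can still overfit, so a held-out evaluation would be needed before trusting the edited classifier.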

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Lijun Hao ◽  
Min Zhang ◽  
Gang Huang

Feature optimization, the theme of this paper, is the selection of input variables when building a predictive model. In this paper, an improved feature optimization algorithm for breath signals based on Pearson-BPSO is proposed and applied to distinguish hepatocellular carcinoma using an electronic nose (eNose). First, multidimensional features of the breath curves of hepatocellular carcinoma patients and healthy controls in the training samples were extracted; then, features with little relevance to the classification were removed according to the Pearson correlation coefficient; next, a fitness function was constructed from the K-Nearest Neighbor (KNN) classification error and the feature dimension, and the feature optimization transformation matrix was obtained using BPSO. The transformation matrix was then applied to optimize the features of the test samples. Finally, the performance of the optimization algorithm was evaluated with classifiers. The experimental results show that the Pearson-BPSO algorithm can effectively improve classification performance compared with the BPSO and PCA optimization methods. The accuracies of the SVM and RF classifiers were 86.03% and 90%, respectively, and the sensitivity and specificity were about 90% and 80%. Consequently, the Pearson-BPSO feature optimization algorithm will help improve the accuracy of hepatocellular carcinoma detection by eNose and promote the clinical application of intelligent detection.
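The Pearson filter followed by a BPSO search can be sketched as below, assuming BPSO denotes the standard binary particle swarm optimization with a sigmoid transfer function. This is not the authors' code: the correlation threshold, the weighting `alpha` between KNN error and feature dimension, the swarm parameters, the function names and the synthetic data are all hypothetical, and the paper's transformation matrix is represented simply as a 0/1 feature mask.

```python
import numpy as np

def pearson_filter(X, y, threshold=0.3):
    """Indices of features whose |Pearson r| with the class label exceeds `threshold`."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(np.nan_to_num(r)) > threshold)  # nan (constant feature) -> 0

def knn_loo_error(X, y, k=3):
    """Leave-one-out k-NN error rate (Euclidean distance, majority vote)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]
    pred = np.array([np.bincount(row).argmax() for row in y[nn]])
    return float(np.mean(pred != y))

def fitness(mask, X, y, alpha=0.9, k=3):
    """Hypothetical weighting: alpha * KNN error + (1 - alpha) * fraction of features kept."""
    if not mask.any():
        return 1.0
    return alpha * knn_loo_error(X[:, mask], y, k) + (1 - alpha) * mask.mean()

def bpso(X, y, n_particles=15, iters=30, w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary PSO: real-valued velocities, bit positions sampled through a sigmoid."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_particles, d)) < 0.5
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p, X, y) for p in pos])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, d))
        x = pos.astype(float)
        vel = (w * vel + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -vmax, vmax)
        pos = rng.random((n_particles, d)) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([fitness(p, X, y) for p in pos])
        better = fit < pbest_fit                  # update personal bests
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        g = int(np.argmin(pbest_fit))             # update the global best
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)

# Usage on hypothetical data: 40 samples, 6 features, only features 0 and 1 informative.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 6))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y
keep = pearson_filter(X, y)
mask, best = bpso(X[:, keep], y)
selected = keep[mask]
print("Pearson-kept:", keep, "BPSO-selected:", selected, "fitness:", round(best, 3))
```

Under this mask interpretation, applying the result to test samples amounts to `X_test[:, selected]`, with the mask playing the role of a diagonal 0/1 transformation matrix.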


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Zhou Tao ◽  
Lu Huiling ◽  
Fuyuan Hu ◽  
Shi Qiu ◽  
Wu Cuiying

Aiming at the shortcomings of high-dimensional feature reduction using traditional rough sets, such as poor tolerance of noisy data and the easy loss of potentially useful information, this paper combines rough sets with a genetic algorithm and proposes a VPRS-GA (Variable Precision Rough Set--Genetic Algorithm) model for high-dimensional feature reduction of medical images. First, the rigid inclusion of the lower approximation in the traditional rough set model is relaxed to partial inclusion governed by a classification error rate β, which improves the ability to deal with noisy data. Second, several factors of feature reduction are considered, such as attribute dependency, attribute reduction length, and gene coding weight. A general framework for the fitness function is put forward, and different fitness functions are constructed using different factors such as the weights and the classification error rate β. Finally, 98-dimensional features of PET/CT lung tumor ROIs are extracted to build a decision information table of lung tumor patients. Three kinds of high-dimensional feature reduction experiments are carried out, using a support vector machine to verify how recognition accuracy is influenced by different fitness function parameters and classification error rates. Experimental results show that classification accuracy is strongly affected by different weight values when the classification error rate is fixed, and by increasing the classification error rate when the weight values are fixed. Hence, to achieve better recognition accuracy, a suitable parameter combination should be chosen for each problem.
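The relaxation from rigid to partial inclusion can be made concrete with a small sketch of the β-dependency used in Ziarko-style variable precision rough sets, plus a hypothetical weighted fitness over a GA bit-string chromosome. The weights `w_dep` and `w_len`, the function names and the toy decision table are assumptions, and the paper's gene coding weight factor is not modeled.

```python
from collections import Counter, defaultdict

def beta_dependency(rows, decisions, attrs, beta=0.2):
    """Degree of dependency gamma_beta = |POS_beta| / |U| (VPRS, assuming 0 <= beta < 0.5).

    Objects with identical values on `attrs` form an equivalence class E. E joins the
    beta-positive region when its majority decision class X satisfies
    1 - |E & X| / |E| <= beta (partial inclusion); beta = 0 gives rigid inclusion.
    """
    blocks = defaultdict(list)
    for row, d in zip(rows, decisions):
        blocks[tuple(row[a] for a in attrs)].append(d)
    pos = sum(len(labels) for labels in blocks.values()
              if 1 - Counter(labels).most_common(1)[0][1] / len(labels) <= beta)
    return pos / len(rows)

def vprs_fitness(chromosome, rows, decisions, beta=0.2, w_dep=0.8, w_len=0.2):
    """Hypothetical weighted fitness: reward dependency and reward shorter reducts."""
    attrs = [i for i, bit in enumerate(chromosome) if bit]
    gamma = beta_dependency(rows, decisions, attrs, beta)
    return w_dep * gamma + w_len * (1 - len(attrs) / len(chromosome))

# Toy decision table: 6 objects, 3 discrete condition attributes.
rows = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (0, 0, 1)]
decisions = [0, 0, 1, 1, 0, 0]
print(beta_dependency(rows, decisions, [0], beta=0.0))   # 0.5: objects 2, 3, 4 are inconsistent
print(beta_dependency(rows, decisions, [0], beta=0.4))   # 1.0: a 1/3 error rate is tolerated
```

On attribute 0 alone, objects 2, 3 and 4 share a value but have decisions 1, 1, 0. Rigid inclusion (β = 0) rejects that class, while β = 0.4 accepts it. This mirrors how β trades tolerance of noisy data against strictness.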


Author(s):  
S. Vijaya Rani ◽  
G. N. K. Suresh Babu

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Hacking may target a single computing system or many systems. Hackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, etc. They scan a network with various tools and collect information about the network and its hosts. Hence it is essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Artificial Neural Networks. A neural network consists of interconnected processing units that can store information and make it available for use. It acts like a human brain, acquiring knowledge from the environment through a training and learning process. Many algorithms are available for this learning process. This work analyzes malicious packets and predicts the error rate in detecting injured packets using artificial neural network algorithms.
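As a minimal illustration of a processing unit learning from labelled examples and of measuring a detection error rate, the sketch below trains a single Rosenblatt perceptron. It is not the paper's network or learning algorithm, and the two per-packet features and the synthetic benign/malicious data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, epochs=1000, lr=1.0):
    """Rosenblatt perceptron: one processing unit whose weights are learned from examples.

    Labels must be -1 (benign) or +1 (malicious). Training stops after the first epoch
    with no mistakes; on linearly separable data the perceptron convergence theorem
    guarantees such an epoch is reached after finitely many updates.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:       # wrong side of (or on) the decision boundary
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def error_rate(w, b, X, y):
    """Fraction of packets whose predicted label differs from the true label."""
    pred = np.where(X @ w + b > 0, 1, -1)
    return float(np.mean(pred != y))

# Hypothetical per-packet features scaled to [0, 1], e.g. (payload size, flag anomaly score).
rng = np.random.default_rng(7)
X = np.vstack([rng.uniform(0.0, 0.4, (20, 2)), rng.uniform(0.6, 1.0, (20, 2))])
y = np.repeat([-1, 1], 20)
w, b = train_perceptron(X, y)
print("training error rate:", error_rate(w, b, X, y))

# Error rate on fresh packets drawn from the same hypothetical distributions.
X_test = np.vstack([rng.uniform(0.0, 0.4, (10, 2)), rng.uniform(0.6, 1.0, (10, 2))])
y_test = np.repeat([-1, 1], 10)
print("test error rate:", error_rate(w, b, X_test, y_test))
```

The toy classes are linearly separable by construction, so the training error is 0.0. Real traffic features are generally not separable by a single linear unit, which is where multi-layer networks and their learning algorithms come in.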


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach is applied to select feature subsets: the GA stochastically selects reduced numbers of features, using the Sammon error as the fitness function, and yields different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, which is validated with the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is evaluated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Results are presented with extensive simulations on seven real data sets and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
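The two quantities this scheme relies on can be sketched as follows: the Sammon error of a feature subset, E = (1 / Σ d*_ij) Σ (d*_ij − d_ij)² / d*_ij over pairs i < j, and leave-one-out k-nn accuracy under a Minkowski L_p metric. The GA search of the first phase is omitted. The sketch assumes the Sammon error compares Euclidean distances in the full feature space (d*) with those in the subset (d), and its function names and data are hypothetical.

```python
import numpy as np

def pairwise_minkowski(X, p=2.0):
    """Matrix of Minkowski L_p distances between rows of X (a true norm for p >= 1)."""
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return (diff ** p).sum(axis=2) ** (1.0 / p)

def sammon_error(X, feature_idx):
    """Sammon error between Euclidean distances in the full space (d*) and in the subset (d)."""
    iu = np.triu_indices(len(X), k=1)                  # each pair i < j once
    d_star = pairwise_minkowski(X)[iu]
    d = pairwise_minkowski(X[:, feature_idx])[iu]
    ok = d_star > 0                                    # skip duplicate points (d* = 0)
    return float(np.sum((d_star[ok] - d[ok]) ** 2 / d_star[ok]) / np.sum(d_star[ok]))

def knn_accuracy(X, y, k=3, p=2.0):
    """Leave-one-out k-nn classification accuracy under the L_p metric."""
    D = pairwise_minkowski(X, p)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]
    pred = np.array([np.bincount(row).argmax() for row in y[nn]])
    return float(np.mean(pred == y))

# Hypothetical data: two well-separated clusters in 3-D.
rng = np.random.default_rng(3)
X = np.vstack([rng.uniform(0, 1, (10, 3)), rng.uniform(10, 11, (10, 3))])
y = np.repeat([0, 1], 10)
subset = [0, 1]          # stands in for one subset proposed by the GA phase
print("Sammon error:", sammon_error(X, subset))
for p in (1.0, 2.0, 3.0):
    print(f"L{p:g} k-nn accuracy:", knn_accuracy(X[:, subset], y, k=3, p=p))
```

Because the Sammon error needs no class labels, the first phase stays unsupervised. Labels enter only in the second phase, where the same subset can be scored under several values of p.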


2010 ◽  
Vol 5 (2) ◽  
pp. 133-137 ◽  
Author(s):  
Mohammed J. Islam ◽  
Q. M. Jonathan Wu ◽  
Majid Ahmadi ◽  
Maher A. SidAhmed

2019 ◽  
Vol 108 (12) ◽  
pp. 2087-2111 ◽  
Author(s):  
Eric Bax ◽  
Lingjie Weng ◽  
Xu Tian
