Combination of Multiple Nearest Neighbor Classifiers Based on Feature Subset Clustering Method

Author(s): Li-Juan Wang ◽ Qiang Hua ◽ Xiao-Long Wang ◽ Qing-Cai Chen
2008 ◽ Vol 04 (01) ◽ pp. 107-122
Author(s): YANNIS MARINAKIS ◽ MAGDALENE MARINAKI ◽ CONSTANTIN ZOPOUNIDIS

This paper presents a novel approach to feature subset selection problems using an Ant Colony Optimization (ACO) algorithm. ACO is a prominent nature-inspired technique based on the foraging behavior of real ants. The proposed ACO is combined with a number of nearest neighbor classifiers. The resulting algorithm is applied to credit risk classification using data on 1,411 firms obtained from a leading Greek commercial bank. The objective is to classify the firms into several groups representing different levels of credit risk. The results of the proposed algorithm are compared with those of other methods, including SVM and CART, and with two other metaheuristic algorithms based on tabu search and genetic algorithms, both of which use nearest neighbor classifiers in the classification phase. The results suggest that the proposed method classifies credit risk more accurately than the other methods tested.
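The wrapper idea in this abstract, an ant-colony search over feature subsets scored by a nearest neighbor classifier, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the per-feature pheromone model, the parameter names (n_ants, n_iters, rho), the pheromone bounds, the leave-one-out 1-NN scoring, and the toy data are all assumptions chosen for brevity.

```python
import random


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue
            # Squared Euclidean distance on the selected features only.
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_label = d, y[j]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def aco_feature_selection(X, y, n_ants=10, n_iters=20, rho=0.1, seed=0):
    """Simplified ACO-style search: one pheromone value per feature."""
    rng = random.Random(seed)
    n_features = len(X[0])
    tau = [0.5] * n_features  # pheromone, used as an inclusion probability
    best_subset, best_score = [], -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant builds a subset by sampling features from the pheromone.
            subset = [f for f in range(n_features) if rng.random() < tau[f]]
            score = loo_1nn_accuracy(X, y, subset)
            better = score > best_score
            tie_but_smaller = score == best_score and len(subset) < len(best_subset)
            if better or tie_but_smaller:
                best_subset, best_score = subset, score
        for f in range(n_features):
            # Evaporate, then deposit pheromone on the best subset's features.
            deposit = rho if f in best_subset else 0.0
            tau[f] = (1 - rho) * tau[f] + deposit
            tau[f] = min(0.95, max(0.05, tau[f]))  # assumed pheromone bounds
    return best_subset, best_score


def make_toy_data():
    """Hypothetical data: feature 0 separates the classes, 1 and 2 are noise."""
    X, y = [], []
    for i in range(10):
        noise_a = (i * 7 % 10) / 10
        noise_b = (i * 3 % 10) / 10
        X.append([i / 10, noise_a, noise_b])
        y.append(0)
        X.append([10 + i / 10, noise_a, noise_b])
        y.append(1)
    return X, y


X, y = make_toy_data()
subset, score = aco_feature_selection(X, y)
print("selected features:", subset, "LOO accuracy:", score)
```

On the toy data, any subset containing feature 0 separates the two classes, so the search should settle on a subset that includes it. A real application would replace the toy data with the credit-risk features and would likely use a held-out validation split rather than leave-one-out scoring.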


2021 ◽ Vol 10 (4) ◽ pp. 246
Author(s): Vagan Terziyan ◽ Anton Nikulin

Operating with ignorance is an important concern of geographical information science when the objective is to discover knowledge from imperfect spatial data. Data mining (driven by knowledge discovery tools) processes available (observed, known, and understood) samples of data in order to build a model (e.g., a classifier) that can handle data samples that are not yet observed, known, or understood. These tools traditionally take semantically labeled samples of the available data (known facts) as input for learning. We want to challenge the indispensability of this approach, and we suggest considering things the other way around. What if the task were as follows: how to build a model based on the semantics of our ignorance, i.e., by processing the shape of “voids” within the available data space? Can we improve traditional classification by also modeling the ignorance? In this paper, we provide some algorithms for the discovery and visualization of ignorance zones in two-dimensional data spaces and design two ignorance-aware smart prototype selection techniques (incremental and adversarial) to improve the performance of nearest neighbor classifiers. We present experiments with artificial and real datasets to test the usefulness of ignorance semantics discovery.
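One simple way to make the notion of a void in a two-dimensional data space concrete is to look for the point that lies farthest from every observed sample. The Python sketch below does this with a brute-force grid scan over the bounding box of the data. It is an illustration under stated assumptions, not the algorithm from the paper: the function name find_void_center, the grid_steps parameter, and the grid-scan approach itself are hypothetical choices.

```python
import math


def find_void_center(points, grid_steps=20):
    """Return the grid point farthest from all samples, and that distance.

    The grid covers the bounding box of `points`; the returned radius is the
    radius of the largest sample-free circle centered on a grid point.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    best_center, best_radius = None, -1.0
    for i in range(grid_steps + 1):
        for j in range(grid_steps + 1):
            cx = x_min + (x_max - x_min) * i / grid_steps
            cy = y_min + (y_max - y_min) * j / grid_steps
            # Distance from this grid point to its nearest observed sample.
            radius = min(math.hypot(cx - px, cy - py) for px, py in points)
            if radius > best_radius:
                best_center, best_radius = (cx, cy), radius
    return best_center, best_radius


# Four samples at the corners of a square leave a void in the middle.
corners = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
center, radius = find_void_center(corners)
print(center, round(radius, 4))
```

For the four corner samples, the scan places the void center in the middle of the square. Repeating the scan after adding a synthetic point at each discovered center would give a crude incremental variant, though whether that matches the paper's incremental technique is not stated in the abstract.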


2018 ◽ Vol 74 ◽ pp. 1-14
Author(s): Yikun Qin ◽ Zhu Liang Yu ◽ Chang-Dong Wang ◽ Zhenghui Gu ◽ Yuanqing Li

2011 ◽ Vol 74 (4) ◽ pp. 656-660
Author(s): Qinghua Hu ◽ Pengfei Zhu ◽ Yongbin Yang ◽ Daren Yu

2020
Author(s): Qi Zhang ◽ Shan Li ◽ Bin Yu ◽ Yang Li ◽ Yandan Zhang ◽ ...

Proteins play a significant part in life processes such as cell growth, development, and reproduction. Exploring protein subcellular localization (SCL) is a direct way to better understand the function of proteins in cells. Studies have found that a growing number of proteins reside in multiple subcellular locations; these are called multi-label proteins. They play a key role in cellular activity and are also indispensable in medicine and drug development. This article presents a new prediction model, MpsLDA-ProSVM, to predict the SCL of multi-label proteins. First, the physicochemical, evolutionary, sequence, and annotation information of protein sequences is fused. Then, for the first time, a weighted multi-label linear discriminant analysis framework based on entropy weight form (wMLDAe) is used to refine and purify the features, reducing the difficulty of learning. Finally, the optimal feature subset is input into the multi-label learning with label-specific features (LIFT) and multi-label k-nearest neighbor (ML-KNN) algorithms to obtain a synthetic ranking of relevant labels, and a Prediction and Relevance Ordering based SVM (ProSVM) classifier then predicts the SCLs. This method ranks and classifies related labels at the same time, which greatly improves the efficiency of the model. Under the jackknife test, the overall actual accuracy (OAA) on the virus, plant, Gram-positive bacteria, and Gram-negative bacteria datasets is 98.06%, 98.97%, 99.81%, and 98.49%, respectively, which is 0.56%-9.16%, 5.37%-30.87%, 3.51%-6.91%, and 3.99%-8.59% higher than other advanced methods. The source code and datasets are available at https://github.com/QUST-AIBBDRC/MpsLDA-ProSVM/.
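The evaluation protocol named in this abstract, a jackknife (leave-one-out) test scored by overall actual accuracy, can be sketched in a few lines of Python. The sketch assumes that OAA counts a sample as correct only when the predicted label set matches the true label set exactly, which is the usual definition in multi-label localization work but is not spelled out in the abstract. The k-NN majority vote is a simple stand-in classifier, not ML-KNN proper and not the MpsLDA-ProSVM pipeline, and the function names and toy data are hypothetical.

```python
def overall_actual_accuracy(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return hits / len(true_sets)


def knn_label_vote(train_X, train_Y, x, k=3):
    """Predict every label carried by more than half of the k nearest samples."""
    order = sorted(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )
    counts = {}
    for j in order[:k]:
        for label in train_Y[j]:
            counts[label] = counts.get(label, 0) + 1
    return {label for label, c in counts.items() if 2 * c > k}


def jackknife_oaa(X, Y, k=3):
    """Hold out each sample in turn, predict it from the rest, and score OAA."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_Y = Y[:i] + Y[i + 1:]
        preds.append(knn_label_vote(train_X, train_Y, X[i], k))
    return overall_actual_accuracy(Y, preds)


def make_toy_proteins():
    """Hypothetical 2-D features: three tight clusters, one of them two-label."""
    offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
    clusters = [
        ((0, 0), {"nucleus"}),
        ((10, 10), {"cytoplasm"}),
        ((20, 0), {"nucleus", "cytoplasm"}),
    ]
    X, Y = [], []
    for (cx, cy), labels in clusters:
        for dx, dy in offsets:
            X.append([cx + dx, cy + dy])
            Y.append(set(labels))
    return X, Y


X, Y = make_toy_proteins()
print("jackknife OAA:", jackknife_oaa(X, Y, k=3))
```

Because OAA requires an exact match of the whole label set, a prediction that recovers only one of two true locations counts as a miss, which makes it a stricter measure than per-label accuracy.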

