EEkNN: k-Nearest Neighbor Classifier with an Evidential Editing Procedure for Training Samples

Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 592
Author(s):  
Lianmeng Jiao ◽  
Xiaojiao Geng ◽  
Quan Pan

The k-nearest neighbor (kNN) rule is one of the most popular classification algorithms applied in many fields because it is very simple to understand and easy to design. However, one of the major problems encountered in using the kNN rule is that all of the training samples are considered equally important in the assignment of the class label to the query pattern. In this paper, an evidential editing version of the kNN rule is developed within the framework of belief function theory. The proposal is composed of two procedures. An evidential editing procedure is first proposed to reassign the original training samples with new labels represented by an evidential membership structure, which provides a general representation model regarding the class membership of the training samples. After editing, a classification procedure specifically designed for evidentially edited training samples is developed in the belief function framework to handle the more general situation in which the edited training samples are assigned dependent evidential labels. Three synthetic datasets and six real datasets collected from various fields were used to evaluate the performance of the proposed method. The reported results show that the proposal achieves better performance than the other kNN-based methods considered, especially for datasets with high imprecision ratios.
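As a simplified illustration (not the authors' EEkNN, which operates on belief functions), the contrast between the standard kNN rule, where every neighbor casts an equal crisp vote, and a soft-label variant in which each edited sample contributes a class-membership vector, can be sketched as:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Standard kNN rule: every one of the k nearest neighbors
    # casts an equal vote with its crisp class label.
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    votes = np.bincount(y_train[idx])
    return int(np.argmax(votes))

def soft_label_knn_predict(X_train, soft_labels, x, k=3):
    # Soft-label variant: each neighbor contributes its (edited)
    # class-membership vector instead of a single crisp vote, so
    # uncertain samples carry less weight toward any one class.
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    return int(np.argmax(soft_labels[idx].sum(axis=0)))
```

An editing procedure in this simplified picture would replace the one-hot rows of `soft_labels` with graded memberships for samples lying near class boundaries.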

10.29007/5gzr ◽  
2018 ◽  
Author(s):  
Cezary Kaliszyk ◽  
Josef Urban

Two complementary AI methods are used to improve the strength of the AI/ATP service for proving conjectures over the HOL Light and Flyspeck corpora. First, several schemes for frequency-based feature weighting are explored in combination with a distance-weighted k-nearest-neighbor classifier. This results in a 16% improvement (from 39.0% to 45.5% of Flyspeck problems solved) in the overall strength of the service when using 14 CPUs and 30 seconds. The best premise-selection/ATP combination is improved from 24.2% to 31.4%, i.e. by 30%. A smaller improvement is obtained by evolving targeted E prover strategies on two particular premise selections, using the Blind Strategymaker (BliStr) system. This raises the performance of the best AI/ATP method from 31.4% to 34.9%, i.e. by 11%, and raises the current 14-CPU power of the service to 46.9%.
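A minimal sketch of frequency-based feature weighting combined with a distance-weighted kNN vote: the IDF-style scheme below is one common frequency-based choice, an assumption here rather than the specific scheme that performed best in the paper, and the real premise-selection pipeline is considerably more elaborate.

```python
import numpy as np

def idf_weights(X):
    # Frequency-based feature weighting (IDF-style): features that
    # occur in few samples get larger weights. The exact scheme used
    # by the service is an assumption here.
    n = X.shape[0]
    df = np.count_nonzero(X, axis=0)  # document frequency per feature
    return np.log((n + 1) / (df + 1))

def distance_weighted_knn(X, y, w, query, k=3):
    # Each of the k nearest neighbors votes with weight 1/distance,
    # using a feature-weighted Euclidean distance, so closer
    # neighbors dominate the vote.
    d = np.sqrt((((X - query) * w) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    scores = {}
    for i in idx:
        scores[y[i]] = scores.get(y[i], 0.0) + 1.0 / (d[i] + 1e-9)
    return max(scores, key=scores.get)
```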


2020 ◽  
Author(s):  
Daniel B Hier ◽  
Jonathan Kopel ◽  
Steven U Brint ◽  
Donald C Wunsch II ◽  
Gayla R Olbricht ◽  
...  

Abstract Objective: Neurologists lack a metric for measuring the distance between neurological patients. When neurological signs and symptoms are represented as neurological concepts from a hierarchical ontology and neurological patients are represented as sets of concepts, distances between patients can be represented as inter-set distances. Methods: We converted the neurological signs and symptoms from 721 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated inter-concept distances based on a hierarchical ontology, and we calculated inter-patient distances by semantic weighted bipartite matching. We evaluated the accuracy of a k-nearest neighbor classifier in allocating patients to 40 diagnostic classes. Results: Within a given diagnosis, mean patient distance differed by diagnosis, suggesting that across diagnoses there are differences in how similar patients are to other patients with the same diagnosis. The mean distance from one diagnosis to another differed by diagnosis, suggesting that diagnoses differ in their proximity to other diagnoses. Using a k-nearest neighbor classifier and inter-patient distances, the risk of misclassification differed by diagnosis. Conclusion: If signs and symptoms are converted to machine-readable codes and patients are represented as sets of these codes, patient distances can be computed as an inter-set distance. These patient distances give insights into how homogeneous patients are within a diagnosis (stereotypy), the distance between different diagnoses (proximity), and the risk of diagnosis misclassification (diagnostic error).
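The inter-set distance idea can be sketched as a minimum-cost matching between two patients' concept sets. In this simplified illustration, `concept_dist` stands in for the ontology-derived, semantically weighted inter-concept distance and is an assumption here; brute-force matching is used only because the concept sets are small.

```python
import itertools

def patient_distance(concepts_a, concepts_b, concept_dist):
    # Inter-set distance by minimum-cost bipartite matching: pair
    # each concept of the smaller set with a distinct concept of the
    # larger set so that the summed inter-concept distance is minimal.
    # Brute-force over permutations (fine for small concept sets).
    if len(concepts_a) > len(concepts_b):
        concepts_a, concepts_b = concepts_b, concepts_a
    best = min(
        sum(concept_dist(a, b) for a, b in zip(concepts_a, perm))
        for perm in itertools.permutations(concepts_b, len(concepts_a))
    )
    # Normalize by the number of matched pairs.
    return best / len(concepts_a)
```

A k-nearest neighbor classifier over these pairwise patient distances then assigns a query patient the majority diagnosis among its nearest matched cases.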

