KNNHI: Resilient KNN algorithm for heterogeneous incomplete data classification and K identification using rough set theory

2022 ◽  
pp. 016555152110695
Author(s):  
Ahmed Hamed ◽  
Mohamed Tahoun ◽  
Hamed Nassar

The original K-nearest neighbour (KNN) algorithm was meant to classify homogeneous complete data, that is, data with only numerical features whose values are all present. Thus, it faces problems when used with heterogeneous incomplete (HI) data, which also has categorical features and is plagued with missing values. Many solutions have been proposed over the years but most have pitfalls. For example, some solve heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others solve incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets, mostly from the University of California Irvine repository, using 10-fold cross-validation. We show that KNNHI outperforms six recently published KNN-based algorithms in terms of precision, recall, accuracy and F-score. In addition to its function as a mighty classifier, KNNHI can also serve as a K calculator, helping KNN-based algorithms that use a single K value for all queries find the best such value. Sure enough, we show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, the degree of heterogeneity and the metric used to measure distance.
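To make the setting concrete, here is a minimal Python sketch of nearest-neighbour classification over mixed numerical and categorical features with missing values. It is not KNNHI: the abstract does not describe the rough-set machinery or how K is chosen per query, so this sketch substitutes the well-known Heterogeneous Euclidean-Overlap Metric (HEOM) and a single fixed `k`. The function names, the use of `None` for a missing value, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter

def heom_distance(a, b, numeric_ranges):
    """HEOM distance between two records (an illustrative stand-in, not KNNHI).

    numeric_ranges maps the index of each numerical feature to its
    (max - min) range; every other index is treated as categorical.
    A missing value is represented by None (an assumption of this sketch).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            d = 1.0  # HEOM convention: maximal distance when a value is missing
        elif i in numeric_ranges:
            rng = numeric_ranges[i]
            d = abs(x - y) / rng if rng else 0.0  # range-normalised difference
        else:
            d = 0.0 if x == y else 1.0  # overlap metric for categories
        total += d * d
    return math.sqrt(total)

def knn_predict(query, rows, labels, numeric_ranges, k):
    """Majority vote among the k nearest rows; k is fixed here, unlike KNNHI."""
    ranked = sorted(
        range(len(rows)),
        key=lambda j: heom_distance(query, rows[j], numeric_ranges),
    )
    votes = Counter(labels[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy heterogeneous incomplete data: (numerical feature, categorical feature).
rows = [(1.0, "red"), (1.2, None), (5.0, "blue"), (None, "blue")]
labels = ["a", "a", "b", "b"]
numeric_ranges = {0: 4.0}  # feature 0 spans 1.0 to 5.0

print(knn_predict((1.1, "red"), rows, labels, numeric_ranges, k=3))
print(knn_predict((4.8, "blue"), rows, labels, numeric_ranges, k=3))
```

Categorical values are compared directly rather than being converted to numbers, and no value is imputed or dropped. Unlike the approach the abstract describes, however, a missing value here is penalised with the maximal per-feature distance, and the same `k` is used for every query.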

2021 ◽  
Vol 11 (4) ◽  
pp. 307-318
Author(s):  
Robert K. Nowicki ◽  
Robert Seliga ◽  
Dariusz Żelasko ◽  
Yoichi Hayashi

Abstract: The paper presents a performance analysis of a few selected rough set–based classification systems. They are hybrid solutions designed to process information with missing values. Rough set–based classification systems combine various classification methods, such as support vector machines, k–nearest neighbour, fuzzy systems, and neural networks, with rough set theory. When all input values are real numbers and all of them are available, the structure of the classifier reduces to its non–rough set version. The performance of the four systems has been analysed based on the classification results obtained for benchmark databases downloaded from the machine learning repository of the University of California at Irvine.


2020 ◽  
Vol 3 (2) ◽  
pp. 1-21 ◽  
Author(s):  
Haresh Sharma ◽  
Kriti Kumari ◽  
Samarjit Kar ◽  
...  

2009 ◽  
Vol 11 (2) ◽  
pp. 139-144
Author(s):  
Feng CAO ◽  
Yunyan DU ◽  
Yong GE ◽  
Deyu LI ◽  
Wei WEN
