Classification of Multivariate Data With Missing Values Using Expected Discriminant Scores

Author(s):  
Wolfgang Kossa
Author(s):  
Parnika N. Paranjape ◽  
Meera M. Dhabu ◽  
Parag S. Deshpande

Applications such as identifying customers from their distinctive purchase patterns require class-wise discriminative feature subsets, called class signatures, for classification. Classifiers such as KNN and SVM must work with the complete feature set, and when they are applied to such applications the full feature set may introduce classification errors. A decision tree classifier generates class-wise prominent feature subsets and can therefore be employed for such applications. However, all of these classifiers fail to model the relationships between the features present in vector data. We therefore propose to model the features and their interrelationships as graphs. Graphs occur naturally in protein molecules, chemical compounds, etc., for which several graph classifiers exist, but multivariate data do not exhibit graph structure naturally. The proposed work thus focuses on (1) modeling multivariate data as graphs and (2) obtaining class-wise prominent subgraph signatures, which are then used to train classifiers such as SVM for decision making. The proposed method, dSubSign, can also classify multivariate data with missing values without performing imputation or case deletion. The performance analysis on both real-world and synthetic datasets shows that the accuracy of dSubSign is higher than or comparable to that of other existing methods.


2013 ◽  
Vol 44 (9) ◽  
pp. 1299-1305 ◽  
Author(s):  
Giovanna Piantanida ◽  
Eva Menart ◽  
Marina Bicchieri ◽  
Matija Strlič

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Fei Yang ◽  
Jiazhi Du ◽  
Jiying Lang ◽  
Weigang Lu ◽  
Lei Liu ◽  
...  

The electrocardiogram (ECG) signal is critical to the classification of cardiac arrhythmia with machine learning methods. In practice, ECG datasets usually contain multiple missing values due to faults or distortion. Unfortunately, many established classification algorithms require a complete matrix as input. It is therefore necessary to impute the missing data to improve classification effectiveness for datasets with a few missing values. In this paper, we compare the main methods for estimating missing values in electrocardiogram data, e.g., the “Zero method”, “Mean method”, “PCA-based method”, and “RPCA-based method”, and then propose a novel KNN-based classification algorithm, a modified kernel Difference-Weighted KNN classifier (MKDF-WKNN), which is suited to the classification of imbalanced datasets. The experimental results on the UCI database indicate that the “RPCA-based method” can successfully handle missing values in the arrhythmia dataset no matter how many values are missing, and that our proposed classification algorithm, MKDF-WKNN, is superior to other state-of-the-art algorithms such as KNN, DS-WKNN, DF-WKNN, and KDF-WKNN on uneven datasets, whose imbalance affects classification accuracy.
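The two simplest imputation baselines named in the abstract, the “Zero method” and the “Mean method”, can be illustrated with a minimal sketch. This is not the authors' code: it assumes missing entries are marked as `None`, that the mean is taken per column over the observed values, and the function names `impute_zero` and `impute_mean` are hypothetical. The PCA-based and RPCA-based methods and the MKDF-WKNN classifier are not reproduced here.

```python
from typing import List, Optional

Row = List[Optional[float]]


def impute_zero(rows: List[Row]) -> List[List[float]]:
    """Replace every missing entry (None) with 0.0."""
    return [[0.0 if v is None else v for v in row] for row in rows]


def impute_mean(rows: List[Row]) -> List[List[float]]:
    """Replace every missing entry with the mean of the observed values in its column.

    Assumption for this sketch: a column with no observed values falls back to 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Toy matrix with one missing value per row (illustrative data, not ECG measurements).
data: List[Row] = [
    [1.0, None, 3.0],
    [2.0, 4.0, None],
    [None, 6.0, 9.0],
]

zero_filled = impute_zero(data)
mean_filled = impute_mean(data)
```

Either function returns a complete matrix that can be passed to a classifier requiring complete input; the original `data` list is left unchanged.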

