Evidential classification of incomplete instance based on K-nearest centroid neighbor

2021 ◽  
pp. 1-16
Author(s):  
Zong-fang Ma ◽  
Zhe Liu ◽  
Chan Luo ◽  
Lin Song

Classifying incomplete instances is challenging because missing features generally cause uncertainty in the classification result. A new evidential classification method for incomplete instances, based on adaptive imputation, is proposed within the framework of evidence theory. Specifically, the missing values of different incomplete instances in the test set are adaptively estimated using Shannon entropy and the K-nearest centroid neighbors (KNCNs) technique. The single or multiple edited instances (with estimations) are then classified by the chosen classifier to obtain single or multiple classification results with different discounting (weighting) factors, and a new adaptive global fusion method is finally proposed to unify the different discounted results. The proposed method captures the imprecision of classification by assigning instances that are difficult to classify into a specific class to an associated meta-class, which effectively reduces the classification error rate. The effectiveness and robustness of the proposed method have been tested through four experiments with artificial and real datasets.
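
The abstract above relies on K-nearest centroid neighbors. The following is a minimal sketch of that neighbor search as it is commonly defined: each new neighbor is the candidate that keeps the centroid of the selected set closest to the query. The function names and toy data are hypothetical, and the entropy-based adaptive imputation of the paper is not reproduced here.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def k_nearest_centroid_neighbors(query, train, k):
    """Return indices of the k nearest centroid neighbors of `query` in `train`.

    The first neighbor is the ordinary nearest neighbor. Each later neighbor
    is the remaining point whose addition keeps the centroid of all selected
    points closest to the query.
    """
    chosen = []
    remaining = list(range(len(train)))
    while remaining and len(chosen) < k:
        best = min(
            remaining,
            key=lambda i: math.dist(
                query, centroid([train[j] for j in chosen] + [train[i]])
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data (hypothetical): two points on the right of the query, one on the left.
train = [[1.0, 0.0], [1.1, 0.0], [-1.5, 0.0], [0.0, 3.0]]
neighbors = k_nearest_centroid_neighbors([0.0, 0.0], train, k=2)
print(neighbors)
```

Unlike plain K-nearest neighbors, which would pick the two points on the right here, the centroid criterion prefers a second neighbor on the opposite side of the query, so the selected set surrounds it.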

2014 ◽  
Vol 7 (1) ◽  
pp. 78-83 ◽  
Author(s):  
Jiatang Cheng ◽  
Li Ai ◽  
Zhimei Duan ◽  
Yan Xiong

To address the inconsistent results produced by conventional vibration fault diagnosis technology for hydroelectric generating units, an information fusion method based on improved evidence theory was proposed. In this algorithm, the original evidence is amended by a credibility factor, and the combination rule of standard evidence theory is then used to carry out information fusion. The results show that the proposed method can reach a definitive conclusion even when highly conflicting evidence is present in the combination process, can avoid divergence when consistent evidence is fused, and is suitable for fault classification of hydroelectric generating units.
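
The idea of amending evidence before fusing it can be illustrated with a small sketch. It assumes Shafer's classical discounting as a stand-in for the paper's credibility-factor amendment (the abstract does not give the exact formula) and uses Dempster's rule as the standard combination rule. The fault labels, mass values, and the factor 0.9 are hypothetical.

```python
from itertools import product


def discount(mass, alpha, frame):
    """Shafer discounting: keep a fraction alpha of each mass and move
    the remainder to the whole frame (total ignorance)."""
    out = {f: alpha * v for f, v in mass.items() if f != frame}
    out[frame] = 1.0 - alpha + alpha * mass.get(frame, 0.0)
    return out


def dempster(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections,
    and renormalize by one minus the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict")
    return {f: v / (1.0 - conflict) for f, v in combined.items()}


# Hypothetical fault hypotheses A, B, C and two highly conflicting sources.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
frame = A | B | C
m1 = {A: 0.99, B: 0.01}
m2 = {C: 0.99, B: 0.01}

raw = dempster(m1, m2)  # nearly all mass lands on the weakly supported B
fused = dempster(discount(m1, 0.9, frame), discount(m2, 0.9, frame))
print(raw)
print(fused)
```

With undiscounted inputs the fused result puts almost all belief on the hypothesis both sources considered unlikely. After discounting, the two strongly supported hypotheses keep most of the mass.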


2018 ◽  
Author(s):  
Mark A. Eckert ◽  
Kenneth I. Vaden ◽  
Mulugeta Gebregziabher ◽  

Children with reading disability exhibit varied deficits in reading and cognitive abilities that contribute to their reading comprehension problems. Some children exhibit primary deficits in phonological processing, while others can exhibit deficits in oral language and executive functions that affect comprehension. This behavioral heterogeneity is problematic when missing data prevent the characterization of different reading profiles, which often occurs in retrospective data sharing initiatives without coordinated data collection. Here we show that reading profiles can be reliably identified based on Random Forest classification of incomplete behavioral datasets, after the missForest method is used to multiply impute missing values. Results from simulation analyses showed that reading profiles could be accurately classified across degrees of missingness (e.g., ~5% classification error for 30% missingness across the sample). The application of missForest to a real multi-site dataset (n = 924) showed that reading disability profiles significantly and consistently differed in reading and cognitive abilities for cases with and without missing data. The results of validation analyses indicated that the reading profiles (cases with and without missing data) exhibited significant differences for an independent set of behavioral variables that were not used to classify reading profiles. Together, the results show how multiple imputation can be applied to the classification of cases with missing data and can increase the integrity of results from multi-site open access datasets.


Author(s):  
Santosh Shrestha ◽  
Lise Deleuran ◽  
René Gislum

The feasibility of rapid and non-destructive classification of five different tomato seed cultivars was investigated by using visible and short-wave near infrared (Vis-NIR) spectra combined with chemometric approaches. Vis-NIR spectra containing 19 different wavelengths ranging from 375 nm to 970 nm were extracted from multispectral images of tomato seeds. Principal component analysis (PCA) was used for data exploration, while partial least squares discriminant analysis (PLS-DA) and support vector machine discriminant analysis (SVM-DA) were used to classify the five different tomato cultivars. The results showed very good classification accuracy for two independent test sets ranging from 94% to 100% for all tomato cultivars irrespective of chemometric methods. The overall classification error rates were 3.2% and 0.4% for the PLS-DA and SVM-DA calibration models, respectively. The results indicate that Vis-NIR spectra have the potential to be used for non-destructive discrimination of tomato seed cultivars with an opportunity to integrate them into plant genetic resource management, plant variety protection or registration programmes.


2008 ◽  
Vol 20 (06) ◽  
pp. 345-352
Author(s):  
Li-Yeh Chuang ◽  
Cheng-San Yang ◽  
Jung-Chike Li ◽  
Cheng-Hong Yang

Microarray data can provide valuable results for a variety of gene expression profile problems and contribute to advances in clinical medicine. The application of microarray data to cancer-type classification has recently gained in popularity. Microarray data are characterized by a large number of features (genes), high dimensionality, and multiple classes. These properties make the training and testing of general classification methods difficult. Reducing the number of genes and achieving lower classification error rates are the main issues to be solved. The classification of microarray data samples can be regarded as a feature selection and classifier design problem. The goal of feature selection is to select those subsets of differentially expressed genes that are potentially relevant for distinguishing the sample classes. Classical genetic algorithms (GAs) may suffer from premature convergence and thus lead to poor experimental results. In this paper, a combat genetic algorithm (CGA) is used to implement the feature selection, and a K-nearest neighbor classifier with leave-one-out cross-validation serves as the CGA fitness function for the classification problem. The proposed method was applied to 10 microarray data sets obtained from the literature. The experimental results show that the proposed method not only effectively reduced the number of gene expression levels but also achieved lower classification error rates.
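
The fitness evaluation described above can be sketched as follows: a candidate feature subset is encoded as a binary mask and scored by the leave-one-out accuracy of a K-nearest neighbor classifier restricted to the selected features. The function name and toy data are hypothetical, and the genetic search itself (the CGA) is not shown.

```python
import math
from collections import Counter


def loocv_knn_accuracy(X, y, mask, k=1):
    """Leave-one-out accuracy of a k-NN classifier using only the
    features whose mask entry is truthy. An empty mask scores 0.0."""
    selected = [d for d, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        # Distances from the held-out sample to every other sample.
        dists = sorted(
            (math.dist([xi[d] for d in selected], [xj[d] for d in selected]), j)
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -4.0], [0.2, 9.0], [1.0, 5.1], [1.1, -4.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

print(loocv_knn_accuracy(X, y, [1, 0]))  # informative feature only
print(loocv_knn_accuracy(X, y, [1, 1]))  # noise feature included
```

A genetic algorithm would call a function like this once per chromosome, so masks that drop noisy features receive higher fitness.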


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Yashodhan Athavale ◽  
Sridhar Krishnan ◽  
Aziz Guergachi

This study gauges the performance of Fisher kernels for dimension simplification and classification of time-series signals. Our research indicates that Fisher kernels substantially improve signal classification by enabling clearer pattern visualization in three-dimensional space. In this paper, we demonstrate the performance of Fisher kernels in two domains: financial and biomedical. The financial domain study involves identifying the possibility of collapse or survival of a company trading in the stock market. To assess the fate of each company, we collected financial time series composed of weekly closing stock prices in a common time frame, using Thomson Datastream software. The biomedical domain study involves knee signals collected using the vibration arthrometry technique. This study uses the severity of cartilage degeneration to classify normal and abnormal knee joints. In both studies, we apply Fisher kernels combined with a Gaussian mixture model (GMM) for dimension transformation into a feature space, which is rendered as a three-dimensional plot for visualization and used for further classification with support vector machines. Our experiments show that Fisher kernels are well suited to both kinds of signals, with low classification error rates.


2008 ◽  
Vol 27 (22) ◽  
pp. 4515-4531 ◽  
Author(s):  
Alexander Brenning ◽  
Berthold Lausen

Author(s):  
Caio Ribeiro ◽  
Alex A. Freitas

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.
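
The feature-wise selection idea can be sketched as follows: for one feature, each known value is hidden in turn, every candidate method estimates it from the remaining known values, and the method with the lowest mean absolute error is used to fill the real gaps. Mean and median imputation are simple stand-ins assumed here for illustration. They are not the five methods evaluated in the article, and the function names are hypothetical.

```python
import statistics

# Candidate imputation methods (stand-ins, assumed for illustration).
CANDIDATES = {"mean": statistics.mean, "median": statistics.median}


def best_imputer(column):
    """Rank candidate methods on one feature by hiding each known value
    and measuring the mean absolute estimation error.
    Requires at least two known values; missing entries are None."""
    known = [v for v in column if v is not None]
    errors = {}
    for name, estimate in CANDIDATES.items():
        total = 0.0
        for i, truth in enumerate(known):
            rest = known[:i] + known[i + 1:]
            total += abs(estimate(rest) - truth)
        errors[name] = total / len(known)
    return min(errors, key=errors.get), errors


def impute(column):
    """Fill the None entries of one feature with its best-ranked method."""
    name, _ = best_imputer(column)
    known = [v for v in column if v is not None]
    fill = CANDIDATES[name](known)
    return [fill if v is None else v for v in column]


# Hypothetical feature with one missing value and one outlier.
column = [1.0, 2.0, 3.0, None, 100.0, 2.0]
print(best_imputer(column)[0])
print(impute(column))
```

Running the selection once per feature lets a skewed feature use a robust estimator while a well-behaved feature uses a different one.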

