An improved D-S evidence theory based neighborhood rough classification approach

2021 ◽  
pp. 1-13
Author(s):  
Tao Yin ◽  
Xiaojuan Mao ◽  
Xingtan Wu ◽  
Hengrong Ju ◽  
Weiping Ding ◽  
...  

Neighborhood classifier, a common classification method, is applied in pattern recognition and data mining. The neighborhood classifier mainly relies on the majority voting strategy to judge each category. This strategy only considers the number of samples in the neighborhood but ignores the distribution of samples, which leads to a decreased classification accuracy. To overcome the shortcomings and improve the classification performance, D-S evidence theory is applied to represent the evidence information support of other samples in the neighborhood, and the distance between samples in the neighborhood is taken into account. In this paper, a novel attribute reduction method of neighborhood rough set with a dynamic updating strategy is developed. Different from the traditional heuristic algorithm, the termination threshold of the proposed reduction algorithm is dynamically optimized. Therefore, when the attribute significance is not monotonic, this method can retrieve a better value, in contrast to the traditional method. Moreover, a new classification approach based on D-S evidence theory is proposed. Compared with the classical neighborhood classifier, this method considers the distribution of samples in the neighborhood, and evidence theory is applied to describe the closeness between samples. Finally, datasets from the UCI database are used to indicate that the improved reduction can achieve a lower neighborhood decision error rate than classical heuristic reduction. In addition, the improved classifier acquires higher classification performance in contrast to the traditional neighborhood classifier. This research provides a new direction for improving the accuracy of neighborhood classification.
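
The contrast between majority voting and evidence combination described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: each neighbor is assumed to contribute a simple support function whose mass on its own class decays with distance as `alpha * exp(-gamma * d**2)`, and the masses are fused with Dempster's rule of combination. The names `neighborhood`, `majority_vote`, `evidence_vote`, `alpha`, and `gamma` are hypothetical.

```python
import math
from collections import Counter


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset focal elements."""
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def neighborhood(query, samples, radius):
    """Return (distance, label) for every sample within `radius` of `query`."""
    pairs = [(math.dist(query, x), y) for x, y in samples]
    return [(d, y) for d, y in pairs if d <= radius]


def majority_vote(neigh):
    """Classical rule: the most frequent label in the neighborhood wins."""
    return Counter(y for _, y in neigh).most_common(1)[0][0]


def evidence_vote(neigh, classes, alpha=0.95, gamma=1.0):
    """Assumed evidential rule: closer neighbors give stronger support."""
    theta = frozenset(classes)          # frame of discernment
    belief = {theta: 1.0}               # start from total ignorance
    for d, y in neigh:
        support = alpha * math.exp(-gamma * d * d)
        evidence = {frozenset([y]): support, theta: 1.0 - support}
        belief = dempster_combine(belief, evidence)
    singles = {c: belief.get(frozenset([c]), 0.0) for c in classes}
    return max(singles, key=singles.get)


samples = [((0.1, 0.0), "a"), ((0.9, 0.0), "b"), ((0.0, 0.9), "b"), ((5.0, 5.0), "b")]
neigh = neighborhood((0.0, 0.0), samples, radius=1.0)
print(majority_vote(neigh), evidence_vote(neigh, ["a", "b"]))
```

In this toy neighborhood the two distant "b" samples outvote the single very close "a" sample under majority voting, while the distance-aware evidence combination favors "a".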

Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1809-1815
Author(s):  
Shaochen Liang ◽  
Xibei Yang ◽  
Xiangjian Chen ◽  
Jingzheng Li

In neighborhood rough set theory, the traditional heuristic algorithm for computing reducts does not take the stability of the selected attributes into account, so the reducts may perform poorly when the data are perturbed. To fill this gap, the most significant attribute is acquired in two steps during the reduction process: first, several important attributes are derived in each iteration based on several radii close to the given radius for computing the reduct; second, the most significant attribute is selected from them by a voting strategy. The experiments verify that this method can effectively improve the stability of the reducts and that it does not require too many attributes to construct them.
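
The two-step selection described above can be sketched as follows. The abstract does not say which significance measure is used, so this sketch assumes the neighborhood dependency (the fraction of samples whose whole neighborhood shares their label); the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def dependency(X, y, attrs, radius):
    """Fraction of samples whose neighborhood (measured on attrs) is label-pure."""
    consistent = 0
    for i, xi in enumerate(X):
        pure = True
        for j, xj in enumerate(X):
            d = math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs))
            if d <= radius and y[i] != y[j]:
                pure = False
                break
        consistent += pure
    return consistent / len(X)


def best_attribute(X, y, selected, candidates, radius):
    """Step 1: the candidate that maximizes dependency at one radius."""
    return max(candidates, key=lambda a: dependency(X, y, selected + [a], radius))


def vote_attribute(X, y, selected, candidates, radii):
    """Step 2: each nearby radius casts one vote; the most voted attribute wins."""
    votes = Counter(best_attribute(X, y, selected, candidates, r) for r in radii)
    return votes.most_common(1)[0][0]


# Attribute 0 separates the classes; attribute 1 does not.
X = [(0.0, 0.5), (0.1, 0.1), (0.9, 0.5), (1.0, 0.1)]
y = [0, 0, 1, 1]
print(vote_attribute(X, y, [], [0, 1], radii=[0.15, 0.2, 0.25]))
```

Voting over several radii means a single unlucky radius cannot decide the attribute on its own, which is the stability argument the abstract makes.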


2020 ◽  
Author(s):  
Aristidis G. Vrahatis ◽  
Sotiris Tasoulis ◽  
Spiros Georgakopoulos ◽  
Vassilis Plagianakos

Biomedical data are now generated at an exponential rate, creating datasets of ultra-high dimensionality and complexity. This revolution, caused by recent advances in biotechnology, has driven big-data and data-driven computational approaches. An indicative example is the emerging single-cell RNA-sequencing (scRNA-seq) technology, which isolates and measures individual cells. Although scRNA-seq has revolutionized the biotechnology domain, the computational analysis of such data is a major challenge because of its ultra-high dimensionality and complexity. In this work we study the properties, effectiveness and generalization of the recently proposed MRPV algorithm for single-cell RNA-seq data. MRPV is an ensemble classification technique that uses multiple ultra-low-dimensional random projected spaces. A given classifier determines the class of each sample in every independent space, and a majority voting scheme then selects the predominant class. We show that random projection ensembles offer a platform not only for analysis with low computational time but also for enhancing classification performance. The developed methodologies were applied to four real high-dimensional biomedical datasets from single-cell RNA-seq studies and compared against well-known and similar classification tools. Experimental results showed that simple tools can be combined into a computationally fast, simple, yet effective approach for single-cell RNA-seq data with ultra-high dimensionality.
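
The scheme of classifying in several random projected spaces and then voting can be sketched as follows. This is not the MRPV implementation: the abstract leaves the base classifier open, so a nearest-centroid classifier is assumed here, and the Gaussian projection matrix, the function names, and the parameters `n_spaces` and `dim_out` are all hypothetical choices.

```python
import random
from collections import Counter


def random_projection(dim_in, dim_out, rng):
    """Gaussian random matrix mapping dim_in features to dim_out features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, x):
    """Apply the projection matrix to one sample."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def fit_centroids(Z, y):
    """Mean projected vector per class (assumed base classifier)."""
    sums, counts = {}, Counter(y)
    for z, label in zip(Z, y):
        acc = sums.setdefault(label, [0.0] * len(z))
        for k, v in enumerate(z):
            acc[k] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def nearest_centroid(centroids, z):
    """Label of the centroid closest to z in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], z)))


def rp_ensemble_predict(X_train, y_train, x, n_spaces=15, dim_out=3, seed=0):
    """Classify x in each random low-dimensional space, then majority-vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_spaces):
        R = random_projection(len(x), dim_out, rng)
        centroids = fit_centroids([project(R, row) for row in X_train], y_train)
        votes.append(nearest_centroid(centroids, project(R, x)))
    return Counter(votes).most_common(1)[0][0]


X_train = [[0.0] * 20, [0.1] * 20, [10.0] * 20, [9.9] * 20]
y_train = [0, 0, 1, 1]
print(rp_ensemble_predict(X_train, y_train, [10.0] * 20))
```

Each space only has `dim_out` dimensions, so every base classifier is cheap to fit; the ensemble vote is what compensates for the information lost in any single projection.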


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jing Zhang ◽  
Guang Lu ◽  
Jiaquan Li ◽  
Chuanwen Li

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks because of the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model in order to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on intersection neighborhoods is then used to select the important features in each subset, since it can select features without redundancy and deals with numerical features directly. To improve the diversity among base classifiers and the efficiency of classification, only a subset of the base classifiers is retained. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
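
The Kappa-based diversity mentioned above rests on pairwise agreement between classifiers. The sketch below computes only the standard Cohen's kappa between the predictions of two classifiers; how the paper combines it with accuracy into a clustering distance is not stated in the abstract and is not reproduced here. The function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(pred_a)
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always output the same single label
    return (observed - expected) / (1.0 - expected)


print(cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1]))
```

A kappa near 1 means two classifiers make nearly the same decisions, so keeping both adds little diversity to an ensemble; values near 0 or below indicate more complementary behavior.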


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 138 ◽  
Author(s):  
Lin Sun ◽  
Lanying Wang ◽  
Jiucheng Xu ◽  
Shiguang Zhang

For continuous numerical data sets, neighborhood rough sets-based attribute reduction is an important step for improving classification performance. However, most traditional reduction algorithms can only handle finite sets, and they yield low accuracy and high cardinality. In this paper, a novel attribute reduction method using Lebesgue and entropy measures in neighborhood rough sets is proposed, which can deal with continuous numerical data whilst maintaining the original classification information. First, the Fisher score method is employed to eliminate irrelevant attributes and thus significantly reduce the computational complexity for high-dimensional data sets. Then, the Lebesgue measure is introduced into neighborhood rough sets to investigate uncertainty measures. To analyze the uncertainty and noise of neighborhood decision systems well, some neighborhood entropy-based uncertainty measures are presented based on Lebesgue and entropy measures, and by combining the algebra view with the information view in neighborhood rough sets, a neighborhood roughness joint entropy is developed for neighborhood decision systems. Moreover, some of their properties are derived and the relationships among them are established, which helps to understand the essence of knowledge and the uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is designed to improve the classification performance of large-scale complex data. The experimental results on an example and several public data sets show that the proposed method is very effective for selecting the most relevant attributes with high classification accuracy.
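
The Fisher score pre-filtering step can be sketched as follows, assuming the commonly used definition (between-class scatter of the class means divided by the size-weighted within-class variance, computed per feature). The abstract does not give the exact variant or threshold used, and the function name is hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between = within = 0.0
        for rows in by_class.values():
            n_c = len(rows)
            mean_c = sum(r[j] for r in rows) / n_c
            var_c = sum((r[j] - mean_c) ** 2 for r in rows) / n_c  # population variance
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # A feature with zero within-class spread separates the classes perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


X = [(0.0, 1.0), (0.2, 2.0), (1.0, 1.0), (1.2, 2.0)]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
print(scores)
```

Features with low scores carry little class information and can be dropped before the more expensive neighborhood-based reduction runs.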


2019 ◽  
Vol 11 (23) ◽  
pp. 2788 ◽  
Author(s):  
Uwe Knauer ◽  
Cornelius Styp von Rekowski ◽  
Marianne Stecklina ◽  
Tilman Krokotsch ◽  
Tuan Pham Minh ◽  
...  

In this paper, we evaluate different popular voting strategies for fusion of classifier results. A convolutional neural network (CNN) and different variants of random forest (RF) classifiers were trained to discriminate between 15 tree species based on airborne hyperspectral imaging data. The spectral data was preprocessed with a multi-class linear discriminant analysis (MCLDA) as a means to reduce dimensionality and to obtain spatial–spectral features. The best individual classifier was a CNN with a classification accuracy of 0.73 +/− 0.086. The classification performance increased to an accuracy of 0.78 +/− 0.053 by using precision weighted voting for a hybrid ensemble of the CNN and two RF classifiers. This voting strategy clearly outperformed majority voting (0.74), accuracy weighted voting (0.75), and presidential voting (0.75).
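
The difference between majority voting and precision weighted voting can be sketched as follows. The sketch assumes that each classifier's vote for a class is weighted by that classifier's precision for the same class, estimated on held-out data; the exact weighting used in the paper is not given in the abstract. The function names and the toy precisions are hypothetical.

```python
from collections import Counter, defaultdict


def per_class_precision(y_true, y_pred):
    """Precision of one classifier for each class it predicted at least once."""
    true_pos, predicted = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        true_pos[p] += (t == p)
    return {c: true_pos[c] / predicted[c] for c in predicted}


def majority_vote(predictions):
    """Every classifier counts equally."""
    return Counter(predictions).most_common(1)[0][0]


def precision_weighted_vote(predictions, precisions):
    """Each vote counts as much as the voter's precision for the voted class."""
    scores = defaultdict(float)
    for pred, prec in zip(predictions, precisions):
        scores[pred] += prec.get(pred, 0.0)
    return max(scores, key=scores.get)


# Two weak voters say "oak"; one voter that is reliable for "pine" says "pine".
predictions = ["oak", "oak", "pine"]
precisions = [{"oak": 0.3}, {"oak": 0.3}, {"pine": 0.9}]
print(majority_vote(predictions), precision_weighted_vote(predictions, precisions))
```

With these toy values the majority picks "oak", while the weighted vote picks "pine" because one reliable vote outweighs two unreliable ones.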


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 133565-133576
Author(s):  
Panpan Chen ◽  
Menglei Lin ◽  
Jinghua Liu
