Online streaming feature selection using adapted Neighborhood Rough Set

2019 ◽  
Vol 481 ◽  
pp. 258-279 ◽  
Author(s):  
Peng Zhou ◽  
Xuegang Hu ◽  
Peipei Li ◽  
Xindong Wu

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1635
Author(s):  
Dingfei Lei ◽  
Pei Liang ◽  
Junhua Hu ◽  
Yuan Yuan

In many real-world applications, such as medical diagnosis and fraud detection, not all features are available from the start; they are generated and arrive one at a time. Online streaming feature selection (OSFS) has recently attracted much attention because it can select the best feature subset as the feature set grows. Rough set theory, and the neighborhood rough set in particular, is widely used as an effective tool for feature selection. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with unevenly distributed data, and the traditional method of dependency calculation does not take into account the structure of the neighborhood covering. In this study, a novel neighborhood relation that combines the k-neighborhood and neighborhood relations is first defined. Then, we propose a weighted dependency degree computation method that considers the structure of the neighborhood relation. In addition, we propose a new OSFS approach named OSFS-KW that addresses the challenge of learning from class-imbalanced data. OSFS-KW has no adjustable parameters and requires no pretraining. The experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but also exceeds the state-of-the-art OSFS approaches.
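
The two neighborhood relations and the classic (unweighted) dependency degree mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the OSFS-KW method: the function names, the Euclidean distance, and the toy data are all assumptions made here, and the paper's combined relation and weighted dependency are not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def delta_neighborhood(X, i, delta):
    """Indices of all samples within distance delta of sample i (i included)."""
    return {j for j in range(len(X)) if euclidean(X[i], X[j]) <= delta}


def k_neighborhood(X, i, k):
    """Index i together with the indices of its k nearest other samples."""
    others = sorted(
        (j for j in range(len(X)) if j != i),
        key=lambda j: euclidean(X[i], X[j]),
    )
    return {i, *others[:k]}


def dependency(X, y, neighborhood):
    """Classic dependency degree: share of samples whose neighborhood is label-pure."""
    consistent = sum(
        1
        for i in range(len(X))
        if all(y[j] == y[i] for j in neighborhood(X, i))
    )
    return consistent / len(X)


# Hypothetical, unevenly spaced one-dimensional data.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [5.0]]
y = [0, 0, 0, 1, 1, 1]

dep_delta = dependency(X, y, lambda X, i: delta_neighborhood(X, i, 0.3))
dep_k = dependency(X, y, lambda X, i: k_neighborhood(X, i, 2))
print(dep_delta, dep_k)
```

On this toy data the fixed radius of 0.3 leaves every neighborhood label-pure, while k = 2 forces the two samples near 1.0 to reach into the other class, so the two relations disagree about the same feature. That sensitivity to uneven spacing is the kind of issue the abstract describes.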


2021 ◽  
pp. 108025
Author(s):  
Shuangjie Li ◽  
Kaixiang Zhang ◽  
Yali Li ◽  
Shuqin Wang ◽  
Shaoqiang Zhang

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jing Zhang ◽  
Guang Lu ◽  
Jiaquan Li ◽  
Chuanwen Li

Mining useful knowledge from high-dimensional data is a hot research topic. Efficient and effective sample classification and feature selection are challenging tasks because of the high dimensionality and small sample size of microarray data. Feature selection is necessary when constructing the model in order to reduce time and space consumption. Therefore, a feature selection model based on prior knowledge and rough sets is proposed. Pathway knowledge is used to select feature subsets, and a rough set based on the intersection neighborhood is then used to select the important features in each subset, since it can select features without redundancy and deal with numerical features directly. To improve the diversity among base classifiers and the efficiency of classification, only some of the base classifiers are selected. Classifiers are grouped into several clusters by k-means clustering using the proposed combination distance of Kappa-based diversity and accuracy. The base classifier with the best classification performance in each cluster is selected to generate the final ensemble model. Experimental results on three Arabidopsis thaliana stress response datasets showed that the proposed method achieved better classification performance than existing ensemble models.
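
The Kappa-based diversity mentioned in this abstract is commonly measured with Cohen's kappa between the predictions of two classifiers; lower agreement means higher diversity. The sketch below shows only that pairwise statistic. The function name and the example predictions are assumptions made here, and the paper's combination distance and k-means grouping are not reproduced.

```python
from collections import Counter


def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa between two equally long lists of predicted labels."""
    n = len(pred_a)
    # Observed agreement: share of samples on which both classifiers agree.
    observed = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    # Chance agreement from each classifier's marginal label frequencies.
    count_a, count_b = Counter(pred_a), Counter(pred_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both classifiers always predict the same single label.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical predictions from three base classifiers on four samples.
clf_1 = [0, 0, 1, 1]
clf_2 = [0, 1, 0, 1]
clf_3 = [1, 1, 0, 0]

print(cohen_kappa(clf_1, clf_1))  # identical predictions
print(cohen_kappa(clf_1, clf_2))  # agreement at chance level
print(cohen_kappa(clf_1, clf_3))  # systematic disagreement
```

A pairwise matrix of such values could serve as one ingredient of a distance between classifiers before clustering, although how the paper combines it with accuracy is not stated in the abstract.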


2018 ◽  
Vol 7 (2) ◽  
pp. 75-84 ◽  
Author(s):  
Shivam Shreevastava ◽  
Anoop Kumar Tiwari ◽  
Tanmoy Som

Feature selection is one of the most widely used pre-processing techniques for dealing with large data sets. In this context, rough set theory has been successfully applied to feature selection on discrete data sets, but continuous data sets require discretization, which may cause information loss. Fuzzy rough set approaches have also been used successfully to resolve this issue, as they can handle continuous data directly. Moreover, almost all feature selection techniques are designed to handle homogeneous data sets. This article focuses on heterogeneous feature subset reduction. A novel intuitionistic fuzzy neighborhood model is proposed by combining intuitionistic fuzzy sets and neighborhood rough set models, taking an appropriate pair of lower and upper approximations, and is generalized for feature selection, supported by theory and its validation. An appropriate algorithm, along with an application to a data set, is included.


2019 ◽  
Vol 5 (3) ◽  
pp. 329-347 ◽  
Author(s):  
Rachid Benouini ◽  
Imad Batioua ◽  
Soufiane Ezghari ◽  
Khalid Zenkouar ◽  
Azeddine Zahi

2021 ◽  
pp. 107167
Author(s):  
Jihong Wan ◽  
Hongmei Chen ◽  
Zhong Yuan ◽  
Tianrui Li ◽  
Xiaoling Yang ◽  
...  

2021 ◽  
pp. 107223
Author(s):  
Binbin Sang ◽  
Hongmei Chen ◽  
Lei Yang ◽  
Tianrui Li ◽  
Weihua Xu ◽  
...  

2019 ◽  
Vol 8 (4) ◽  
pp. 8463-8474

To improve the performance of machine learning algorithms, it is essential to feed them relevant features. Feature selection is a common step used with most learning algorithms to choose the relevant features, both to reduce the dimensionality of the dataset and to improve classification accuracy. Among the various feature selection techniques, Rough Set Theory (RST) has made major contributions to the feature selection domain. However, the conventional rough set based feature selection procedure makes a binary decision, marking an attribute as either relevant or irrelevant. The fuzzy rough set could resolve this problem by determining relevancy through membership values; however, this method is unable to identify the boundary or range of an attribute value that is appropriate for classification. Conventional feature selection is inappropriate when a specific range of an attribute's values is indicative of a decision variable while the attribute seems irrelevant when its complete range is considered. This research work focuses on choosing relevant features for the problem of driver inattention detection. The features extracted for this problem are real numbers, hence the Neighborhood Rough Set (NRS) model is followed here rather than the conventional Rough Set (RS) approach. In this paper, a Range-specific Neighborhood Rough Set (RNRS) based feature selection is proposed for more accurate feature selection in the application of detecting driver inattention. The experiments are carried out with three real-time driver datasets, and the results are reported to demonstrate the significance of the proposed RNRS based feature selection. Two learning algorithms, namely K-nearest neighbors and support vector machines, are used to evaluate the performance of the proposed approach. The results show that the proposed algorithm can significantly improve the classification performance.


Author(s):  
Xiaomin Zhao ◽  
Qinghua Hu ◽  
Yaguo Lei ◽  
Ming J. Zuo

Rough set has been widely used as a method of feature selection in fault diagnosis. The neighborhood rough set model can deal with both nominal and numerical features, but selecting the neighborhood size for its application may be a challenge. In this paper, we illustrate that using a single neighborhood size for all features may overestimate or underestimate a feature’s degree of dependency. The neighborhood rough set model is then modified by setting different neighborhood sizes for different features. The modified model is applied to fault diagnosis of slurry pump impellers. The chosen feature subsets generated by the modified rough set model can be physically explained by the corresponding flow patterns and generate higher classification accuracy than the original feature subsets and the feature subsets generated by the original rough set model.
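
The claim that a single neighborhood size can overestimate or underestimate a feature's dependency can be illustrated with a small sketch. It computes a per-feature dependency degree for two hypothetical features on very different scales; the function name, the absolute-difference neighborhood, and the data are assumptions made here, not the modified model from the paper.

```python
def feature_dependency(values, labels, delta):
    """Share of samples whose delta-neighborhood on one feature is label-pure."""
    n = len(values)
    consistent = 0
    for i in range(n):
        # Neighborhood of sample i on this single feature (i itself included).
        neighbors = [j for j in range(n) if abs(values[i] - values[j]) <= delta]
        if all(labels[j] == labels[i] for j in neighbors):
            consistent += 1
    return consistent / n


labels = [0, 0, 1, 1]
# Hypothetical feature on a small scale that separates the classes well.
fine = [0.1, 0.2, 0.8, 0.9]
# Hypothetical feature on a large scale that does not separate the classes.
coarse = [10.0, 90.0, 12.0, 88.0]

# Neighborhood sizes matched to each feature's own scale.
print(feature_dependency(fine, labels, 0.15), feature_dependency(coarse, labels, 5.0))
# One shared size of 5.0: the informative feature is underestimated.
print(feature_dependency(fine, labels, 5.0))
# One shared size of 0.15: the uninformative feature is overestimated.
print(feature_dependency(coarse, labels, 0.15))
```

With a shared size of 5.0 every sample of the small-scale feature falls into one mixed neighborhood, and with a shared size of 0.15 every sample of the large-scale feature sits alone and looks trivially consistent. Choosing a size per feature avoids both effects on this toy data.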

