Online Streaming Feature Selection Based on Feature Interaction

Author(s):  
Yan Lv ◽  
Yaojin Lin ◽  
Xiangyan Chen ◽  
Dongxing Wang ◽  
Chenxi Wang
2019 ◽  
Vol 86 ◽  
pp. 48-61 ◽  
Author(s):  
Peng Zhou ◽  
Xuegang Hu ◽  
Peipei Li ◽  
Xindong Wu

2019 ◽  
Vol 481 ◽  
pp. 258-279 ◽  
Author(s):  
Peng Zhou ◽  
Xuegang Hu ◽  
Peipei Li ◽  
Xindong Wu

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1635
Author(s):  
Dingfei Lei ◽  
Pei Liang ◽  
Junhua Hu ◽  
Yuan Yuan

In many real-world applications, such as medical diagnosis and fraud detection, not all features are available from the start; they are generated and arrive one at a time. Online streaming feature selection (OSFS) has recently attracted much attention because it can select the best feature subset as the feature set grows. Rough set theory, and the neighborhood rough set in particular, is widely used as an effective tool for feature selection. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with unevenly distributed data, and the traditional method of dependency calculation does not take into account the structure of the neighborhood covering. In this study, we first define a novel neighborhood relation that combines the k-neighborhood and neighborhood relations. We then propose a weighted dependency degree computation method that considers the structure of the neighborhood relation. In addition, we propose a new OSFS approach, named OSFS-KW, that addresses the challenge of learning from class-imbalanced data. OSFS-KW has no adjustable parameters and requires no pretraining. Experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but also exceeds state-of-the-art OSFS approaches.
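To make the "traditional method of dependency calculation" concrete, the following is a minimal sketch of the classic neighborhood rough set dependency degree, assuming a distance-threshold neighborhood with a hypothetical radius `delta` and Euclidean distance. It is not the weighted dependency proposed for OSFS-KW, whose definition is not given in the abstract; the function name and toy data are illustrative only.

```python
from math import dist

def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood is label-pure.

    Assumption: a sample belongs to the positive region when every sample
    within Euclidean distance `delta` (in the chosen feature subspace)
    shares its class label.
    """
    if not features:
        return 0.0
    # Project every sample onto the chosen feature subset.
    proj = [[row[f] for f in features] for row in X]
    consistent = 0
    for i, p in enumerate(proj):
        # The neighborhood of sample i includes sample i itself.
        if all(y[j] == y[i] for j, q in enumerate(proj) if dist(p, q) <= delta):
            consistent += 1
    return consistent / len(X)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
y = [0, 0, 1, 1]
dep_f0 = neighborhood_dependency(X, y, [0], delta=0.2)
dep_f1 = neighborhood_dependency(X, y, [1], delta=0.2)
print(dep_f0, dep_f1)
```

On this toy data, feature 0 yields a dependency of 1.0 and feature 1 yields 0.0. A fixed radius treats dense and sparse regions alike, which is the kind of limitation under uneven data distributions that the abstract describes.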


2018 ◽  
Vol 8 (12) ◽  
pp. 2548 ◽  
Author(s):  
Dianlong You ◽  
Xindong Wu ◽  
Limin Shen ◽  
Yi He ◽  
Xu Yuan ◽  
...  

Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods, such as Alpha-investing and Online Streaming Feature Selection (OSFS), have been proposed for this purpose, but they suffer from low prediction accuracy and high running time when the streaming features exhibit low redundancy and high relevance. In this paper, we propose a novel online streaming feature selection algorithm, named ConInd, that uses a three-layer filtering strategy to process streaming features and overcome these drawbacks. Through three layers of filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we can obtain an approximate Markov blanket with high accuracy and low running time. To validate its efficiency, we implemented the proposed algorithm and tested its performance on widely used datasets, namely NIPS 2003 and Causality Workbench. Extensive experimental results demonstrate that ConInd offers significant improvements in prediction accuracy and running time compared to Alpha-investing and OSFS. ConInd achieves 5.62% higher average prediction accuracy than Alpha-investing and a 53.56% lower average running time than OSFS when the dataset has low redundancy and high relevance. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.
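The layered filtering idea can be sketched as follows. This is an illustrative simplification, not the ConInd implementation: Pearson and first-order partial correlation with a fixed, hypothetical `threshold` stand in for proper conditional independence tests, and the third layer (conditioning on sets of several features) is omitted. All names and data are assumptions.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def partial_corr(a, b, c):
    """Correlation of a and b given c; near-collinear inputs count as 0.0."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    denom = sqrt(max(0.0, 1 - r_ac ** 2) * max(0.0, 1 - r_bc ** 2))
    return (r_ab - r_ac * r_bc) / denom if denom > 1e-6 else 0.0

def stream_select(feature_stream, target, threshold=0.3):
    """Process features one at a time and keep those passing both layers."""
    selected = {}
    for name, values in feature_stream:
        # Layer 1 (null-conditional): drop features unrelated to the target.
        if abs(pearson(values, target)) < threshold:
            continue
        # Layer 2 (single-conditional): drop features that carry no extra
        # information about the target given one already-selected feature.
        if any(abs(partial_corr(values, target, s)) < threshold
               for s in selected.values()):
            continue
        selected[name] = values
    return list(selected)

target = [1, 2, 3, 4, 5, 6]
stream = [
    ("f1", [1, 2, 3, 4, 5, 7]),          # relevant
    ("noise", [1, -1, -1, 1, 1, -1]),    # irrelevant
    ("f_step", [1, 1, 2, 2, 3, 3]),      # relevant, adds information
    ("f1_scaled", [2, 4, 6, 8, 10, 14]), # redundant copy of f1
]
selected = stream_select(stream, target)
print(selected)
```

In this sketch, cheap marginal checks run before the more expensive conditional ones, so irrelevant features are discarded without ever being compared against the selected set.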


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Zilin Zeng ◽  
Hongjun Zhang ◽  
Rui Zhang ◽  
Youliang Zhang

Feature interaction has gained considerable attention recently. However, many feature selection methods that consider interaction are designed only for categorical features. This paper proposes a mixed feature selection algorithm based on neighborhood rough sets that can be used to search for interacting features. Feature relevance, feature redundancy, and feature interaction are defined in the framework of neighborhood rough sets; a neighborhood interaction weight factor, which reflects whether a feature is redundant or interactive, is proposed; and a neighborhood interaction weight based feature selection algorithm (NIWFS) is presented. To evaluate the performance of the proposed algorithm, we compare NIWFS with three other feature selection algorithms, namely INTERACT, NRS, and NMI, in terms of classification accuracy and the number of selected features, using C4.5 and IB1. The results on ten real-world datasets indicate that NIWFS not only handles mixed datasets directly but also reduces the dimensionality of the feature space while achieving the highest average accuracy.
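A small example may help show what interaction means in the neighborhood rough set setting. The sketch below does not reproduce the NIWFS weight factor, whose formula is not given in the abstract; it uses a simple, assumed "gain" measure, namely the joint neighborhood dependency of two features minus the better of their individual dependencies, on XOR-style toy data. The function names and the radius `delta` are hypothetical.

```python
from math import dist

def dependency(X, y, features, delta):
    """Share of samples whose delta-neighborhood contains a single class."""
    proj = [tuple(row[f] for f in features) for row in X]
    pure = sum(
        all(y[j] == y[i] for j, q in enumerate(proj) if dist(p, q) <= delta)
        for i, p in enumerate(proj)
    )
    return pure / len(X)

def interaction_gain(X, y, f, g, delta):
    """Joint dependency minus the best single-feature dependency (assumed measure)."""
    joint = dependency(X, y, [f, g], delta)
    best_single = max(dependency(X, y, [f], delta), dependency(X, y, [g], delta))
    return joint - best_single

# XOR labels: neither feature predicts the class alone, but together they do.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
gain = interaction_gain(X, y, 0, 1, delta=0.5)
print(gain)
```

Here each feature alone has a dependency of 0.0 while the pair reaches 1.0, so the gain is 1.0. A selector that scores features only individually would discard both, which is the failure mode that interaction-aware methods are meant to avoid.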

