Online Streaming Feature Selection Based on Feature Interaction

Not all features in many real-world applications, such as medical diagnosis and fraud detection, are available from the start. They are formed and individually flow over time. Online streaming feature selection (OSFS) has recently attracted much attention due to its ability to select the best feature subset with growing features. Rough set theory is widely used as an effective tool for feature selection, specifically the neighborhood rough set. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with the uneven distribution of data. The traditional method of dependency calculation does not take into account the structure of neighborhood covering. In this study, a novel neighborhood relation combined with k-neighborhood and neighborhood relations is initially defined. Then, we propose a weighted dependency degree computation method considering the structure of the neighborhood relation. In addition, we propose a new OSFS approach named OSFS-KW considering the challenge of learning class imbalanced data. OSFS-KW has no adjustable parameters and pretraining requirements. The experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but, also, exceeds the state-of-the-art OSFS approaches.

Download Full-text

Online Streaming Feature Selection via Conditional Independence

Applied Sciences ◽

10.3390/app8122548 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2548 ◽

Cited By ~ 2

Author(s):

Dianlong You ◽

Xindong Wu ◽

Limin Shen ◽

Yi He ◽

Xu Yuan ◽

...

Keyword(s):

Feature Selection ◽

Conditional Independence ◽

Prediction Accuracy ◽

High Accuracy ◽

Running Time ◽

Performance Improvements ◽

Average Running Time ◽

Significant Performance ◽

Online Feature Selection ◽

Online Streaming

Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing works, such as Alpha-investing and Online Streaming Feature Selection (OSFS), have been proposed to serve this purpose, but they have drawbacks, including low prediction accuracy and high running time if the streaming features exhibit characteristics such as low redundancy and high relevance. In this paper, we propose a novel algorithm about online streaming feature selection, named ConInd that uses a three-layer filtering strategy to process streaming features with the aim of overcoming such drawbacks. Through three-layer filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we can obtain an approximate Markov blanket with high accuracy and low running time. To validate the efficiency, we implemented the proposed algorithm and tested its performance on a prevalent dataset, i.e., NIPS 2003 and Causality Workbench. Through extensive experimental results, we demonstrated that ConInd offers significant performance improvements in prediction accuracy and running time compared to Alpha-investing and OSFS. ConInd offers 5.62% higher average prediction accuracy than Alpha-investing, with a 53.56% lower average running time compared to that for OSFS when the dataset is lowly redundant and highly relevant. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.

Download Full-text

A Mixed Feature Selection Method Considering Interaction

Mathematical Problems in Engineering ◽

10.1155/2015/989067 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 3

Author(s):

Zilin Zeng ◽

Hongjun Zhang ◽

Rui Zhang ◽

Youliang Zhang

Keyword(s):

Feature Selection ◽

Rough Sets ◽

Feature Selection Method ◽

Feature Space ◽

Feature Interaction ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Neighborhood Rough Sets ◽

Real World Datasets ◽

Neighborhood Interaction

Feature interaction has gained considerable attention recently. However, many feature selection methods considering interaction are only designed for categorical features. This paper proposes a mixed feature selection algorithm based on neighborhood rough sets that can be used to search for interacting features. In this paper, feature relevance, feature redundancy, and feature interaction are defined in the framework of neighborhood rough sets, the neighborhood interaction weight factor reflecting whether a feature is redundant or interactive is proposed, and a neighborhood interaction weight based feature selection algorithm (NIWFS) is brought forward. To evaluate the performance of the proposed algorithm, we compare NIWFS with other three feature selection algorithms, including INTERACT, NRS, and NMI, in terms of the classification accuracies and the number of selected features with C4.5 and IB1. The results from ten real world datasets indicate that NIWFS not only deals with mixed datasets directly, but also reduces the dimensionality of feature space with the highest average accuracies.

Download Full-text