Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

2021, Vol 17 (4), pp. 3-14
Author(s): Sadegh Eskandari

2019, Vol 86, pp. 48-61
Author(s): Peng Zhou, Xuegang Hu, Peipei Li, Xindong Wu

2019, Vol 481, pp. 258-279
Author(s): Peng Zhou, Xuegang Hu, Peipei Li, Xindong Wu

Symmetry, 2020, Vol 12 (10), pp. 1635
Author(s): Dingfei Lei, Pei Liang, Junhua Hu, Yuan Yuan

In many real-world applications, such as medical diagnosis and fraud detection, not all features are available from the start; they are generated and arrive one at a time. Online streaming feature selection (OSFS) has recently attracted much attention because it can select the best feature subset as the feature set grows. Rough set theory, and the neighborhood rough set in particular, is widely used as an effective tool for feature selection. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with unevenly distributed data, and the traditional method of dependency calculation does not take into account the structure of the neighborhood covering. In this study, a novel neighborhood relation that combines the k-neighborhood and neighborhood relations is first defined. Then, we propose a weighted dependency degree computation method that considers the structure of the neighborhood relation. In addition, we propose a new OSFS approach named OSFS-KW that addresses the challenge of learning from class-imbalanced data. OSFS-KW has no adjustable parameters and requires no pretraining. Experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but also exceeds state-of-the-art OSFS approaches.
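As a rough illustration of the dependency degree mentioned in this abstract, the sketch below computes the classic neighborhood rough set dependency: the fraction of samples whose neighbors all share their class label. It is not the OSFS-KW method. The function names `neighborhood` and `dependency`, the radius `delta`, and the optional cap `k` (one simple way to mix a radius with a nearest-neighbor limit) are assumptions made for this example.

```python
import math

def neighborhood(data, i, delta, k=None):
    """Indices of samples within distance delta of sample i (excluding i).

    If k is given, keep only the k closest of those. This capped variant is
    an illustrative assumption, not the relation defined in the paper.
    """
    dists = sorted(
        (math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i
    )
    near = [j for d, j in dists if d <= delta]
    return near if k is None else near[:k]

def dependency(data, labels, delta, k=None):
    """Fraction of samples whose whole neighborhood shares their label."""
    consistent = sum(
        all(labels[j] == labels[i] for j in neighborhood(data, i, delta, k))
        for i in range(len(data))
    )
    return consistent / len(data)

# Toy data: one feature, two classes, with samples 1 and 2 near the boundary.
X = [(0.0,), (0.1,), (0.25,), (0.9,), (1.0,)]
y = [0, 0, 1, 1, 1]
print(dependency(X, y, delta=0.2))       # 0.6
print(dependency(X, y, delta=0.2, k=1))  # 0.8
```

A higher value means the feature set separates the classes more cleanly at the chosen neighborhood size. The capped variant changes the result because sample 1 no longer sees its cross-class neighbor.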


2018, Vol 8 (12), pp. 2548
Author(s): Dianlong You, Xindong Wu, Limin Shen, Yi He, Xu Yuan, ...

Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing methods, such as Alpha-investing and Online Streaming Feature Selection (OSFS), have been proposed for this purpose, but they suffer from low prediction accuracy and high running time when the streaming features exhibit low redundancy and high relevance. In this paper, we propose a novel online streaming feature selection algorithm, named ConInd, that uses a three-layer filtering strategy to process streaming features and overcome these drawbacks. Through three-layer filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we can obtain an approximate Markov blanket with high accuracy and low running time. To validate its efficiency, we implemented the proposed algorithm and tested its performance on widely used datasets, i.e., NIPS 2003 and Causality Workbench. Extensive experimental results demonstrate that ConInd offers significant improvements in prediction accuracy and running time over Alpha-investing and OSFS. ConInd offers 5.62% higher average prediction accuracy than Alpha-investing and a 53.56% lower average running time than OSFS when the dataset has low redundancy and high relevance. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.
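The three-layer filtering described in this abstract can be sketched as a cascade of increasingly expensive independence checks. This is a minimal illustration under stated assumptions, not the ConInd implementation. The independence test is passed in as a hypothetical callable `is_independent(f, cond)`, the cap `max_cond` on conditioning-set size is invented for the example, and `toy_oracle` is a hand-written stand-in for a real statistical test.

```python
from itertools import combinations

def three_layer_filter(stream, is_independent, max_cond=3):
    """Process features one at a time and return the retained feature names.

    is_independent(f, cond) is a caller-supplied (hypothetical) test that
    returns True when feature f is independent of the target given the
    tuple of already selected features cond.
    """
    selected = []
    for f in stream:
        # Layer 1: null-conditional independence (empty conditioning set).
        if is_independent(f, ()):
            continue
        # Layer 2: single-conditional independence given one selected feature.
        if any(is_independent(f, (s,)) for s in selected):
            continue
        # Layer 3: multi-conditional independence given larger subsets.
        sizes = range(2, min(max_cond, len(selected)) + 1)
        if any(
            is_independent(f, cond)
            for r in sizes
            for cond in combinations(selected, r)
        ):
            continue
        selected.append(f)
    return selected

def toy_oracle(f, cond):
    """Stand-in test: f is independent once all its 'blockers' are in cond."""
    blockers = {"noise": set(), "a_copy": {"a"}, "ab_mix": {"a", "b"}}
    return f in blockers and blockers[f] <= set(cond)

print(three_layer_filter(["a", "noise", "b", "a_copy", "ab_mix"], toy_oracle))
# ['a', 'b']
```

Ordering the checks from cheapest to most expensive means most irrelevant or redundant features are rejected before any subset enumeration happens, which is where the running-time saving in such a cascade would come from.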


2021, Vol 11 (1), pp. 275-287
Author(s): B. Venkatesh, J. Anuradha

In real-world applications, the dimensions of data are now often generated dynamically, and traditional batch feature selection methods are not suitable for streaming data. Online streaming feature selection methods have therefore gained attention, but existing methods have drawbacks such as low classification accuracy, failure to exclude redundant and irrelevant features, and a large number of selected features. In this paper, we propose a parallel online feature selection method that uses multiple sliding windows and fuzzy fast-mRMR feature selection analysis to select minimum-redundancy, maximum-relevance features and to overcome the drawbacks of existing online streaming feature selection methods. Parallel processing is used to speed up the proposed method. To evaluate its performance, k-NN, SVM, and Decision Tree classifiers are used, and the method is compared against state-of-the-art online feature selection methods. Accuracy, Precision, Recall, and F1-Score are used as evaluation metrics on benchmark datasets. The experimental analysis shows that the proposed method achieves more than 95% accuracy on most of the datasets and outperforms other existing online streaming feature selection methods.
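The minimum-redundancy, maximum-relevance (mRMR) criterion named in this abstract can be illustrated with a small greedy selector over discrete features. This sketch covers only plain mRMR with the difference criterion; the fuzzy variant, the sliding windows, and the parallel processing are not shown. The helper names `mutual_info` and `mrmr` and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )

def mrmr(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_info(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: the target encodes both a and b; a_dup duplicates a.
features = {
    "a":     [0, 0, 0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
    "b":     [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr(features, target, k=2))  # ['a', 'b']
```

Ranking by relevance alone would pick `a` and `a_dup`, which carry the same information. The redundancy penalty is what steers the second pick to `b`.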


2016, Vol 113, pp. 1-3
Author(s): Kui Yu, Wei Ding, Xindong Wu
