Online Streaming Feature Selection Using Sampling Technique and Correlations Between Features

OFS-Density: A novel online streaming feature selection method

Pattern Recognition ◽

10.1016/j.patcog.2018.08.009 ◽

2019 ◽

Vol 86 ◽

pp. 48-61 ◽

Cited By ~ 15

Author(s):

Peng Zhou ◽

Xuegang Hu ◽

Peipei Li ◽

Xindong Wu

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Online Streaming

Download Full-text

Preprocessing of imbalanced breast cancer data using feature selection combined with over-sampling technique for classification

2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS) ◽

10.1109/icacsis.2013.6761610 ◽

2013 ◽

Cited By ~ 4

Author(s):

Janjira Jojan ◽

Anongnart Srivihok

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Sampling Technique ◽

Breast Cancer Data ◽

Cancer Data

Download Full-text

Online streaming feature selection using rough sets

International Journal of Approximate Reasoning ◽

10.1016/j.ijar.2015.11.006 ◽

2016 ◽

Vol 69 ◽

pp. 35-57 ◽

Cited By ~ 32

Author(s):

S. Eskandari ◽

M.M. Javidi

Keyword(s):

Feature Selection ◽

Rough Sets ◽

Online Streaming

Download Full-text

Thai Water Buffalo Disease Analysis with the Application of Feature Selection Technique and Multi-Layer Perceptron Neural Network

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.4049 ◽

2021 ◽

Vol 11 (2) ◽

pp. 6907-6911

Author(s):

S. Nuanmeesri ◽

W. Sriurai

Keyword(s):

Neural Network ◽

Feature Selection ◽

Water Buffalo ◽

Information Gain ◽

Sampling Technique ◽

Multi Layer Perceptron ◽

Analysis Model ◽

Feature Selection Technique ◽

Selection Technique ◽

Correlation Based Feature Selection

This research aims to develop the analysis model for diseases in water buffalo towards the application of the feature selection technique along with the Multi-Layer Perceptron neural network. The data used for analysis was collected from books and documents related to diseases in water buffalo and the official website of the Department of Livestock Development. The data consists of the characteristics of six diseases in water buffalo, including Anthrax disease, Hemorrhagic Septicemia, Brucellosis, Foot and Mouth disease, Parasitic disease, and Mastitis. Since the amount of the collected data was limited, the Synthetic Minority Over-sampling Technique was also employed to adjust the imbalance dataset. Afterward, the adjusted dataset was used to select the disease characteristics towards the application of two feature selection techniques, including Correlation-based Feature Selection and Information Gain. Subsequently, the selected features were then used for developing the analysis model for diseases in water buffalo towards the use of Multi-Layer Perceptron neural network. The evaluation results of the model’s effectiveness, given by the 10-fold cross-validation, showed that the analysis model for diseases in water buffalo developed by Correlation-based Feature Selection and Multi-Layer Perceptron neural network provided the highest level of effectiveness with the accuracy of 99.71%, the precision of 99.70%, and the recall of 99.72%. This implies that the analysis model is effectively applicable.

Download Full-text

Online streaming feature selection using adapted Neighborhood Rough Set

Information Sciences ◽

10.1016/j.ins.2018.12.074 ◽

2019 ◽

Vol 481 ◽

pp. 258-279 ◽

Cited By ~ 11

Author(s):

Peng Zhou ◽

Xuegang Hu ◽

Peipei Li ◽

Xindong Wu

Keyword(s):

Feature Selection ◽

Rough Set ◽

Neighborhood Rough Set ◽

Online Streaming

Download Full-text

New Online Streaming Feature Selection Based on Neighborhood Rough Set for Medical Data

Symmetry ◽

10.3390/sym12101635 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1635

Author(s):

Dingfei Lei ◽

Pei Liang ◽

Junhua Hu ◽

Yuan Yuan

Keyword(s):

Feature Selection ◽

Rough Set ◽

Rough Set Theory ◽

Imbalanced Data ◽

Computation Method ◽

Feature Subset ◽

Neighborhood Rough Set ◽

Neighborhood Relation ◽

Neighborhood Relations ◽

Online Streaming

Not all features in many real-world applications, such as medical diagnosis and fraud detection, are available from the start. They are formed and individually flow over time. Online streaming feature selection (OSFS) has recently attracted much attention due to its ability to select the best feature subset with growing features. Rough set theory is widely used as an effective tool for feature selection, specifically the neighborhood rough set. However, the two main neighborhood relations, namely k-neighborhood and neighborhood, cannot efficiently deal with the uneven distribution of data. The traditional method of dependency calculation does not take into account the structure of neighborhood covering. In this study, a novel neighborhood relation combined with k-neighborhood and neighborhood relations is initially defined. Then, we propose a weighted dependency degree computation method considering the structure of the neighborhood relation. In addition, we propose a new OSFS approach named OSFS-KW considering the challenge of learning class imbalanced data. OSFS-KW has no adjustable parameters and pretraining requirements. The experimental results on 19 datasets demonstrate that OSFS-KW not only outperforms traditional methods but, also, exceeds the state-of-the-art OSFS approaches.

Download Full-text

Online Streaming Feature Selection via Conditional Independence

Applied Sciences ◽

10.3390/app8122548 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2548 ◽

Cited By ~ 2

Author(s):

Dianlong You ◽

Xindong Wu ◽

Limin Shen ◽

Yi He ◽

Xu Yuan ◽

...

Keyword(s):

Feature Selection ◽

Conditional Independence ◽

Prediction Accuracy ◽

High Accuracy ◽

Running Time ◽

Performance Improvements ◽

Average Running Time ◽

Significant Performance ◽

Online Feature Selection ◽

Online Streaming

Online feature selection is a challenging topic in data mining. It aims to reduce the dimensionality of streaming features by removing irrelevant and redundant features in real time. Existing works, such as Alpha-investing and Online Streaming Feature Selection (OSFS), have been proposed to serve this purpose, but they have drawbacks, including low prediction accuracy and high running time if the streaming features exhibit characteristics such as low redundancy and high relevance. In this paper, we propose a novel algorithm about online streaming feature selection, named ConInd that uses a three-layer filtering strategy to process streaming features with the aim of overcoming such drawbacks. Through three-layer filtering, i.e., null-conditional independence, single-conditional independence, and multi-conditional independence, we can obtain an approximate Markov blanket with high accuracy and low running time. To validate the efficiency, we implemented the proposed algorithm and tested its performance on a prevalent dataset, i.e., NIPS 2003 and Causality Workbench. Through extensive experimental results, we demonstrated that ConInd offers significant performance improvements in prediction accuracy and running time compared to Alpha-investing and OSFS. ConInd offers 5.62% higher average prediction accuracy than Alpha-investing, with a 53.56% lower average running time compared to that for OSFS when the dataset is lowly redundant and highly relevant. In addition, the ratio of the average number of features for ConInd is 242% less than that for Alpha-investing.

Download Full-text

Online streaming feature selection with incremental feature grouping

Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery ◽

10.1002/widm.1364 ◽

2020 ◽

Vol 10 (4) ◽

Author(s):

Noura Al Nuaimi ◽

Mohammad M. Masud

Keyword(s):

Feature Selection ◽

Feature Grouping ◽

Online Streaming

Download Full-text

On Combining Feature Selection and Over-Sampling Techniques for Breast Cancer Prediction

Applied Sciences ◽

10.3390/app11146574 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6574

Author(s):

Min-Wei Huang ◽

Chien-Hung Chiu ◽

Chih-Fong Tsai ◽

Wei-Chao Lin

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Prediction Model ◽

Performance Improvement ◽

Prediction Models ◽

Information Gain ◽

Class Imbalance ◽

Sampling Technique ◽

Sampling Techniques ◽

Cancer Prediction

Breast cancer prediction datasets are usually class imbalanced, where the number of data samples in the malignant and benign patient classes are significantly different. Over-sampling techniques can be used to re-balance the datasets to construct more effective prediction models. Moreover, some related studies have considered feature selection to remove irrelevant features from the datasets for further performance improvement. However, since the order of combining feature selection and over-sampling can result in different training sets to construct the prediction model, it is unknown which order performs better. In this paper, the information gain (IG) and genetic algorithm (GA) feature selection methods and the synthetic minority over-sampling technique (SMOTE) are used for different combinations. The experimental results based on two breast cancer datasets show that the combination of feature selection and over-sampling outperform the single usage of either feature selection and over-sampling for the highly class imbalanced datasets. In particular, performing IG first and SMOTE second is the better choice. For other datasets with a small class imbalance ratio and a smaller number of features, performing SMOTE is enough to construct an effective prediction model.

Download Full-text

Fuzzy Rank Based Parallel Online Feature Selection Method using Multiple Sliding Windows

Open Computer Science ◽

10.1515/comp-2020-0169 ◽

2021 ◽

Vol 11 (1) ◽

pp. 275-287

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Streaming Data ◽

Selection Methods ◽

Sliding Windows ◽

Real World Applications ◽

Benchmark Datasets ◽

Online Feature Selection ◽

Online Streaming

Abstract Nowadays, in real-world applications, the dimensions of data are generated dynamically, and the traditional batch feature selection methods are not suitable for streaming data. So, online streaming feature selection methods gained more attention but the existing methods had demerits like low classification accuracy, fails to avoid redundant and irrelevant features, and a higher number of features selected. In this paper, we propose a parallel online feature selection method using multiple sliding-windows and fuzzy fast-mRMR feature selection analysis, which is used for selecting minimum redundant and maximum relevant features, and also overcomes the drawbacks of existing online streaming feature selection methods. To increase the performance speed of the proposed method parallel processing is used. To evaluate the performance of the proposed online feature selection method k-NN, SVM, and Decision Tree Classifiers are used and compared against the state-of-the-art online feature selection methods. Evaluation metrics like Accuracy, Precision, Recall, F1-Score are used on benchmark datasets for performance analysis. From the experimental analysis, it is proved that the proposed method has achieved more than 95% accuracy for most of the datasets and performs well over other existing online streaming feature selection methods and also, overcomes the drawbacks of the existing methods.

Download Full-text