Fuzzy Rank Based Parallel Online Feature Selection Method using Multiple Sliding Windows

2021, Vol 11 (1), pp. 275-287
Author(s): B. Venkatesh, J. Anuradha

Abstract: Nowadays, in real-world applications, the dimensions of data are generated dynamically, and traditional batch feature selection methods are not suitable for streaming data. Online streaming feature selection methods have therefore gained attention, but existing methods suffer from drawbacks such as low classification accuracy, failure to eliminate redundant and irrelevant features, and a high number of selected features. In this paper, we propose a parallel online feature selection method using multiple sliding windows and fuzzy fast-mRMR feature selection analysis, which selects minimally redundant and maximally relevant features and overcomes the drawbacks of existing online streaming feature selection methods. Parallel processing is used to increase the method's speed. To evaluate performance, k-NN, SVM, and Decision Tree classifiers are used, and the method is compared against state-of-the-art online feature selection methods on benchmark datasets using metrics such as Accuracy, Precision, Recall, and F1-Score. The experimental analysis shows that the proposed method achieves more than 95% accuracy on most of the datasets and outperforms existing online streaming feature selection methods.
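
For readers who want a concrete starting point, the sketch below shows the greedy mRMR selection step that the method builds on, applied to a single window of streaming features. It is a plain (non-fuzzy) mRMR using scikit-learn's mutual-information estimators in place of the paper's fuzzy relevance measure; the function name, window handling, and parameters are illustrative assumptions.

```python
# Illustrative greedy mRMR over one sliding window of streaming features.
# NOT the authors' fuzzy fast-mRMR; estimators and parameters are assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_window_select(X_window, y, k):
    """Greedy mRMR: maximize relevance to y, penalize redundancy
    with already-selected features. X_window: (n_samples, n_features)."""
    relevance = mutual_info_classif(X_window, y, random_state=0)
    selected = []
    remaining = list(range(X_window.shape[1]))
    while len(selected) < k and remaining:
        best, best_score = None, -np.inf
        for j in remaining:
            if selected:
                # redundancy: mean MI between candidate j and selected features
                redundancy = mutual_info_regression(
                    X_window[:, selected], X_window[:, j], random_state=0).mean()
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the parallel setting described above, one such call would run per sliding window (e.g. via concurrent.futures), with the per-window selections merged afterwards.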

Author(s): Fatemeh Alighardashi, Mohammad Ali Zare Chahooki

Improving software product quality through periodic testing before release is one of the most expensive activities in software projects. Because resources for module testing are limited, it is important to identify fault-prone modules and direct testing resources toward them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules, and extensive studies in this field seek the connection between the features of software modules and their fault-proneness. Some features used by predictive algorithms are ineffective and reduce the accuracy of the prediction process, so feature selection methods are widely used to increase the performance of fault-proneness prediction models. In this study, we propose a feature selection method that combines several filter feature selection methods into a fused weighted filter method. The proposed method improves both the convergence rate of feature selection and the prediction accuracy. Results obtained on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.
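
As an illustration of the fusion idea, the sketch below combines several filter scores into one weighted ranking. The choice of base filters (ANOVA F, mutual information, chi-squared), the min-max normalization, and the equal weights are assumptions for the sketch, not the paper's exact fused weighted filter.

```python
# Sketch of fusing several filter scores into a single feature ranking.
# Base filters, normalization, and weights are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

def fused_filter_ranks(X, y, weights=(1/3, 1/3, 1/3)):
    # note: chi2 requires non-negative feature values
    scores = [f_classif(X, y)[0], mutual_info_classif(X, y), chi2(X, y)[0]]
    fused = np.zeros(X.shape[1])
    for w, s in zip(weights, scores):
        s = np.nan_to_num(s)
        rng = s.max() - s.min()
        fused += w * ((s - s.min()) / rng if rng else s)  # min-max normalize
    return np.argsort(fused)[::-1]  # feature indices, best first
```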


Author(s): B. Venkatesh, J. Anuradha

In microarray data, achieving high classification accuracy is complicated by high dimensionality, irrelevant features, and noise; such data also contain many gene expression values but few samples. To increase the classification accuracy and processing speed of a model, an optimal subset of features needs to be extracted, which can be achieved by feature selection. In this paper, we propose a hybrid ensemble feature selection method with two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using performance metrics such as Accuracy, Recall, Precision, and F1-Score. The experimental results show that the proposed method outperforms the other feature selection methods.
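
A compact sketch of the wrapper phase follows: a basic binary PSO searches over feature masks with an RBF-SVM cross-validation score as the fitness, standing in for the paper's Improved BPSO (IBPSO). The swarm size, inertia and acceleration coefficients, and the 3-fold CV evaluator are assumptions.

```python
# Basic binary PSO wrapper with an RBF-SVM evaluator; a stand-in sketch
# for IBPSO, with all hyperparameters chosen as illustrative assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def bpso_select(X, y, n_particles=20, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pos = rng.random((n_particles, n)) < 0.5        # binary feature masks
    vel = rng.uniform(-1, 1, (n_particles, n))

    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, n))
        vel = (0.7 * vel
               + 1.5 * r1 * (pbest.astype(int) - pos.astype(int))
               + 1.5 * r2 * (gbest.astype(int) - pos.astype(int)))
        pos = rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-vel))  # sigmoid
        fit = np.array([fitness(p) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return np.flatnonzero(gbest)   # indices of the selected features
```

The filter phase would first shrink the gene set (e.g. via the aggregated Relief/mRMR/FC ranks) before this search runs, since SVM-based fitness evaluation over the full microarray dimensionality would be prohibitively slow.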


2019, Vol 86, pp. 48-61
Author(s): Peng Zhou, Xuegang Hu, Peipei Li, Xindong Wu

Author(s): Gulden Uchyigit, Keith Clark

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem in text classification is the high dimensionality of the feature space: only a small subset of the words are feature words useful for determining a document's class, while the rest add noise, make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present a comparative experimental study of feature selection methods for text classification. Ten feature selection methods were evaluated, including a new method called the GU metric. The other methods evaluated are the Chi-Squared (χ²) statistic, the NGL coefficient, the GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, the Fisher Criterion, and the BSS/WSS coefficient. The experiments were performed on the 20 Newsgroups dataset with a Naive Bayes probabilistic classifier, and the evaluations show that the GU metric obtained the best F1 and F2 scores.
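
One of the evaluated baselines is easy to reproduce end to end; the sketch below runs chi-squared feature selection with a naive Bayes classifier on 20 Newsgroups using scikit-learn. The tf-idf vectorizer settings and the choice of 1000 selected words are assumptions, not the paper's exact setup.

```python
# Chi-squared word selection + naive Bayes on 20 Newsgroups; a sketch of
# one baseline, with vectorizer settings and k=1000 as assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    SelectKBest(chi2, k=1000),   # keep the 1000 highest-scoring words
    MultinomialNB(),
)
model.fit(train.data, train.target)
print("test accuracy:", model.score(test.data, test.target))
```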


Author(s): Mingxia Liu, Daoqiang Zhang

As thousands of features are available in many pattern recognition and machine learning applications, feature selection remains an important task for finding the most compact representation of the original data. Although a number of feature selection methods have been developed in the literature, most of them focus on optimizing specific objective functions. In this paper, we first propose a general graph-preserving feature selection framework in which the graphs to be preserved vary with their specific definitions, and show that a number of existing filter-type feature selection algorithms can be unified within this framework. Then, based on the proposed framework, a new filter-type feature selection method called sparsity score (SS) is proposed. This method aims to preserve the structure of a pre-defined l1 graph, which is proven robust to data noise. Here, a modified sparse representation based on an l1-norm minimization problem is used to determine the graph adjacency structure and the corresponding affinity weight matrix simultaneously. Furthermore, a variant of SS called supervised SS (SuSS) is also proposed, in which the l1 graph to be preserved is constructed using only data points from the same class. Experimental results on clustering and classification tasks over a series of benchmark datasets show that the proposed methods achieve better performance than conventional filter-type feature selection methods.
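
A minimal sketch of constructing the l1 graph that the sparsity score preserves: each sample is sparsely reconstructed from all other samples via an l1-penalized regression, and the resulting coefficients serve as affinity weights. The Lasso penalty value is an assumption; the paper solves a modified sparse-representation problem, of which this is only a simplified stand-in.

```python
# Build an l1 graph by sparse reconstruction of each sample from the rest.
# Simplified stand-in for the paper's modified sparse representation.
import numpy as np
from sklearn.linear_model import Lasso

def l1_graph(X, alpha=0.05):
    """Return affinity matrix W where W[i, j] is the sparse-coding
    weight of sample j in the reconstruction of sample i."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        lasso = Lasso(alpha=alpha, max_iter=5000)
        # dictionary columns are the other samples; target is sample i
        lasso.fit(X[others].T, X[i])
        W[i, others] = np.abs(lasso.coef_)
    return W
```

For SuSS, the dictionary for sample i would be restricted to samples sharing its class label.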


2020, Vol 3 (1), pp. 58-63
Author(s): Y. Mansour Mansour, Majed A. Alenizi

Email is currently the main communication method worldwide, having proven its efficiency. Phishing emails, on the other hand, are one of the major threats and result in significant losses, estimated at billions of dollars. Phishing is a dynamic problem, a struggle between phishers and defenders in which the phishers have great flexibility in manipulating email features and evading anti-phishing techniques. Many solutions have been proposed to mitigate the impact of phishing emails on the targeted sectors, but none has achieved 100% detection accuracy. As phishing techniques evolve, the solutions need to evolve and generalize in order to mitigate the threat as much as possible. This article presents a new classification model based on a hybrid feature selection method that combines two common feature selection methods, Information Gain and a Genetic Algorithm, keeping only significant, high-quality features in the final classifier. The proposed hybrid approach achieved a 98.9% accuracy rate on a phishing email dataset comprising 8,266 instances, an improvement of almost 4%. Furthermore, the presented technique reduces the search space by reducing the number of selected features.
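
The two-stage idea can be sketched as follows: an information-gain prefilter keeps the strongest features, and a simple genetic algorithm then searches over subsets of them. The population size, mutation rate, truncation selection, decision-tree evaluator, and prefilter width are all assumptions for the sketch, not the paper's configuration.

```python
# Information-gain prefilter followed by a simple genetic search.
# All hyperparameters and the evaluator are illustrative assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def ig_ga_select(X, y, prefilter_k=30, pop=24, gens=20, seed=1):
    rng = np.random.default_rng(seed)
    # Stage 1: keep the prefilter_k features with the highest information gain.
    top = np.argsort(mutual_info_classif(X, y))[::-1][:prefilter_k]
    Xf = X[:, top]

    def fit(mask):
        return cross_val_score(DecisionTreeClassifier(random_state=0),
                               Xf[:, mask], y, cv=3).mean() if mask.any() else 0.0

    # Stage 2: genetic search over binary masks of the prefiltered set.
    P = rng.random((pop, prefilter_k)) < 0.5
    for _ in range(gens):
        scores = np.array([fit(m) for m in P])
        parents = P[np.argsort(scores)[::-1][:pop // 2]]   # truncation selection
        cut = rng.integers(1, prefilter_k, pop // 2)       # one-point crossover
        kids = np.array([np.concatenate([a[:c], b[c:]])
                         for a, b, c in zip(parents, np.roll(parents, 1, 0), cut)])
        kids ^= rng.random(kids.shape) < 0.02              # bit-flip mutation
        P = np.vstack([parents, kids])
    scores = np.array([fit(m) for m in P])
    return top[P[scores.argmax()]]   # selected original feature indices
```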


2021
Author(s): Ping Zhang, Jiyao Sheng, Wanfu Gao, Juncheng Hu, Yonghao Li

Abstract: Multi-label feature selection attracts considerable attention in multi-label learning. Information-theoretic multi-label feature selection methods aim to select the most informative features and reduce the uncertainty of the labels. Previous methods regard the amount of uncertainty in the labels as constant; in fact, as features capture the classification information of the label set, the remaining uncertainty of each label changes dynamically. In this paper, we categorize labels into two groups: one contains labels with little remaining uncertainty, meaning that most of their classification information has been captured by the already-selected features; the other contains labels with extensive remaining uncertainty, meaning that their classification information has been neglected by the already-selected features. Feature selection should favor new features that are highly relevant to the labels in the second group, but existing methods do not distinguish between the two label groups and ignore the dynamically changing amount of label information. To this end, a Relevancy Ratio is designed to quantify the dynamically changing amount of information of each label conditioned on the already-selected features. A Weighted Feature Relevancy is then defined to evaluate candidate features, and a new multi-label Feature Selection method based on Weighted Feature Relevancy (WFRFS) is proposed. Experiments on thirteen real-world datasets show encouraging results for WFRFS in comparison to six multi-label feature selection methods.
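
A heavily hedged sketch of the weighted-relevancy idea follows: each label is weighted by a proxy for how much of its information the already-selected features have not yet captured, and a candidate feature is scored by the weighted sum of its relevance to each label. The proxy weight used here (shrinking with the best mutual information already captured for that label) is an assumption, not the paper's Relevancy Ratio.

```python
# Weighted relevancy scoring for one candidate feature in multi-label FS.
# The per-label weight is a proxy assumption, not the paper's formula.
from sklearn.feature_selection import mutual_info_classif

def weighted_relevancy(X, Y, selected, candidate):
    """Score a candidate feature against multi-label matrix Y
    (n_samples, n_labels), given already-selected feature indices."""
    score = 0.0
    for k in range(Y.shape[1]):
        rel = mutual_info_classif(X[:, [candidate]], Y[:, k], random_state=0)[0]
        if selected:
            captured = mutual_info_classif(X[:, selected], Y[:, k],
                                           random_state=0).max()
            weight = 1.0 / (1.0 + captured)  # more remaining uncertainty -> larger weight
        else:
            weight = 1.0
        score += weight * rel
    return score
```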

