A Filter Feature Selection Algorithm Based on Mutual Information for Intrusion Detection

Feature selection is essential in data mining and pattern recognition, especially for database classification. During past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on maximum relevance and minimum redundancy criterion. The mutual information is used to measure the relevancy of each feature with class variable and calculate the redundancy by utilizing the relationship between candidate features, selected features and class variables. The effectiveness is tested with ten benchmarked datasets available in UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.

Download Full-text

A Feature Selection Method for Improved Clonal Algorithm Towards Intrusion Detection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416590138 ◽

2016 ◽

Vol 30 (05) ◽

pp. 1659013 ◽

Cited By ~ 7

Author(s):

Chunyong Yin ◽

Luyu Ma ◽

Lu Feng

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

False Positive ◽

False Positive Rate ◽

Feature Selection Method ◽

Selection Method ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Positive Rate ◽

Better Than

Intrusion detection is a kind of security mechanism which is used to detect attacks and intrusion behaviors. Due to the low accuracy and the high false positive rate of the existing clonal selection algorithms applied to intrusion detection, in this paper, we proposed a feature selection method for improved clonal algorithm. The improved method detects the intrusion behavior by selecting the best individual overall and clones them. Experimental results show that the feature selection algorithm is better than the traditional feature selection algorithm on the different classifiers, and it is shown that the final detection results are better than traditional clonal algorithm with 99.6% accuracy and 0.1% false positive rate.

Download Full-text

A redundancy-removing feature selection algorithm for nominal data

PeerJ Computer Science ◽

10.7717/peerj-cs.24 ◽

2015 ◽

Vol 1 ◽

pp. e24 ◽

Cited By ~ 1

Author(s):

Zhihua Li ◽

Wenqu Gu

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method ◽

Selection Algorithm ◽

Nominal Data ◽

New Information ◽

New Feature ◽

High Dimensional Datasets ◽

Experimental Comparisons

No order correlation or similarity metric exists in nominal data, and there will always be more redundancy in a nominal dataset, which means that an efficient mutual information-based nominal-data feature selection method is relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

Download Full-text

New Feature Selection Method Based on SVM-RFE

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.3100 ◽

2014 ◽

Vol 926-930 ◽

pp. 3100-3104 ◽

Cited By ~ 1

Author(s):

Xi Wang ◽

Qiang Li ◽

Zhi Hong Xie

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Recognition Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Testing Time ◽

Feature Subset ◽

Selection Algorithm ◽

Accuracy Rate ◽

New Feature

This article analyzed the defects of SVM-RFE feature selection algorithm, put forward new feature selection method combined SVM-RFE and PCA. Firstly, get the best feature subset through the method of cross validation of k based on SVM-RFE. Then, the PCA decreased the dimension of the feature subset and got the independent feature subset. The independent feature subset was the training and testing subset of SVM. Make experiments on five subsets of UCI, the results indicated that the training and testing time was shortened and the recognition accuracy rate of the SVM was higher.

Download Full-text

A redundancy-removing feature selection algorithm for nominal data

10.7287/peerj.preprints.1184v1 ◽

2015 ◽

Cited By ~ 1

Author(s):

Zhihua Li

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method ◽

Selection Algorithm ◽

Nominal Data ◽

New Information ◽

New Feature ◽

High Dimensional Datasets ◽

Experimental Comparisons

No order correlation or similarity metric exists in nominal data, and there will always be more redundancy in a nominal dataset, which means that an efficient mutual information-based nominal-data feature selection method is relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

Download Full-text

A redundancy-removing feature selection algorithm for nominal data

10.7287/peerj.preprints.1184 ◽

2015 ◽

Author(s):

Zhihua Li

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Selection Method ◽

Selection Algorithm ◽

Nominal Data ◽

New Information ◽

New Feature ◽

High Dimensional Datasets ◽

Experimental Comparisons

No order correlation or similarity metric exists in nominal data, and there will always be more redundancy in a nominal dataset, which means that an efficient mutual information-based nominal-data feature selection method is relatively difficult to find. In this paper, a nominal-data feature selection method based on mutual information without data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By forming several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related amount of nominal data directly. Furthermore, by creating a new evaluation function that considers both the relevance and the redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented feature selection method takes commonly used MIFS-like forms, it is capable of handling high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods using three benchmarking nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of the feature selection and classification accuracy, which indicates that the proposed method has a promising performance.

Download Full-text

An Effective Genetic Algorithm-Based Feature Selection Method for Intrusion Detection Systems

Computers & Security ◽

10.1016/j.cose.2021.102448 ◽

2021 ◽

pp. 102448

Author(s):

Zahid Halim ◽

Muhammad Nadeem Yousaf ◽

Muhammad Waqas ◽

Muhammad Suleman ◽

Ghulam Abbas ◽

...

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Intrusion Detection ◽

Feature Selection Method ◽

Selection Method ◽

Intrusion Detection Systems ◽

Detection Systems

Download Full-text

A Feature Selection Method for Intrusion Detection Based on Parallel Sparrow Search Algorithm

10.1109/iccse51940.2021.9569597 ◽

2021 ◽

Author(s):

Hongwei Chen ◽

Xin Ma ◽

Song Huang

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Search Algorithm ◽

Feature Selection Method ◽

Selection Method

Download Full-text

Feature Selection Method Based on Mutual Information and Support Vector Machine

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142150021x ◽

2021 ◽

pp. 2150021

Author(s):

Gang Liu ◽

Chunlei Yang ◽

Sen Liu ◽

Chunbao Xiao ◽

Bin Song

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Mutual Information ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Standard Data ◽

Feature Dimension

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant feature and improve classification accuracy. First, local correlation between features and overall correlation is calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated with analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined and the influence degree of input variables on output variables for SVM network based on MIV is calculated. The importance weights of the features described with MIV are sorted by descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combination which takes MIV order of feature as a reference. The simulation experiments are carried out with three standard data sets of UCI, and the results show that this method can not only effectively reduce the feature dimension and high classification accuracy, but also ensure good robustness.

Download Full-text

Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194018500158 ◽

2018 ◽

Vol 28 (04) ◽

pp. 537-558 ◽

Cited By ~ 4

Author(s):

Wenjie Liu ◽

Shanshan Wang ◽

Xin Chen ◽

He Jiang

Keyword(s):

Feature Selection ◽

Software Maintenance ◽

Feature Selection Method ◽

Selection Methods ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bug Reports ◽

Single Feature ◽

Bug Report ◽

Severity Prediction

In software maintenance process, it is a fairly important activity to predict the severity of bug reports. However, manually identifying the severity of bug reports is a tedious and time-consuming task. So developing automatic judgment methods for predicting the severity of bug reports has become an urgent demand. In general, a bug report contains a lot of descriptive natural language texts, thus resulting in a high-dimensional feature set which poses serious challenges to traditionally automatic methods. Therefore, we attempt to use automatic feature selection methods to improve the performance of the severity prediction of bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm by combining existing ones. In order to verify the performance of our method, we run experiments over the bug reports of Eclipse and Mozilla and conduct comparisons with eight commonly used feature selection methods. The experiment results show that the ranking-based strategy can effectively improve the performance of the severity prediction of bug reports by up to 54.76% on average in terms of [Formula: see text]-measure, and it also can significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method can get better results than a single feature selection algorithm.

Download Full-text