Research on Text Feature Selection Algorithm Based on Information Gain and Feature Relation Tree

Author(s):  
Hong Zhang ◽  
Yong-gong Ren ◽  
Xue Yang
2021 ◽  
Vol 336 ◽  
pp. 08008
Author(s):  
Tao Xie

In order to improve the detection rate and speed of intrusion detection systems, this paper proposes a feature selection algorithm. The algorithm uses information gain to rank the features in descending order, and then uses a multi-objective genetic algorithm to incrementally search the ranked features for the optimal feature combination. We classified the Kddcup98 dataset into five classes (Normal, DOS, PROBE, R2L, and U2R) and conducted numerous experiments on each class. Experimental results show that, for each class of attack, the proposed algorithm not only speeds up feature selection but also significantly improves the detection rate.
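The information-gain ranking stage described above can be sketched as follows (a generic illustration for discrete features, not the paper's implementation; the genetic-algorithm search over the ranked features is omitted):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy H(Y) of a label sequence, in bits
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    # IG(X) = H(Y) - H(Y | X): reduction in label entropy after splitting on X
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def rank_by_information_gain(features, labels):
    # features: dict of name -> value list; returns names ranked by IG, descending
    return sorted(features, key=lambda f: information_gain(features[f], labels),
                  reverse=True)
```

A feature that perfectly predicts the label gets IG equal to the label entropy, while an uninformative feature gets IG near zero, so the descending ranking places the most discriminative features first for the subsequent search.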


2020 ◽  
Author(s):  
Maria Irmina Prasetiyowati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract Feature selection is a preprocessing technique that aims to remove unnecessary features and speed up an algorithm's work. One feature selection technique is to calculate the information gain value of each feature in a dataset; a determined threshold value is then applied to these values to select features. Generally, the threshold value is chosen freely, or a value of 0.05 is used. This study proposes determining the threshold value from the standard deviation of the information gain values generated by the features in the dataset. This threshold determination was tested on ten original datasets and on datasets transformed by FFT and IFFT, followed by classification using Random Forest. The average accuracy and average time required by the Random Forest classification using the proposed threshold value are better than the results of feature selection with a threshold value of 0.05 and with the Correlation-Based Feature Selection algorithm. Likewise, the average accuracy of the proposed threshold on the transformed datasets is better than that of the 0.05 threshold and the Correlation-Based Feature Selection algorithm, although the average time required is higher (slower).
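The proposed thresholding rule can be sketched in a few lines (a minimal illustration assuming the information gain scores have already been computed per feature, e.g. as in the ranking step of the other papers on this page):

```python
import math

def std_threshold_select(ig_scores):
    # ig_scores: dict of feature name -> information gain value.
    # Threshold = standard deviation of all the gain values (the paper's
    # proposal, in place of a fixed cutoff such as 0.05).
    vals = list(ig_scores.values())
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    # Keep only the features whose gain exceeds the data-driven threshold
    return [f for f, v in ig_scores.items() if v > std]
```

Unlike a fixed cutoff, this threshold adapts to the spread of the gain values: a dataset where one feature dominates yields a high threshold that prunes aggressively, while uniformly weak gains yield a low threshold.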


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Zhang

Feature selection is the key step in the analysis of high-dimensional, small-sample data. The core of feature selection is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore the influence of interfeature redundancy and correlation. Therefore, this paper proposes a feature selection algorithm based on nonlinear dynamic conditional relevance (NDCRFS), developed through study and analysis of existing feature selection ideas and methods. Firstly, redundancy and relevance between features, and between features and class labels, are discriminated using mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate the performance of this feature selection algorithm, NDCRFS was validated against 6 other feature selection algorithms on three classifiers, using 12 different data sets, measuring variability and classification metrics between the algorithms. The experimental results show that the NDCRFS method can improve the quality of the feature subsets and obtain better classification results.
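The relevance-versus-redundancy trade-off at the heart of such methods can be illustrated with a simplified greedy criterion (an mRMR-style stand-in for discrete data, not the authors' NDCRFS, which additionally uses conditional and interactive mutual information with dynamic weighting):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X; Y) for two discrete sequences, in bits
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def greedy_select(features, labels, k):
    # Greedily pick k features, scoring each candidate by its relevance to
    # the labels minus its mean redundancy with the already-selected features
    selected, candidates = [], list(features)
    while candidates and len(selected) < k:
        def score(f):
            rel = mutual_information(features[f], labels)
            red = (sum(mutual_information(features[f], features[s]) for s in selected)
                   / len(selected)) if selected else 0.0
            return rel - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The redundancy term is what distinguishes this family of methods from ranking features by individual relevance alone: a near-duplicate of an already-selected feature scores poorly even if it is highly correlated with the labels.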

