A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information

2021, Vol 2021, pp. 1-10
Author(s):  
Li Zhang

Feature selection is a key step in the analysis of high-dimensional, small-sample data. Its core is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most existing feature selection algorithms only consider the classification contribution of individual features and ignore inter-feature redundancy and correlation. Therefore, through a study and analysis of existing feature selection ideas and methods, this paper proposes a nonlinear dynamic conditional relevance feature selection algorithm (NDCRFS). Firstly, the redundancy and relevance between features, and between features and class labels, are discriminated by mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate its performance, NDCRFS was compared with six other feature selection algorithms on three classifiers, using 12 different data sets, measuring both the variability between algorithms and their classification metrics. The experimental results show that the NDCRFS method can improve the quality of the feature subsets and obtain better classification results.
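The abstract describes relevance and redundancy being quantified with mutual information and conditional mutual information. The snippet below is a minimal sketch of a greedy selector built from those terms, assuming discretized features; the exact NDCRFS weighting with information gain factors is not given in the abstract, so the scoring rule here is an illustrative assumption rather than the published method.

```python
# Sketch only: greedy mutual-information feature selection with a conditional term.
# Assumes X contains discrete (or discretized) feature columns.
import numpy as np
from sklearn.metrics import mutual_info_score

def conditional_mi(x, y, z):
    """I(X; Y | Z) for discrete arrays, averaged over the values of Z."""
    cmi = 0.0
    for value in np.unique(z):
        mask = (z == value)
        cmi += mask.mean() * mutual_info_score(x[mask], y[mask])
    return cmi

def greedy_select(X, y, k):
    """Greedily pick k features: high relevance I(f; y), low redundancy with
    already-selected features, plus a conditional term I(f; y | s)."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        scores = {}
        for f in remaining:
            relevance = mutual_info_score(X[:, f], y)
            penalty = 0.0
            for s in selected:
                redundancy = mutual_info_score(X[:, f], X[:, s])
                conditional = conditional_mi(X[:, f], y, X[:, s])
                penalty += redundancy - conditional
            scores[f] = relevance - (penalty / len(selected) if selected else 0.0)
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```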

2013, Vol 22 (04), pp. 1350027
Author(s):  
JAGANATHAN PALANICHAMY
KUPPUCHAMY RAMASAMY

Feature selection is essential in data mining and pattern recognition, especially for database classification. Over the past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevance and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevance of each feature to the class variable, and the redundancy is calculated by exploiting the relationship between candidate features, selected features, and the class variable. The effectiveness is tested on ten benchmark datasets from the UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.
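As a hedged sketch of the general maximum-relevance, minimum-redundancy greedy loop this abstract builds on, the code below uses scikit-learn's mutual information estimators. The paper's specific redundancy term, which also involves the class variable, is not reproduced; the function and variable names are illustrative.

```python
# Sketch of a classic mRMR-style greedy selection (not the paper's exact criterion).
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k, random_state=0):
    """Greedily maximise I(feature; class) minus the mean I(feature; selected)."""
    relevance = mutual_info_classif(X, y, random_state=random_state)
    selected = [int(np.argmax(relevance))]        # start from the most relevant feature
    remaining = [f for f in range(X.shape[1]) if f not in selected]
    while len(selected) < k and remaining:
        best_f, best_score = None, -np.inf
        for f in remaining:
            # average MI between the candidate and every already-selected feature
            redundancy = np.mean([
                mutual_info_regression(X[:, [f]], X[:, s], random_state=random_state)[0]
                for s in selected
            ])
            score = relevance[f] - redundancy
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected
```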


Author(s):  
Yuanyuan Han
Lan Huang
Fengfeng Zhou

Abstract
Motivation: A feature selection algorithm may select the subset of features with the best associations with the class labels. Recursive feature elimination (RFE) is a heuristic feature screening framework that has been widely used to select biological OMIC biomarkers. This study proposed a dynamic recursive feature elimination (dRFE) framework with more flexible feature elimination operations. The proposed dRFE was comprehensively compared with 11 existing feature selection algorithms and five classifiers on eight difficult transcriptome datasets from a previous study, ten newly collected transcriptome datasets and five methylome datasets.
Results: The experimental data suggested that the regular RFE framework did not perform well, and that dRFE outperformed the existing feature selection algorithms in most cases. The dRFE-detected features achieved Acc = 1.0000 on the two methylome datasets GSE53045 and GSE66695. The best prediction accuracies of the dRFE-detected features were 0.9259, 0.9424 and 0.8601 on the other three methylome datasets GSE74845, GSE103186 and GSE80970, respectively. Four transcriptome datasets received Acc = 1.0000 using the dRFE-detected features, and the prediction accuracies for the other six newly collected transcriptome datasets were between 0.6301 and 0.9917.
Availability and implementation: The experiments in this study are implemented and tested using Python version 3.7.6.
Supplementary information: Supplementary data are available at Bioinformatics online.
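For reference, the following is a minimal sketch of the regular RFE baseline that dRFE extends, using scikit-learn. The dynamic elimination rules that distinguish dRFE are not described in the abstract and are therefore not shown; the synthetic data stands in for a high-dimensional OMIC matrix.

```python
# Sketch of the standard RFE loop (the baseline, not dRFE itself).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# toy stand-in for a samples x features transcriptome/methylome matrix
X, y = make_classification(n_samples=100, n_features=500, n_informative=20, random_state=0)

# rank features by repeatedly fitting the estimator and dropping the weakest ones
rfe = RFE(estimator=LinearSVC(dual=False), n_features_to_select=20, step=10)
rfe.fit(X, y)
print("selected feature indices:", [i for i, keep in enumerate(rfe.support_) if keep])
```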


2021, Vol 336, pp. 08008
Author(s):  
Tao Xie

In order to improve the detection rate and speed of an intrusion detection system, this paper proposes a feature selection algorithm. The algorithm uses information gain to rank the features in descending order and then uses a multi-objective genetic algorithm to progressively search the ranked features for the optimal feature combination. We divided the Kddcup98 dataset into five classes, Normal, DOS, PROBE, R2L, and U2R, and conducted numerous experiments on each class. The experimental results show that, for each class of attack, the proposed algorithm not only speeds up feature selection but also significantly improves the detection rate.
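A minimal sketch of the first stage described above: rank features by an information-gain proxy and then evaluate candidate subsets taken from the top of the ranking. The multi-objective genetic search is replaced here by a simple prefix scan purely for illustration, and the classifier and evaluation choices are assumptions, not the paper's setup.

```python
# Sketch only: information-gain ranking followed by a naive subset search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

def rank_then_search(X, y, max_features=30):
    gain = mutual_info_classif(X, y, random_state=0)   # information-gain proxy
    order = np.argsort(gain)[::-1]                     # descending ranking
    best_subset, best_score = order[:1], -np.inf
    for n in range(1, min(max_features, X.shape[1]) + 1):
        subset = order[:n]
        score = cross_val_score(RandomForestClassifier(random_state=0),
                                X[:, subset], y, cv=3).mean()
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```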


2018
Author(s):  
Matheus B. De Moraes
André L. S. Gradvohl

Data streams are transmitted at high speed and in huge volume, and may contain critical information that needs to be processed in real time. Hence, to reduce computational cost and time, the system may apply a feature selection algorithm. However, this is not a trivial task due to concept drift. In this work, we show that two feature selection algorithms, Information Gain and Online Feature Selection, present lower performance when compared to classification tasks without feature selection. However, each algorithm presented more relevant results in one distinct scenario, with final accuracies up to 14% higher. The experiments on both real and artificial datasets indicate the potential of these methods, owing to their better adaptability in some concept drift situations.
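A minimal sketch of one way to keep an information-gain ranking up to date on a stream, recomputing scores over a sliding window so the selected subset can adapt after a concept drift. This is an illustrative setup under assumed window and subset sizes, not the Online Feature Selection algorithm evaluated in the paper.

```python
# Sketch only: windowed information-gain re-ranking for a data stream.
import numpy as np
from collections import deque
from sklearn.feature_selection import mutual_info_classif

class WindowedIGSelector:
    def __init__(self, window_size=500, n_select=10):
        self.window = deque(maxlen=window_size)   # most recent (features, label) pairs
        self.n_select = n_select
        self.selected = None

    def partial_fit(self, x, y):
        """Add one (features, label) pair; refresh the ranking once the window is full."""
        self.window.append((x, y))
        if len(self.window) == self.window.maxlen:
            X = np.array([xi for xi, _ in self.window])
            labels = np.array([yi for _, yi in self.window])
            gain = mutual_info_classif(X, labels, random_state=0)
            self.selected = np.argsort(gain)[::-1][:self.n_select]
        return self.selected
```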

