An Effective Feature Selection Method via Mutual Information Estimation

Author(s):  
Jian-Bo Yang ◽  
Chong-Jin Ong
Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and the support vector machine (SVM) is proposed to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated via mutual information. Because these correlations reflect the information-inclusion relationships between features, the features can be evaluated and redundant features eliminated by analyzing them. Subsequently, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated based on MIV. The importance weights of the features, described by their MIVs, are sorted in descending order. Finally, the SVM classifier performs feature selection according to the classification accuracy of feature combinations, using the MIV ordering of the features as a reference. Simulation experiments on three standard UCI data sets show that the method not only effectively reduces the feature dimension and achieves high classification accuracy, but also ensures good robustness.
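The redundancy-elimination stage described above can be sketched in a few lines: estimate the mutual information between each pair of discrete features, then greedily drop any feature whose mutual information with an already-kept feature is too high. This is a minimal illustration, not the authors' implementation; the threshold value and the greedy keep-first strategy are assumptions, and the MIV/SVM ranking stage is omitted.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def remove_redundant(features, threshold=0.5):
    """Greedily keep features, dropping any whose MI with a kept feature
    exceeds the threshold (i.e. it is largely contained in a kept feature).
    `features` maps feature name -> list of discrete values; threshold is
    an assumed tuning parameter, not a value from the paper."""
    kept = []
    for name, column in features.items():
        if all(mutual_information(column, features[k]) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For example, with `{'x1': [0,0,1,1], 'x2': [0,0,1,1], 'x3': [0,1,0,1]}` the duplicate `x2` shares 1 bit of mutual information with `x1` and is dropped, while the independent `x3` is kept.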


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 116875-116885 ◽  
Author(s):  
G. S. Thejas ◽  
Sajal Raj Joshi ◽  
S. S. Iyengar ◽  
N. R. Sunitha ◽  
Prajwal Badrinath

2015 ◽  
Vol 1 ◽  
pp. e24 ◽  
Author(s):  
Zhihua Li ◽  
Wenqu Gu

Nominal data has no order correlation or similarity metric, and nominal datasets typically contain considerable redundancy, which makes an efficient mutual-information-based feature selection method for nominal data relatively difficult to design. In this paper, a nominal-data feature selection method based on mutual information that requires no data transformation, called the redundancy-removing more relevance less redundancy algorithm, is proposed. By introducing several new information-related definitions and the corresponding computational methods, the proposed method can compute the information-related quantities of nominal data directly. Furthermore, by defining a new evaluation function that considers both relevance and redundancy globally, the new feature selection method can evaluate the importance of each nominal-data feature. Although the presented method takes the commonly used MIFS-like form, it can handle high-dimensional datasets without expensive computations. We perform extensive experimental comparisons of the proposed algorithm and other methods on three benchmark nominal datasets with two different classifiers. The experimental results demonstrate the average advantage of the presented algorithm over the well-known NMIFS algorithm in terms of feature selection and classification accuracy, indicating that the proposed method has promising performance.
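The MIFS-like form mentioned above scores each candidate feature as relevance minus weighted redundancy, J(f) = I(f; C) − β · Σ_{s ∈ S} I(f; s), and greedily adds the best-scoring feature. The sketch below illustrates that generic greedy scheme for discrete (nominal) features; it is not the authors' redundancy-removing algorithm or NMIFS, and the β parameter and tie-breaking by candidate order are assumptions.

```python
from collections import Counter
from math import log2

def mi(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mifs_select(features, labels, k, beta=0.5):
    """Greedy MIFS-style selection: repeatedly add the feature maximizing
    relevance to the labels minus beta times its summed MI with the
    features already selected. `features` maps name -> list of values."""
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        def score(f):
            relevance = mi(features[f], labels)
            redundancy = sum(mi(features[f], features[s]) for s in selected)
            return relevance - beta * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a duplicated feature, the effect of β is visible directly: at β = 0 the duplicate of the first-selected feature is picked second (pure relevance), while a large β penalizes the duplicate and an independent feature is picked instead.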


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 151525-151538 ◽  
Author(s):  
Xinzheng Wang ◽  
Bing Guo ◽  
Yan Shen ◽  
Chimin Zhou ◽  
Xuliang Duan
