Efficient Feature Subset Selection Algorithm for High Dimensional Data

Author(s):  
Smita Chormunge ◽  
Sudarson Jena

Feature selection addresses the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm based on information gain measures for high-dimensional data, termed IFSA (Information gain based Feature Selection Algorithm), to produce an optimal feature subset in efficient time and improve the computational performance of learning algorithms. The IFSA algorithm works in two phases: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments are carried out to compare the proposed algorithm with other methods using two different classifiers (Naive Bayes and IBk) on microarray and text datasets. The results demonstrate that IFSA not only produces a compact feature subset in efficient time but also improves classifier performance.
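A minimal sketch of the general filter idea behind this kind of method: score every feature by its information gain (mutual information) with the class label and keep the highest-scoring subset. The dataset, threshold, and use of scikit-learn are illustrative assumptions, not the authors' IFSA implementation.

```python
# Rank features by information gain with the class label and keep the
# top-scoring subset; threshold and data are placeholders for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

# Information gain (mutual information) of each feature with the class label.
scores = mutual_info_classif(X, y, random_state=0)

# Keep features whose score exceeds a chosen threshold (here: the mean score).
selected = np.where(scores > scores.mean())[0]
X_reduced = X[:, selected]
print(f"kept {len(selected)} of {X.shape[1]} features")
```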


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and reasonable signal feature extraction and selection algorithms are the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract single features, namely power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy, and Renyi entropy. Then, a distance-measure feature selection algorithm and Sequential Feature Selection (SFS) are presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four different information entropies can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
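As an illustration of one of the entropy features named above, the sketch below computes the power spectrum entropy of a signal as the Shannon entropy of its normalized power spectrum. The toy signal and parameter choices are assumptions for demonstration only.

```python
# Power spectrum entropy: Shannon entropy of the normalized power spectrum.
import numpy as np

def power_spectrum_entropy(signal):
    spectrum = np.abs(np.fft.fft(signal)) ** 2      # power spectrum
    p = spectrum / spectrum.sum()                   # normalize to a distribution
    p = p[p > 0]                                    # avoid log(0)
    return -np.sum(p * np.log2(p))                  # Shannon entropy in bits

t = np.linspace(0, 1, 1024, endpoint=False)
toy_signal = np.cos(2 * np.pi * 50 * t + np.pi * (t > 0.5))  # toy phase-modulated signal
print(power_spectrum_entropy(toy_signal))
```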


2013 ◽  
Vol 774-776 ◽  
pp. 1532-1537
Author(s):  
Jing Wei Yang ◽  
Si Le Wang ◽  
Ying Yi Chen ◽  
Su Kui Lu ◽  
Wen Zhu Yang

This paper presents a genetic-based feature selection algorithm for object recognition. First, the proposed algorithm encodes a solution as a binary chromosome. Second, the initial population is generated randomly. Third, a crossover operator and a mutation operator are applied to these chromosomes to generate more competent chromosomes. The crossover and mutation probabilities are adjusted dynamically according to the generation number and the fitness value. The proposed algorithm is tested using features extracted from cotton foreign fiber objects. The results indicate that the proposed algorithm can obtain the optimal feature subset and can reduce the classification time while keeping the classification accuracy constant.
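A compact sketch of such a genetic feature-selection loop with binary chromosomes is shown below. The fitness function, the adaptive rate schedule, the truncation selection, and the dataset are illustrative assumptions, not the authors' exact operators or parameters.

```python
# Genetic feature selection: each chromosome is a 0/1 mask over the features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=150, n_features=40, n_informative=8, random_state=0)
n_features, pop_size, n_generations = X.shape[1], 20, 15

def fitness(chrom):
    if chrom.sum() == 0:
        return 0.0
    # Cross-validated accuracy on the selected columns serves as fitness.
    return cross_val_score(KNeighborsClassifier(), X[:, chrom == 1], y, cv=3).mean()

pop = rng.integers(0, 2, size=(pop_size, n_features))        # random initial population
for gen in range(n_generations):
    scores = np.array([fitness(c) for c in pop])
    # Adaptive rates: more exploration early, less late (illustrative schedule).
    p_cross = 0.9 - 0.4 * gen / n_generations
    p_mut = 0.10 - 0.08 * gen / n_generations
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]                     # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        child = a.copy()
        if rng.random() < p_cross:                            # one-point crossover
            cut = rng.integers(1, n_features)
            child[cut:] = b[cut:]
        flip = rng.random(n_features) < p_mut                 # bit-flip mutation
        child[flip] ^= 1
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(c) for c in pop])]
print("selected features:", np.flatnonzero(best))
```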


2013 ◽  
Vol 347-350 ◽  
pp. 2344-2348
Author(s):  
Lin Cheng Jiang ◽  
Wen Tang Tan ◽  
Zhen Wen Wang ◽  
Feng Jing Yin ◽  
Bin Ge ◽  
...  

Feature selection has become a focus of research in applications with high-dimensional data. Nonnegative matrix factorization (NMF) is a good method for dimensionality reduction, but it cannot select the optimal feature subset because it is a feature extraction method. In this paper, a two-step strategy based on improved NMF is proposed. The first step obtains the basis of each category in the dataset by NMF; added constraints guarantee that these bases are sparse and largely distinct from each other, which contributes to classification. An auxiliary function is used to prove that the algorithm converges. In the second step, the classic ReliefF algorithm weights each feature using all the basis vectors and chooses the optimal feature subset. The experimental results reveal that the proposed method selects a representative and relevant feature subset that is effective in improving the performance of the classifier.
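A much-simplified sketch of the two-step idea: (1) learn a sparse NMF basis per class, (2) weight each original feature using the stacked basis vectors and keep the top-weighted subset. It uses scikit-learn's NMF with an L1 penalty as a stand-in for the paper's constrained factorization, and a simple basis-derived weight in place of ReliefF; all parameters and data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import NMF

X, y = make_classification(n_samples=120, n_features=60, n_informative=10, random_state=0)
X = X - X.min()                      # NMF requires nonnegative input

bases = []
for label in np.unique(y):
    # L1-penalized NMF on the samples of one class (stand-in for the
    # paper's sparsity/distinctness constraints).
    model = NMF(n_components=3, init="nndsvda", l1_ratio=1.0, alpha_W=0.1,
                max_iter=500, random_state=0)
    model.fit(X[y == label])
    bases.append(model.components_)  # basis vectors for this class

H = np.vstack(bases)                 # all basis vectors, shape (classes * components, features)

# Stand-in for ReliefF: weight a feature by how unevenly it loads across the
# class-specific bases (discriminative features load unevenly).
weights = H.max(axis=0) - H.min(axis=0)
selected = np.argsort(weights)[::-1][:15]
print("selected feature indices:", np.sort(selected))
```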


2013 ◽  
Vol 347-350 ◽  
pp. 2712-2716
Author(s):  
Lin Tao Lü ◽  
Peng Li ◽  
Yu Xiang Yang ◽  
Fang Tan

Based on the characteristics of palm bio-impedance spectroscopy (BIS) data, this paper proposes an effective feature model for palm BIS data: an elliptical model. The model combines an immune clone algorithm with the least squares method to establish a palm BIS feature selection algorithm, uses this algorithm to obtain the optimal feature subset that can fully represent the palm BIS data, and then applies several classification algorithms for comparison. The experimental results show that the accuracy of the feature subset obtained by the algorithm reaches 93.2% in an SVM classification test, verifying that the algorithm is a valid and reliable palm BIS feature selection algorithm.
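A minimal sketch of one ingredient mentioned above: fitting an ellipse-like conic to impedance-spectroscopy points (real vs. imaginary part) by linear least squares, so the fitted coefficients could serve as compact features for a downstream classifier. The synthetic data and the fitting formulation are assumptions for illustration, not the authors' immune-clone pipeline.

```python
import numpy as np

def fit_ellipse(x, y):
    # Solve a*x^2 + b*x*y + c*y^2 + d*x + e*y = 1 in the least-squares sense.
    A = np.column_stack([x * x, x * y, y * y, x, y])
    coeffs, *_ = np.linalg.lstsq(A, np.ones_like(x), rcond=None)
    return coeffs  # (a, b, c, d, e): candidate feature vector for a classifier

# Synthetic arc resembling an impedance-plane (Cole-type) plot.
theta = np.linspace(0.1, np.pi - 0.1, 50)
x = 100 + 80 * np.cos(theta) + np.random.default_rng(0).normal(0, 1, 50)
y = 40 * np.sin(theta)
print(fit_ellipse(x, y))
```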

