scholarly journals An Improved Feature Selection Based on Effective Range for Classification

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Jianzhong Wang ◽  
Shuang Zhou ◽  
Yugen Yi ◽  
Jun Kong

Feature selection is a key issue in the domain of machine learning and related fields. The results of feature selection can directly affect the classifier’s classification accuracy and generalization performance. Recently, a statistical feature selection method named effective range based gene selection (ERGS) is proposed. However, ERGS only considers the overlapping area (OA) among effective ranges of each class for every feature; it fails to handle the problem of the inclusion relation of effective ranges. In order to overcome this limitation, a novel efficient statistical feature selection approach called improved feature selection based on effective range (IFSER) is proposed in this paper. In IFSER, an including area (IA) is introduced to characterize the inclusion relation of effective ranges. Moreover, the samples’ proportion for each feature of every class in both OA and IA is also taken into consideration. Therefore, IFSER outperforms the original ERGS and some other state-of-the-art algorithms. Experiments on several well-known databases are performed to demonstrate the effectiveness of the proposed method.

2010 ◽  
Vol 9 ◽  
pp. CIN.S3794 ◽  
Author(s):  
Xiaosheng Wang ◽  
Osamu Gotoh

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is extremely crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods: the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty, and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can exhibit the inherent classification difficulty with respect to different gene expression datasets, indicating the inherent biology of specific cancers.


2014 ◽  
Vol 1030-1032 ◽  
pp. 1709-1712
Author(s):  
Kai Min Song ◽  
Xun Yi Ren

Through the research on the flow identification algorithm based on statistical feature, this paper puts forward the statistical feature selection algorithm in order to reduce the number of features in identification, increase the speed of the flow identification, the experimental results show that the algorithm can effectively reduce the amount of features, improve the efficiency of identification.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Shuangbao Song ◽  
Xingqian Chen ◽  
Zheng Tang ◽  
Yuki Todo

Microarray gene expression data provide a prospective way to diagnose disease and classify cancer. However, in bioinformatics, the gene selection problem, i.e., how to select the most informative genes from thousands of genes, remains challenging. This problem is a specific feature selection problem with high-dimensional features and small sample sizes. In this paper, a two-stage method combining a filter feature selection method and a wrapper feature selection method is proposed to solve the gene selection problem. In contrast to common methods, the proposed method models the gene selection problem as a multiobjective optimization problem. Both stages employ the same multiobjective differential evolution (MODE) as the search strategy but incorporate different objective functions. The three objective functions of the filter method are mainly based on mutual information. The two objective functions of the wrapper method are the number of selected features and the classification error of a naive Bayes (NB) classifier. Finally, the performance of the proposed method is tested and analyzed on six benchmark gene expression datasets. The experimental results verified that this paper provides a novel and effective way to solve the gene selection problem by applying a multiobjective optimization algorithm.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mariam Elhussein ◽  
Samiha Brahimi

PurposeThis paper aims to propose a novel way of using textual clustering as a feature selection method. It is applied to identify the most important keywords in the profile classification. The method is demonstrated through the problem of sick-leave promoters on Twitter.Design/methodology/approachFour machine learning classifiers were used on a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared when the proposed clustering feature selection approach and the standard feature selection were applied.FindingsRadom forest achieved the highest accuracy of 95.91% higher than similar work compared. Furthermore, using clustering as a feature selection method improved the Sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoters’ accounts that have spam-like behavior.Research limitations/implicationsThe method applied is novel, more testing is needed in other datasets before generalizing its results.Practical implicationsThe model applied can be used by Saudi authorities to report on the accounts that sell sick-leaves online.Originality/valueThe research is proposing a new way textual clustering can be used in feature selection.


Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 291
Author(s):  
Ruzhang Zhao ◽  
Pengyu Hong ◽  
Jun S. Liu

Traditional hypothesis-margin researches focus on obtaining large margins and feature selection. In this work, we show that the robustness of margins is also critical and can be measured using entropy. In addition, our approach provides clear mathematical formulations and explanations to uncover feature interactions, which is often lack in large hypothesis-margin based approaches. We design an algorithm, termed IMMIGRATE (Iterative max-min entropy margin-maximization with interaction terms), for training the weights associated with the interaction terms. IMMIGRATE simultaneously utilizes both local and global information and can be used as a base learner in Boosting. We evaluate IMMIGRATE in a wide range of tasks, in which it demonstrates exceptional robustness and achieves the state-of-the-art results with high interpretability.


Sign in / Sign up

Export Citation Format

Share Document