AFS: An Attention-Based Mechanism for Supervised Feature Selection

Author(s):  
Ning Gui ◽  
Danni Ge ◽  
Ziyin Hu

As a data preprocessing step, feature selection has proven effective in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimensional, high-volume big data, however, poses major challenges to existing feature-selection techniques, e.g. computational complexity and stability on noisy data. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature-weight generation and a learning module for problem modeling. The attention module formulates the correlation between features and the supervision target as a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated from the distribution of each feature's selection patterns, adjusted by backpropagation during training. The detachable structure allows existing off-the-shelf models to be reused directly, which greatly reduces training time, training-data demands, and required expertise. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison with several state-of-the-art feature-selection algorithms on MNIST, noisy MNIST, and several small-sample datasets.
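To make the two-module idea concrete, here is a minimal PyTorch sketch of an AFS-style model. The layer sizes, the single shared attention net (the paper uses a small net per feature), and the sigmoid gating are simplifying assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class AttentionFeatureSelector(nn.Module):
    """Minimal sketch of an AFS-style model: a detachable attention
    module that scores each feature, plus a separate learning module."""

    def __init__(self, n_features, n_hidden=32, n_classes=2):
        super().__init__()
        # Attention module: one shallow net shared across all features here
        # for brevity; the paper describes a small net per feature.
        self.attention = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.Tanh(),
            nn.Linear(n_hidden, n_features),
            nn.Sigmoid(),   # per-feature selection probability in (0, 1)
        )
        # Learning module: any off-the-shelf model could be plugged in here.
        self.learner = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        weights = self.attention(x)   # feature weights, shape (batch, n_features)
        return self.learner(x * weights), weights
```

After training, averaging `weights` over the dataset gives a global feature ranking, and the top-k features can then be fed to any reused downstream model, matching the detachable design described above.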

2010 ◽  
Vol 9 ◽  
pp. CIN.S4020 ◽  
Author(s):  
Chen Zhao ◽  
Michael L. Bittner ◽  
Robert S. Chapkin ◽  
Edward R. Dougherty

When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is to limit the number of features being considered, restrict feature sets to sizes such that all of them can be examined by exhaustive search, and report a list of the best-performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased, so that there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of the best feature set; the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small—that is, the prior biological knowledge is not too poor—then one should expect, with high probability, to find good feature sets. Availability: companion website at http://gsp.tamu.edu/Publications/supplementary/zhao09a/
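The first measure can be illustrated with a small Monte Carlo sketch. The uniform true-error distribution and Gaussian estimation noise below are illustrative assumptions standing in for the paper's model, not its actual analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def power_curve(n_sets=2000, max_list=50, noise=0.03, tol=0.02, trials=500):
    """Monte Carlo sketch: draw true errors for candidate feature sets,
    perturb them into small-sample *estimates*, and check whether the
    top-L list (ranked by estimated error) contains a near-optimal set."""
    probs = np.zeros(max_list)
    for _ in range(trials):
        true_err = rng.uniform(0.05, 0.5, n_sets)          # true errors
        est_err = true_err + rng.normal(0, noise, n_sets)  # noisy estimates
        order = np.argsort(est_err)                        # rank by estimate
        good = true_err <= true_err.min() + tol            # within tolerance
        hit = np.cumsum(good[order][:max_list]) > 0        # lists of length 1..L
        probs += hit
    return probs / trials

for L, p in zip((1, 5, 10, 25, 50), power_curve()[[0, 4, 9, 24, 49]]):
    print(f"list length {L:2d}: P(near-optimal set on list) ~ {p:.2f}")
```

Plotting the returned probabilities against list length reproduces the qualitative shape of a power curve: longer lists raise the chance of capturing a near-optimal feature set despite optimistically biased estimates.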


Author(s):  
Zheng Chen ◽  
Meng Pang ◽  
Zixin Zhao ◽  
Shuainan Li ◽  
Rui Miao ◽  
...  

Motivation: Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have demonstrated very good prediction performance without feature selection. This study proposed the hypothesis that DNN models may be further improved by feature-selection algorithms. Results: A comprehensive comparative study was carried out by evaluating 11 feature-selection algorithms on three conventional DNN algorithms, i.e. the convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN), and three recent DNNs, i.e. MobileNetV2, ShuffleNetV2, and SqueezeNet. Five binary classification methylomic datasets were chosen to calculate the prediction performance of CNN/DBN/RNN models using features selected by the 11 feature-selection algorithms. Seventeen binary-classification transcriptome datasets and two multi-class transcriptome datasets were also utilized to evaluate how the hypothesis generalizes to different data types. The experimental data supported our hypothesis that feature-selection algorithms may improve DNN models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies on the five methylomic datasets. Availability and implementation: All the algorithms were implemented and tested under the programming environment Python version 3.6.6. Supplementary information: Supplementary data are available at Bioinformatics online.
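SVM-RFE, the selector that performed best here, is available off the shelf in scikit-learn. A brief sketch on synthetic data standing in for a methylomic matrix (the dimensions and the choice of 50 retained features are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Toy stand-in for a methylomic matrix: 200 samples x 500 features.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

# SVM-RFE: recursively drop the features with the smallest linear-SVM
# weights (10% per iteration) until the requested number remains.
selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)  # (200, 50); this matrix would then feed the DNN
```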


2021 ◽  
Vol 26 (1) ◽  
pp. 17
Author(s):  
Thomas Daniel ◽  
Fabien Casenave ◽  
Nissrine Akkari ◽  
David Ryckelynck

Classification algorithms have recently found applications in computational physics for the selection of numerical methods or models adapted to the environment and the state of the physical system. For such classification tasks, labeled training data come from numerical simulations and generally correspond to physical fields discretized on a mesh. Three difficulties arise: the lack of training data, its high dimensionality, and the non-applicability of common data augmentation techniques to physics data. This article introduces two algorithms to address these issues: one for dimensionality reduction via feature selection, and one for data augmentation. These algorithms are combined with a wide variety of classifiers for their evaluation. When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.
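The stacking ensemble itself can be assembled directly in scikit-learn. In this sketch the hidden-layer width, iteration budget, and regularization strength are illustrative assumptions, not the hyperparameters used in the article:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Six multilayer perceptrons with different seeds, stacked under a ridge
# (L2-penalized) logistic regression, mirroring the ensemble described above.
mlps = [(f"mlp{i}", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                  random_state=i))
        for i in range(6)]
stack = StackingClassifier(
    estimators=mlps,
    final_estimator=LogisticRegression(penalty="l2", C=1.0),
    cv=5)   # out-of-fold MLP predictions train the meta-learner

# Usage: stack.fit(X_train, y_train); stack.score(X_test, y_test)
```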


2021 ◽  
Vol 37 (1) ◽  
pp. 43-56
Author(s):  
Nguyen The Cuong ◽  
Huynh The Phung

In binary classification problems, the two classes of data differ from each other, and the problem becomes more complicated because the clusters within each class also tend to differ. Traditional algorithms such as the Support Vector Machine (SVM) or Twin Support Vector Machine (TWSVM) cannot sufficiently exploit structural information at cluster granularity, which limits their ability to model data trends. The Structural Twin Support Vector Machine (S-TWSVM) exploits cluster-level structural information to learn a representative hyperplane, so its capability for modeling the data is better than that of TWSVM. However, for datasets where each class consists of clusters with different trends, S-TWSVM's modeling capability remains restricted. Besides, the training time of S-TWSVM is not improved compared to TWSVM. This paper proposes a new Weighted Structural - Support Vector Machine (called WS-SVM) for binary classification problems with a class-vs-clusters strategy. Experimental results show that WS-SVM can describe the tendency of the distribution of cluster information. Furthermore, both theory and experiment show that the training time of WS-SVM for the classification problem is significantly improved compared to S-TWSVM.
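One hypothetical reading of a class-vs-clusters strategy is sketched below: cluster one class and pit the other class against each cluster separately. This is only an illustrative decomposition; the actual WS-SVM solves a single weighted optimization problem rather than training independent sub-models:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def class_vs_clusters_svms(X, y, n_clusters=3):
    """Hypothetical sketch of a class-vs-clusters strategy: cluster the
    negative class and train one SVM per (positive class vs cluster) pair.
    The actual WS-SVM instead optimizes one weighted structural problem."""
    X_pos, X_neg = X[y == 1], X[y == 0]
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X_neg)
    models = []
    for k in range(n_clusters):
        X_k = np.vstack([X_pos, X_neg[labels == k]])
        y_k = np.hstack([np.ones(len(X_pos)), np.zeros(np.sum(labels == k))])
        models.append(SVC(kernel="rbf").fit(X_k, y_k))
    return models  # predict by, e.g., a minimum decision over the sub-models
```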


Author(s):  
Donald Douglas Atsa'am

A filter feature selection algorithm is developed and its performance tested. In the initial step, the algorithm dichotomizes the dataset and then separately computes the association between each predictor and the class variable using relative odds (odds ratios). The value of the odds ratio becomes the importance ranking of the corresponding explanatory variable in determining the output. Logistic regression classification is deployed to test the performance of the new algorithm in comparison with three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. A number of experimental datasets are employed, and in most cases the subsets selected by the new algorithm produced models with higher classification accuracy than the subsets suggested by the existing feature selection algorithms. Therefore, the proposed algorithm is a reliable alternative in filter feature selection for binary classification problems.
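A minimal sketch of this filter follows. The median split used to dichotomize each predictor and the additive smoothing constant are assumptions for illustration; the abstract does not specify either:

```python
import numpy as np

def odds_ratio_ranking(X, y, smoothing=0.5):
    """Sketch of the filter described above: dichotomize each predictor
    (here at its median), build a 2x2 table against the binary class, and
    use the odds ratio as the importance score. Smoothing avoids zeros."""
    scores = []
    for j in range(X.shape[1]):
        f = X[:, j] > np.median(X[:, j])          # dichotomized predictor
        a = np.sum(f & (y == 1)) + smoothing      # high feature, class 1
        b = np.sum(f & (y == 0)) + smoothing      # high feature, class 0
        c = np.sum(~f & (y == 1)) + smoothing     # low feature, class 1
        d = np.sum(~f & (y == 0)) + smoothing     # low feature, class 0
        scores.append((a * d) / (b * c))          # relative odds
    return np.argsort(scores)[::-1]               # most important first
```

The returned ordering can then be cut at any desired subset size before fitting the logistic regression classifier used for evaluation.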


Author(s):  
Minchao Ye ◽  
Yongqiu Xu ◽  
Chenxi Ji ◽  
Hong Chen ◽  
Huijuan Lu ◽  
...  

Hyperspectral images (HSIs) have hundreds of narrow, adjacent spectral bands, which results in feature redundancy and decreases classification accuracy. Feature (band) selection helps to remove noisy or redundant features. Most traditional feature selection algorithms can only be performed on a single HSI scene. However, the emergence of massive numbers of HSI scenes has created a need for joint feature selection across different scenes. Cross-scene feature selection is not a simple problem, since spectral shift exists between different HSI scenes, even when the scenes are captured by the same sensor. The spectral shift makes traditional single-dataset-based feature selection algorithms no longer applicable. To solve this problem, we extend the traditional ReliefF to a cross-domain version, namely cross-domain ReliefF (CDRF). The proposed method makes full use of both source and target domains and increases the similarity of samples belonging to the same class in both domains. In the cross-scene classification problem, it is necessary to consider both the class-separability of spectral features and the consistency of features between different scenes. CDRF accounts for these two factors through a cross-domain updating rule for the feature weights. Experimental results on two cross-scene HSI datasets show the superiority of the proposed CDRF in cross-scene feature selection problems.
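For context, here is the single-domain ReliefF core that CDRF extends. The cross-domain updating rule itself is the paper's contribution and is deliberately omitted; only the standard hit/miss weight update is sketched, with an L1 neighbor distance as an implementation assumption:

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=5, n_iter=200, seed=0):
    """Plain ReliefF core (single domain). CDRF extends this update so the
    weights also reward features that stay consistent between the source
    and target scenes; that cross-domain rule is not reproduced here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distance to sample i
        dist[i] = np.inf                      # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hits = np.argsort(np.where(same, dist, np.inf))[:n_neighbors]
        misses = np.argsort(np.where(diff, dist, np.inf))[:n_neighbors]
        w += np.abs(X[i] - X[misses]).mean(axis=0)  # reward class separation
        w -= np.abs(X[i] - X[hits]).mean(axis=0)    # penalize in-class spread
    return w / n_iter   # higher weight = more discriminative band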


Author(s):  
Meng Liu ◽  
Chang Xu ◽  
Chao Xu ◽  
Dacheng Tao

The support vector machine (SVM) is one of the most frequently used classifiers for machine learning tasks. However, its training time can become prohibitive when the training set is very large. Thus, many kinds of representative subsets are chosen from the original dataset to reduce the training complexity. In this paper, we propose to choose representative points, termed anchors, obtained from non-negative matrix factorization (NMF) in a divide-and-conquer framework, and then use the anchors to train an approximate SVM. Our theoretical analysis shows that solving the DCA-SVM yields an approximate solution close to that of the primal SVM. Experimental results on multiple datasets demonstrate that our DCA-SVM is faster than state-of-the-art algorithms without notably decreasing the accuracy of the classification results.
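A rough sketch of the anchor idea follows. Extracting per-class NMF components as anchors and training on them alone is a simplification I am assuming for illustration; the paper's DCA-SVM embeds this in a divide-and-conquer framework with an approximation guarantee, both omitted here:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

def anchor_svm(X, y, n_anchors=100):
    """Sketch: extract NMF anchors per class (X must be non-negative)
    and train the SVM on the anchors only, shrinking the training set."""
    anchors, labels = [], []
    for c in np.unique(y):
        H = NMF(n_components=n_anchors // 2, init="nndsvda",
                max_iter=400, random_state=0).fit(X[y == c]).components_
        anchors.append(H)                    # components act as anchor points
        labels.append(np.full(len(H), c))
    return SVC(kernel="rbf").fit(np.vstack(anchors), np.hstack(labels))
```

Because the SVM now sees only a few hundred anchors instead of the full dataset, training cost drops sharply, which is the source of the speedup the abstract reports.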


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S30795 ◽  
Author(s):  
S. Sakira Hassan ◽  
Pekka Ruusuvuori ◽  
Leena Latonen ◽  
Heikki Huttunen

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simple machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as much as 50% of features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.
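The cross-validation side of that comparison maps directly onto scikit-learn. A minimal sketch, assuming a logarithmic grid of regularization strengths (the grid and fold count are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Model selection along the l1 regularization path via cross-validation;
# the Bayesian error estimator compared in the paper would replace the
# CV scoring step with a closed-form error estimate.
model = LogisticRegressionCV(Cs=np.logspace(-2, 2, 20), penalty="l1",
                             solver="liblinear", cv=5)
# model.fit(X, y)
# Nonzero coefficients identify the features the l1 penalty retained:
# selected = np.flatnonzero(model.coef_)
```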

