AFS: An Attention-Based Mechanism for Supervised Feature Selection

Author(s):  
Ning Gui ◽  
Danni Ge ◽  
Ziyin Hu

As a data preprocessing step, feature selection has proven effective in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimensional, high-volume big data, however, poses major challenges to existing feature-selection techniques, e.g. computational complexity and stability on noisy data. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature-weight generation and a learning module for problem modeling. The attention module formulates the correlation between features and the supervision target as a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated from the distribution of each feature's selection patterns, adjusted by backpropagation during training. The detachable structure allows existing off-the-shelf models to be reused directly, which greatly reduces training time, training-data demands, and required expertise. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison with several state-of-the-art feature-selection algorithms on MNIST, noisy MNIST, and several small-sample datasets.
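To make the two-module idea concrete, here is a minimal PyTorch sketch of an AFS-style model. The layer sizes, the single shared attention net (the paper uses a small net per feature), and the sigmoid gating are simplifying assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class AttentionFeatureSelector(nn.Module):
    """Minimal sketch of an AFS-style model: a detachable attention
    module that scores each feature, plus a separate learning module."""

    def __init__(self, n_features, n_hidden=32, n_classes=2):
        super().__init__()
        # Attention module: one shallow net shared across all features here
        # for brevity; the paper describes a small net per feature.
        self.attention = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.Tanh(),
            nn.Linear(n_hidden, n_features),
            nn.Sigmoid(),   # per-feature selection probability in (0, 1)
        )
        # Learning module: any off-the-shelf model could be plugged in here.
        self.learner = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),
        )

    def forward(self, x):
        weights = self.attention(x)   # feature weights, shape (batch, n_features)
        return self.learner(x * weights), weights
```

After training, averaging `weights` over the dataset gives a global feature ranking, and the top-k features can then be fed to any reused downstream model, matching the detachable design described above.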

2010 ◽  
Vol 9 ◽  
pp. CIN.S4020 ◽  
Author(s):  
Chen Zhao ◽  
Michael L. Bittner ◽  
Robert S. Chapkin ◽  
Edward R. Dougherty

When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is to limit the number of features being considered, restrict feature sets to sizes such that all of them can be examined by exhaustive search, and report a list of the best-performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased, so that there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of the best feature set; the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small—that is, the prior biological knowledge is not too poor—then one should expect, with high probability, to find good feature sets. Availability: companion website at http://gsp.tamu.edu/Publications/supplementary/zhao09a/
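The first measure can be illustrated with a small Monte Carlo sketch. The uniform true-error distribution and Gaussian estimation noise below are illustrative assumptions standing in for the paper's model, not its actual analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def power_curve(n_sets=2000, max_list=50, noise=0.03, tol=0.02, trials=500):
    """Monte Carlo sketch: draw true errors for candidate feature sets,
    perturb them into small-sample *estimates*, and check whether the
    top-L list (ranked by estimated error) contains a near-optimal set."""
    probs = np.zeros(max_list)
    for _ in range(trials):
        true_err = rng.uniform(0.05, 0.5, n_sets)          # true errors
        est_err = true_err + rng.normal(0, noise, n_sets)  # noisy estimates
        order = np.argsort(est_err)                        # rank by estimate
        good = true_err <= true_err.min() + tol            # within tolerance
        hit = np.cumsum(good[order][:max_list]) > 0        # lists of length 1..L
        probs += hit
    return probs / trials

for L, p in zip((1, 5, 10, 25, 50), power_curve()[[0, 4, 9, 24, 49]]):
    print(f"list length {L:2d}: P(near-optimal set on list) ~ {p:.2f}")
```

Plotting the returned probabilities against list length reproduces the qualitative shape of a power curve: longer lists raise the chance of capturing a near-optimal feature set despite optimistically biased estimates.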


Author(s):  
Zheng Chen ◽  
Meng Pang ◽  
Zixin Zhao ◽  
Shuainan Li ◽  
Rui Miao ◽  
...  

Motivation: Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have demonstrated very good prediction performance without feature selection. This study proposed the hypothesis that DNN models may be further improved by feature-selection algorithms. Results: A comprehensive comparative study was carried out by evaluating 11 feature-selection algorithms on three conventional DNN algorithms, i.e. the convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN), and three recent DNNs, i.e. MobileNetV2, ShuffleNetV2, and SqueezeNet. Five binary classification methylomic datasets were chosen to calculate the prediction performance of CNN/DBN/RNN models using features selected by the 11 feature-selection algorithms. Seventeen binary-classification transcriptome datasets and two multi-class transcriptome datasets were also utilized to evaluate how the hypothesis generalizes to different data types. The experimental data supported our hypothesis that feature-selection algorithms may improve DNN models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies on the five methylomic datasets. Availability and implementation: All the algorithms were implemented and tested under the programming environment Python version 3.6.6. Supplementary information: Supplementary data are available at Bioinformatics online.
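SVM-RFE, the selector that performed best here, is available off the shelf in scikit-learn. A brief sketch on synthetic data standing in for a methylomic matrix (the dimensions and the choice of 50 retained features are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Toy stand-in for a methylomic matrix: 200 samples x 500 features.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

# SVM-RFE: recursively drop the features with the smallest linear-SVM
# weights (10% per iteration) until the requested number remains.
selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)  # (200, 50); this matrix would then feed the DNN
```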


2021 ◽  
Vol 26 (1) ◽  
pp. 17
Author(s):  
Thomas Daniel ◽  
Fabien Casenave ◽  
Nissrine Akkari ◽  
David Ryckelynck

Classification algorithms have recently found applications in computational physics for the selection of numerical methods or models adapted to the environment and the state of the physical system. For such classification tasks, labeled training data come from numerical simulations and generally correspond to physical fields discretized on a mesh. Three difficulties arise: the lack of training data, its high dimensionality, and the non-applicability of common data augmentation techniques to physics data. This article introduces two algorithms to address these issues: one for dimensionality reduction via feature selection, and one for data augmentation. These algorithms are combined with a wide variety of classifiers for their evaluation. When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.
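The stacking ensemble itself can be assembled directly in scikit-learn. In this sketch the hidden-layer width, iteration budget, and regularization strength are illustrative assumptions, not the hyperparameters used in the article:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Six multilayer perceptrons with different seeds, stacked under a ridge
# (L2-penalized) logistic regression, mirroring the ensemble described above.
mlps = [(f"mlp{i}", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                  random_state=i))
        for i in range(6)]
stack = StackingClassifier(
    estimators=mlps,
    final_estimator=LogisticRegression(penalty="l2", C=1.0),
    cv=5)   # out-of-fold MLP predictions train the meta-learner

# Usage: stack.fit(X_train, y_train); stack.score(X_test, y_test)
```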


2021 ◽  
Vol 37 (1) ◽  
pp. 43-56
Author(s):  
Nguyen The Cuong ◽  
Huynh The Phung

In binary classification problems, the two classes of data differ from each other, and the problem becomes more complicated because the clusters within each class also tend to differ. Traditional algorithms such as the Support Vector Machine (SVM) or Twin Support Vector Machine (TWSVM) cannot sufficiently exploit structural information at cluster granularity, which limits their ability to model data trends. The Structural Twin Support Vector Machine (S-TWSVM) exploits cluster-level structural information to learn a representative hyperplane, so its capability for modeling the data is better than that of TWSVM. However, for datasets where each class consists of clusters with different trends, S-TWSVM's modeling capability remains restricted. Besides, the training time of S-TWSVM is not improved compared to TWSVM. This paper proposes a new Weighted Structural - Support Vector Machine (called WS-SVM) for binary classification problems with a class-vs-clusters strategy. Experimental results show that WS-SVM can describe the tendency of the distribution of cluster information. Furthermore, both theory and experiment show that the training time of WS-SVM for the classification problem is significantly improved compared to S-TWSVM.
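One hypothetical reading of a class-vs-clusters strategy is sketched below: cluster one class and pit the other class against each cluster separately. This is only an illustrative decomposition; the actual WS-SVM solves a single weighted optimization problem rather than training independent sub-models:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def class_vs_clusters_svms(X, y, n_clusters=3):
    """Hypothetical sketch of a class-vs-clusters strategy: cluster the
    negative class and train one SVM per (positive class vs cluster) pair.
    The actual WS-SVM instead optimizes one weighted structural problem."""
    X_pos, X_neg = X[y == 1], X[y == 0]
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X_neg)
    models = []
    for k in range(n_clusters):
        X_k = np.vstack([X_pos, X_neg[labels == k]])
        y_k = np.hstack([np.ones(len(X_pos)), np.zeros(np.sum(labels == k))])
        models.append(SVC(kernel="rbf").fit(X_k, y_k))
    return models  # predict by, e.g., a minimum decision over the sub-models
```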


Author(s):  
Donald Douglas Atsa'am

A filter feature selection algorithm is developed and its performance tested. In the initial step, the algorithm dichotomizes the dataset and then separately computes the association between each predictor and the class variable using relative odds (odds ratios). The value of the odds ratio becomes the importance ranking of the corresponding explanatory variable in determining the output. Logistic regression classification is deployed to test the performance of the new algorithm in comparison with three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. A number of experimental datasets are employed, and in most cases the subsets selected by the new algorithm produced models with higher classification accuracy than the subsets suggested by the existing feature selection algorithms. Therefore, the proposed algorithm is a reliable alternative in filter feature selection for binary classification problems.
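A minimal sketch of this filter follows. The median split used to dichotomize each predictor and the additive smoothing constant are assumptions for illustration; the abstract does not specify either:

```python
import numpy as np

def odds_ratio_ranking(X, y, smoothing=0.5):
    """Sketch of the filter described above: dichotomize each predictor
    (here at its median), build a 2x2 table against the binary class, and
    use the odds ratio as the importance score. Smoothing avoids zeros."""
    scores = []
    for j in range(X.shape[1]):
        f = X[:, j] > np.median(X[:, j])          # dichotomized predictor
        a = np.sum(f & (y == 1)) + smoothing      # high feature, class 1
        b = np.sum(f & (y == 0)) + smoothing      # high feature, class 0
        c = np.sum(~f & (y == 1)) + smoothing     # low feature, class 1
        d = np.sum(~f & (y == 0)) + smoothing     # low feature, class 0
        scores.append((a * d) / (b * c))          # relative odds
    return np.argsort(scores)[::-1]               # most important first
```

The returned ordering can then be cut at any desired subset size before fitting the logistic regression classifier used for evaluation.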


Author(s):  
Minchao Ye ◽  
Yongqiu Xu ◽  
Chenxi Ji ◽  
Hong Chen ◽  
Huijuan Lu ◽  
...  

Hyperspectral images (HSIs) have hundreds of narrow, adjacent spectral bands, which results in feature redundancy and decreases classification accuracy. Feature (band) selection helps to remove noisy or redundant features. Most traditional feature selection algorithms can only be performed on a single HSI scene. However, the emergence of massive numbers of HSI scenes has created a need for joint feature selection across different scenes. Cross-scene feature selection is not a simple problem, since spectral shift exists between different HSI scenes, even when the scenes are captured by the same sensor. The spectral shift makes traditional single-dataset-based feature selection algorithms no longer applicable. To solve this problem, we extend the traditional ReliefF to a cross-domain version, namely cross-domain ReliefF (CDRF). The proposed method makes full use of both source and target domains and increases the similarity of samples belonging to the same class in both domains. In the cross-scene classification problem, it is necessary to consider both the class-separability of spectral features and the consistency of features between different scenes. CDRF accounts for these two factors through a cross-domain updating rule for the feature weights. Experimental results on two cross-scene HSI datasets show the superiority of the proposed CDRF in cross-scene feature selection problems.
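For context, here is the single-domain ReliefF core that CDRF extends. The cross-domain updating rule itself is the paper's contribution and is deliberately omitted; only the standard hit/miss weight update is sketched, with an L1 neighbor distance as an implementation assumption:

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=5, n_iter=200, seed=0):
    """Plain ReliefF core (single domain). CDRF extends this update so the
    weights also reward features that stay consistent between the source
    and target scenes; that cross-domain rule is not reproduced here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)   # L1 distance to sample i
        dist[i] = np.inf                      # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hits = np.argsort(np.where(same, dist, np.inf))[:n_neighbors]
        misses = np.argsort(np.where(diff, dist, np.inf))[:n_neighbors]
        w += np.abs(X[i] - X[misses]).mean(axis=0)  # reward class separation
        w -= np.abs(X[i] - X[hits]).mean(axis=0)    # penalize in-class spread
    return w / n_iter   # higher weight = more discriminative band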


Author(s):  
Meng Liu ◽  
Chang Xu ◽  
Chao Xu ◽  
Dacheng Tao

The support vector machine (SVM) is one of the most frequently used classifiers for machine learning tasks. However, its training time can become prohibitive when the training set is very large. Thus, many kinds of representative subsets are chosen from the original dataset to reduce the training complexity. In this paper, we propose to choose representative points, termed anchors, obtained from non-negative matrix factorization (NMF) in a divide-and-conquer framework, and then use the anchors to train an approximate SVM. Our theoretical analysis shows that solving the DCA-SVM yields an approximate solution close to that of the primal SVM. Experimental results on multiple datasets demonstrate that our DCA-SVM is faster than state-of-the-art algorithms without notably decreasing the accuracy of the classification results.
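A rough sketch of the anchor idea follows. Extracting per-class NMF components as anchors and training on them alone is a simplification I am assuming for illustration; the paper's DCA-SVM embeds this in a divide-and-conquer framework with an approximation guarantee, both omitted here:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

def anchor_svm(X, y, n_anchors=100):
    """Sketch: extract NMF anchors per class (X must be non-negative)
    and train the SVM on the anchors only, shrinking the training set."""
    anchors, labels = [], []
    for c in np.unique(y):
        H = NMF(n_components=n_anchors // 2, init="nndsvda",
                max_iter=400, random_state=0).fit(X[y == c]).components_
        anchors.append(H)                    # components act as anchor points
        labels.append(np.full(len(H), c))
    return SVC(kernel="rbf").fit(np.vstack(anchors), np.hstack(labels))
```

Because the SVM now sees only a few hundred anchors instead of the full dataset, training cost drops sharply, which is the source of the speedup the abstract reports.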


2015 ◽  
Vol 14s5 ◽  
pp. CIN.S30795 ◽  
Author(s):  
S. Sakira Hassan ◽  
Pekka Ruusuvuori ◽  
Leena Latonen ◽  
Heikki Huttunen

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simple machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as much as 50% of features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.
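The cross-validation side of that comparison maps directly onto scikit-learn. A minimal sketch, assuming a logarithmic grid of regularization strengths (the grid and fold count are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Model selection along the l1 regularization path via cross-validation;
# the Bayesian error estimator compared in the paper would replace the
# CV scoring step with a closed-form error estimate.
model = LogisticRegressionCV(Cs=np.logspace(-2, 2, 20), penalty="l1",
                             solver="liblinear", cv=5)
# model.fit(X, y)
# Nonzero coefficients identify the features the l1 penalty retained:
# selected = np.flatnonzero(model.coef_)
```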

