scholarly journals Unsupervised Feature Selection by Pareto Optimization

Author(s):  
Chao Feng ◽  
Chao Qian ◽  
Ke Tang

Dimensionality reduction is often employed to deal with the data with a huge number of features, which can be generally divided into two categories: feature transformation and feature selection. Due to the interpretability, the efficiency during inference and the abundance of unlabeled data, unsupervised feature selection has attracted much attention. In this paper, we consider its natural formulation, column subset selection (CSS), which is to minimize the reconstruction error of a data matrix by selecting a subset of features. We propose an anytime randomized iterative approach POCSS, which minimizes the reconstruction error and the number of selected features simultaneously. Its approximation guarantee is well bounded. Empirical results exhibit the superior performance of POCSS over the state-of-the-art algorithms.

2020 ◽  
Vol 34 (03) ◽  
pp. 2408-2415
Author(s):  
Chao Qian ◽  
Chao Bian ◽  
Chao Feng

Subset selection, i.e., to select a limited number of items optimizing some given objective function, is a fundamental problem with various applications such as unsupervised feature selection and sparse regression. By employing a multi-objective evolutionary algorithm (EA) with mutation only to optimize the given objective function and minimize the number of selected items simultaneously, the recently proposed POSS algorithm achieves state-of-the-art performance for subset selection. In this paper, we propose the PORSS algorithm by incorporating recombination, a characterizing feature of EAs, into POSS. We prove that PORSS can achieve the optimal polynomial-time approximation guarantee as POSS when the objective function is monotone, and can find an optimal solution efficiently in some cases whereas POSS cannot. Extensive experiments on unsupervised feature selection and sparse regression show the superiority of PORSS over POSS. Our analysis also theoretically discloses that recombination from diverse solutions can be more likely than mutation alone to generate various variations, thereby leading to better exploration; this may be of independent interest for understanding the influence of recombination.


2019 ◽  
Vol 31 (3) ◽  
pp. 517-537 ◽  
Author(s):  
Haojie Hu ◽  
Rong Wang ◽  
Xiaojun Yang ◽  
Feiping Nie

Recently, graph-based unsupervised feature selection algorithms (GUFS) have been shown to efficiently handle prevalent high-dimensional unlabeled data. One common drawback associated with existing graph-based approaches is that they tend to be time-consuming and in need of large storage, especially when faced with the increasing size of data. Research has started using anchors to accelerate graph-based learning model for feature selection, while the hard linear constraint between the data matrix and the lower-dimensional representation is usually overstrict in many applications. In this letter, we propose a flexible linearization model with anchor graph and [Formula: see text]-norm regularization, which can deal with large-scale data sets and improve the performance of the existing anchor-based method. In addition, the anchor-based graph Laplacian is constructed to characterize the manifold embedding structure by means of a parameter-free adaptive neighbor assignment strategy. An efficient iterative algorithm is developed to address the optimization problem, and we also prove the convergence of the algorithm. Experiments on several public data sets demonstrate the effectiveness and efficiency of the method we propose.


Author(s):  
Jundong Li ◽  
Jiliang Tang ◽  
Huan Liu

Feature selection has been proven to be effective and efficient in preparing high-dimensional data for data mining and machine learning problems. Since real-world data is usually unlabeled, unsupervised feature selection has received increasing attention in recent years. Without label information, unsupervised feature selection needs alternative criteria to define feature relevance. Recently, data reconstruction error emerged as a new criterion for unsupervised feature selection, which defines feature relevance as the capability of features to approximate original data via a reconstruction function. Most existing algorithms in this family assume predefined, linear reconstruction functions. However, the reconstruction function should be data dependent and may not always be linear especially when the original data is high-dimensional. In this paper, we investigate how to learn the reconstruction function from the data automatically for unsupervised feature selection, and propose a novel reconstruction-based unsupervised feature selection framework REFS, which embeds the reconstruction function learning process into feature selection. Experiments on various types of real-world datasets demonstrate the effectiveness of the proposed framework REFS.


Author(s):  
Chang Tang ◽  
Xiao Zheng ◽  
Xinwang Liu ◽  
Wei Zhang ◽  
Jing Zhang ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3627
Author(s):  
Bo Jin ◽  
Chunling Fu ◽  
Yong Jin ◽  
Wei Yang ◽  
Shengbin Li ◽  
...  

Identifying the key genes related to tumors from gene expression data with a large number of features is important for the accurate classification of tumors and to make special treatment decisions. In recent years, unsupervised feature selection algorithms have attracted considerable attention in the field of gene selection as they can find the most discriminating subsets of genes, namely the potential information in biological data. Recent research also shows that maintaining the important structure of data is necessary for gene selection. However, most current feature selection methods merely capture the local structure of the original data while ignoring the importance of the global structure of the original data. We believe that the global structure and local structure of the original data are equally important, and so the selected genes should maintain the essential structure of the original data as far as possible. In this paper, we propose a new, adaptive, unsupervised feature selection scheme which not only reconstructs high-dimensional data into a low-dimensional space with the constraint of feature distance invariance but also employs ℓ2,1-norm to enable a matrix with the ability to perform gene selection embedding into the local manifold structure-learning framework. Moreover, an effective algorithm is developed to solve the optimization problem based on the proposed scheme. Comparative experiments with some classical schemes on real tumor datasets demonstrate the effectiveness of the proposed method.


Sign in / Sign up

Export Citation Format

Share Document