Subset Selection by Pareto Optimization with Recombination

2020 · Vol. 34 (03) · pp. 2408–2415
Author(s): Chao Qian, Chao Bian, Chao Feng

Subset selection, i.e., selecting a limited number of items to optimize some given objective function, is a fundamental problem with various applications such as unsupervised feature selection and sparse regression. By employing a multi-objective evolutionary algorithm (EA) with mutation only to optimize the given objective function and minimize the number of selected items simultaneously, the recently proposed POSS algorithm achieves state-of-the-art performance for subset selection. In this paper, we propose the PORSS algorithm by incorporating recombination, a characterizing feature of EAs, into POSS. We prove that PORSS achieves the same optimal polynomial-time approximation guarantee as POSS when the objective function is monotone, and can find an optimal solution efficiently in some cases where POSS cannot. Extensive experiments on unsupervised feature selection and sparse regression show the superiority of PORSS over POSS. Our analysis also theoretically discloses that recombination from diverse solutions is more likely than mutation alone to generate a variety of offspring, thereby leading to better exploration; this may be of independent interest for understanding the influence of recombination.
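A minimal sketch of this bi-objective scheme, assuming a boolean-vector encoding, uniform recombination, and standard bit-wise mutation (the operator details here are illustrative assumptions, not the authors' exact specification):

```python
import numpy as np

def porss_sketch(objective, n, budget, iters=2000, seed=0):
    # Illustrative PORSS-style loop: maximize objective(x) and minimize |x|
    # simultaneously, keeping a Pareto archive of non-dominated subsets.
    rng = np.random.default_rng(seed)

    def fitness(x):
        k = int(x.sum())
        val = objective(x) if k <= budget else float("-inf")  # penalize oversize
        return (val, -k)  # both components are maximized

    x0 = np.zeros(n, dtype=bool)
    archive = [(x0, fitness(x0))]  # pairs of (solution, fitness)

    for _ in range(iters):
        p1, _ = archive[rng.integers(len(archive))]
        p2, _ = archive[rng.integers(len(archive))]
        mask = rng.random(n) < 0.5                  # uniform recombination
        child = np.where(mask, p1, p2)
        child = child ^ (rng.random(n) < 1.0 / n)   # bit-wise mutation
        fc = fitness(child)
        # discard the child if some archived solution weakly dominates it
        if any(fa[0] >= fc[0] and fa[1] >= fc[1] for _, fa in archive):
            continue
        # otherwise remove everything the child weakly dominates, then add it
        archive = [(a, fa) for a, fa in archive
                   if not (fc[0] >= fa[0] and fc[1] >= fa[1])]
        archive.append((child, fc))

    # return the best feasible subset found
    return max((t for t in archive if t[0].sum() <= budget),
               key=lambda t: t[1][0])[0]
```

Any set function accepting a boolean mask can serve as `objective`, e.g. a regression-quality score for sparse regression.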

Author(s): Chao Bian, Chao Qian, Frank Neumann, Yang Yu

Subset selection with cost constraints is a fundamental problem with various applications such as influence maximization and sensor placement. The goal is to select a subset from a ground set to maximize a monotone objective function such that a monotone cost function is upper bounded by a budget. Previous algorithms with bounded approximation guarantees include the generalized greedy algorithm, POMC, and EAMC, all of which can achieve the best known approximation guarantee. In real-world scenarios, resources often vary, i.e., the budget changes over time, requiring algorithms to adapt their solutions quickly. However, when the budget changes dynamically, all three of these algorithms either achieve arbitrarily bad approximation guarantees or require a long running time. In this paper, we propose a new algorithm, FPOMC, which combines the merits of the generalized greedy algorithm and POMC: FPOMC introduces a greedy selection strategy into POMC. We prove that FPOMC efficiently maintains the best known approximation guarantee.
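For context, a sketch of the generalized greedy rule mentioned above, which repeatedly adds the item with the best marginal-gain-to-marginal-cost ratio that still fits in the budget (a simplified reading: the full algorithm also compares the greedy result against the best single item, and FPOMC embeds such a rule inside a Pareto-optimization loop rather than using it stand-alone):

```python
def generalized_greedy(gain, cost, ground_set, budget):
    # gain/cost are monotone set functions evaluated on lists of items.
    selected = []
    remaining = set(ground_set)
    while remaining:
        def ratio(v):
            dg = gain(selected + [v]) - gain(selected)
            dc = cost(selected + [v]) - cost(selected)
            return dg / dc if dc > 0 else float("inf")
        best = max(remaining, key=ratio)           # best bang-for-buck item
        if cost(selected + [best]) <= budget:      # add it only if affordable
            selected.append(best)
        remaining.discard(best)
    return selected
```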


2014 · Vol. 44 (6) · pp. 793–804
Author(s): Chenping Hou, Feiping Nie, Xuelong Li, Dongyun Yi, Yi Wu

Author(s): Chao Qian, Chao Feng, Ke Tang

The problem of selecting a sequence of items from a universe that maximizes some given objective function arises in many real-world applications. In this paper, we propose an anytime randomized iterative approach, POSeqSel, which maximizes the given objective function and minimizes the sequence length simultaneously. We prove that, within a reasonable running time, POSeqSel can always reach or improve upon the best known approximation guarantee for any previously studied objective function. Empirical results exhibit the superior performance of POSeqSel.
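A hedged sketch of how such an anytime bi-objective sequence search might look; random insertion and deletion are assumed as the variation operators (the paper's actual operators and archive rules may differ):

```python
import random

def poseqsel_sketch(objective, universe, max_len, iters=2000, seed=0):
    # Maintain a Pareto archive over (objective value, -sequence length).
    random.seed(seed)
    archive = [((), (objective(()), 0))]
    for _ in range(iters):
        seq, _ = random.choice(archive)
        seq = list(seq)
        if seq and random.random() < 0.5:
            del seq[random.randrange(len(seq))]            # delete a position
        elif len(seq) < max_len:
            seq.insert(random.randrange(len(seq) + 1),     # insert an item
                       random.choice(universe))
        child = tuple(seq)
        fc = (objective(child), -len(child))
        if any(f[0] >= fc[0] and f[1] >= fc[1] for _, f in archive):
            continue  # weakly dominated by an archived sequence
        archive = [(s, f) for s, f in archive
                   if not (fc[0] >= f[0] and fc[1] >= f[1])]
        archive.append((child, fc))
    return max(archive, key=lambda t: t[1][0])[0]  # best sequence found
```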


Author(s): Chao Feng, Chao Qian, Ke Tang

Dimensionality reduction is often employed to deal with data having a huge number of features, and can generally be divided into two categories: feature transformation and feature selection. Owing to its interpretability, its efficiency during inference, and the abundance of unlabeled data, unsupervised feature selection has attracted much attention. In this paper, we consider its natural formulation, column subset selection (CSS), which minimizes the reconstruction error of a data matrix by selecting a subset of features. We propose an anytime randomized iterative approach, POCSS, which minimizes the reconstruction error and the number of selected features simultaneously. Its approximation guarantee is well bounded. Empirical results exhibit the superior performance of POCSS over state-of-the-art algorithms.
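The CSS objective itself is compact to state in code. A minimal version, assuming the reconstruction error is measured by least-squares projection onto the span of the selected columns:

```python
import numpy as np

def reconstruction_error(X, cols):
    # Squared Frobenius-norm error of reconstructing X from its columns `cols`.
    C = X[:, list(cols)]                       # m x k matrix of chosen columns
    P, *_ = np.linalg.lstsq(C, X, rcond=None)  # least-squares coefficients
    return float(np.linalg.norm(X - C @ P, "fro") ** 2)
```

Plugged into a bi-objective loop such as the PORSS-style sketch above (with the sign flipped, since the error is minimized), this yields a POCSS-like procedure.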


2018 · Vol. 10 (10) · p. 1564
Author(s): Patrick Bradley, Sina Keller, Martin Weinmann

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks where only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which rely only on the given feature vectors, not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data, where the topology is described with an ultrametricity index. Specifically, we consider the Murtagh Ultrametricity Index (MUI), defined on the basis of triangles within the given data, and the Topological Ultrametricity Index (TUI), defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The results reveal that supervised and unsupervised feature selection techniques achieve similar classification performance, while the latter select features independently of the given classification task and thus deliver generally versatile features.
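A sketch of a triangle-based ultrametricity index in the spirit of the MUI; the sampling scheme and tolerance below are assumptions, and Murtagh's exact definition differs in detail:

```python
import numpy as np

def ultrametricity_index(X, n_triangles=5000, tol=0.02, seed=0):
    # Sample random triangles of data points and report the fraction that are
    # approximately ultrametric: isosceles with the two largest pairwise
    # distances (nearly) equal. Values near 1 suggest tree-like topology.
    rng = np.random.default_rng(seed)
    n, hits = len(X), 0
    for _ in range(n_triangles):
        i, j, k = rng.choice(n, size=3, replace=False)
        d = sorted([np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])])
        if d[2] - d[1] <= tol * d[2]:   # two largest sides nearly equal
            hits += 1
    return hits / n_triangles
```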


Stats · 2021 · Vol. 4 (2) · pp. 359–384
Author(s): Manabu Ichino, Kadri Umbleja, Hiroyuki Yaguchi

This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal-probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster-quality measure. We further show that the average compactness of each feature with respect to objects and/or clusters across several clustering steps is useful as a feature-effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method using an artificial data set and real histogram-valued data sets.
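A deliberately simplified sketch of the agglomeration loop: here each cluster is summarized by a plain bounding box and its "compactness" by the mean side length, purely to make the merge-the-most-compact-pair idea concrete (the paper's measure over equal-probability bin-rectangles for histogram-valued data is substantially richer):

```python
import numpy as np

def agglomerate_by_compactness(X, n_clusters=2):
    # Each cluster is a (lower, upper) bounding box over the features.
    clusters = [(X[i], X[i]) for i in range(len(X))]
    members = [[i] for i in range(len(X))]

    def merged_size(a, b):  # stand-in "compactness" of the merged concept
        lo, hi = np.minimum(a[0], b[0]), np.maximum(a[1], b[1])
        return float(np.mean(hi - lo))

    while len(clusters) > n_clusters:
        # merge the pair whose union forms the smallest (most compact) concept
        _, i, j = min((merged_size(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        lo = np.minimum(clusters[i][0], clusters[j][0])
        hi = np.maximum(clusters[i][1], clusters[j][1])
        clusters[i] = (lo, hi)
        members[i] += members[j]
        del clusters[j], members[j]
    return members
```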



Particle Swarm Optimization (PSO) is a nature-inspired stochastic evolutionary algorithm that iteratively tries to improve a solution with respect to a given objective function. The problem becomes challenging if the objective function is neither properly identified nor properly evaluated, which results in slow convergence and an inability to find the optimal solution. Hence, we propose a novel rough-set-based particle swarm optimization algorithm using the golden ratio principle for an efficient feature selection process that focuses on two objectives: first, producing a reduced subset of features without compromising the originality of the data, and second, yielding a highly optimal result. Since many subsets of features might yield a meaningful solution, we use the golden ratio principle to extract the most reduced subset with a highly optimal solution. The algorithm has been tested on several benchmark datasets. The results show that the proposed algorithm identifies a small set of features without compromising the optimal solution, thus achieving the stated objectives.
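A generic binary-PSO sketch for feature selection, with the golden ratio standing in as a weight that trades fitness against subset size (an illustrative stand-in only; the paper's rough-set evaluation and its golden-ratio selection rule are not reproduced here):

```python
import numpy as np

GOLDEN = (1 + 5 ** 0.5) / 2  # golden ratio, used as an illustrative weight

def bpso_feature_selection(fitness, n_features, n_particles=20,
                           iters=100, seed=0):
    # Particles are bit masks over features; velocities pass through a
    # sigmoid to give per-bit activation probabilities (standard binary PSO).
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features)) < 0.5
    vel = rng.normal(0.0, 1.0, (n_particles, n_features))

    def score(x):  # higher fitness and fewer selected features are better
        return fitness(x) - x.sum() / (GOLDEN * n_features)

    pbest = pos.copy()
    pscore = np.array([score(x) for x in pos])
    gbest = pbest[pscore.argmax()].copy()

    for _ in range(iters):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        vel = (0.7 * vel
               + 1.5 * r1 * (pbest.astype(int) - pos.astype(int))
               + 1.5 * r2 * (gbest.astype(int) - pos.astype(int)))
        vel = np.clip(vel, -6.0, 6.0)                    # keep sigmoid stable
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))
        s = np.array([score(x) for x in pos])
        improved = s > pscore
        pbest[improved], pscore[improved] = pos[improved], s[improved]
        gbest = pbest[pscore.argmax()].copy()
    return gbest  # best feature mask found
```

Here `fitness` is any hypothetical evaluator of a boolean feature mask, e.g. cross-validated accuracy of a downstream classifier.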

