Two-Stage Feature Selection with Unsupervised Second Stage

2018 ◽  
Vol 27 (07) ◽  
pp. 1860014
Author(s):  
Ke Xu ◽  
Crystal Maung ◽  
Hiromasa Arai ◽  
Haim Schweitzer

Feature selection is a common dimensionality reduction technique of fundamental importance in big data. A common approach for reducing the running time of feature selection is to perform it in two stages. In the first stage a fast and simple filter is applied to select good candidates. The number of candidates is further reduced in the second stage by an accurate algorithm that may run significantly slower. There are two main variants of feature selection: unsupervised and supervised. In the supervised variant features are selected for predicting labels, while the unsupervised variant does not use labels at all. We describe a general framework that can use an arbitrary off-the-shelf unsupervised algorithm for the second stage. The algorithm is applied to the selection obtained in the first stage weighted appropriately. Our main technical result is a method for calculating weights for the columns that need to be selected in the second stage. We show that these weights can be computed as the solution to a constrained quadratic optimization problem. The solution is deterministic, and improves on previously published studies that use probabilistic ideas to compute similar weights. To the best of our knowledge our approach is the first technique for converting a supervised feature selection problem into an unsupervised problem. Complexity analysis shows that the proposed technique is very fast, can be implemented in a single pass over the data, and can take advantage of data sparsity. Experimental results show that the accuracy of the proposed method is comparable to that of much slower techniques.
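The two-stage idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the first-stage filter (absolute Pearson correlation with the labels), the second-stage unsupervised selector (greedy pivoted Gram-Schmidt column selection), and the use of the filter scores as column weights are all stand-ins chosen for brevity. The paper computes its weights from a constrained quadratic optimization problem that the abstract does not spell out. All function names are hypothetical.

```python
import math

def pearson_abs(col, y):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(y)
    mc, my = sum(col) / n, sum(y) / n
    cov = sum((c - mc) * (v - my) for c, v in zip(col, y))
    vc = sum((c - mc) ** 2 for c in col)
    vy = sum((v - my) ** 2 for v in y)
    if vc == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vc * vy)

def stage_one_filter(columns, y, k1):
    """Fast supervised filter: keep the k1 columns most correlated with y."""
    scores = [pearson_abs(col, y) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    keep = sorted(order[:k1])
    return keep, [scores[j] for j in keep]

def stage_two_unsupervised(columns, k2):
    """Label-free greedy selection: repeatedly pick the column with the
    largest residual norm, then project it out of every column."""
    residual = [list(c) for c in columns]
    chosen = []
    for _ in range(min(k2, len(columns))):
        norms = [sum(v * v for v in r) for r in residual]
        j = max(range(len(residual)), key=lambda i: norms[i])
        if norms[j] <= 1e-12:
            break  # remaining columns are spanned by the chosen ones
        chosen.append(j)
        q = [v / math.sqrt(norms[j]) for v in residual[j]]
        for r in residual:
            dot = sum(a * b for a, b in zip(r, q))
            for t in range(len(r)):
                r[t] -= dot * q[t]
    return chosen

def two_stage_select(columns, y, k1, k2):
    """Stage 1 uses labels; stage 2 sees only the weighted candidate columns."""
    keep, weights = stage_one_filter(columns, y, k1)
    # Assumption: filter scores serve as placeholder weights.
    weighted = [[w * v for v in columns[j]] for j, w in zip(keep, weights)]
    picked = stage_two_unsupervised(weighted, k2)
    return [keep[i] for i in picked]
```

In this sketch the labels influence the second stage only through the column weights, which is the structural point of the framework: any off-the-shelf unsupervised selector could replace `stage_two_unsupervised`.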

2003 ◽  
Vol 19 (3) ◽  
pp. 389-395
Author(s):  
Wei-Ming Pai ◽  
Dar-Zen Chen ◽  
Jyh-Jone Lee ◽  
Chi-Zer Ho

Abstract: This paper presents the design process for an innovative latch mechanism in a standard mechanical interfaced (SMIFed) wafer container, in which the manufactured integrated circuits are stored. An innovative latch mechanism is proposed and applied to the wafer container, such that the container door can be latched and sealed airtight during storage or transportation. The design process is divided into two stages. In the first stage, an output slot-cam is designed to generate decoupled fine motions of the output link. The issue is formulated as an optimization problem in which the output link dimensions are optimized to minimize the resultant pin forces subject to an adequate transmission angle. In the second stage, the input slot-cam is designed so that the kinetic energy of the elastic gasket on the container lid is absorbed at a uniform rate. Finally, a numerical example and computer simulations are given to demonstrate the results of the design process. It is believed that this work could aid in enhancing the performance and reliability of the latch mechanism in the SMIF environment.


Author(s):  
Ziyu Guan ◽  
Fei Xie ◽  
Wanqing Zhao ◽  
Xiaopeng Wang ◽  
Long Chen ◽  
...  

We are concerned with using user-tagged images to learn proper hashing functions for image retrieval. The benefits are two-fold: (1) we could obtain abundant training data for deep hashing models; (2) tagging data possesses richer semantic information, which could help better characterize similarity relationships between images. However, tagging data suffers from noise, vagueness and incompleteness. Unlike previous unsupervised or supervised hashing learning, we propose a novel weakly-supervised deep hashing framework which consists of two stages: weakly-supervised pre-training and supervised fine-tuning. The second stage is as usual. In the first stage, rather than performing supervision on tags, the framework introduces a semantic embedding vector (sem-vector) for each image and learns the hashing functions and sem-vectors jointly. By carefully designing the optimization problem, it can leverage both tagging information and image content for hashing learning. The framework is general and does not depend on specific deep hashing methods. Empirical results on real-world datasets show that when it is integrated with state-of-the-art deep hashing methods, the performance increases by 8-10%.


2018 ◽  
Author(s):  
Niharika Gauraha

We would like to begin by stating that we have not fully understood the formulation of the V-matrix conceptually. However, we are fascinated by the idea of estimating the conditional probability function without assuming any probabilistic model. In this short discussion, we would like to show that the proposed constrained quadratic optimization problem for conditional probability estimation using the V-matrix-based method may not always have a consistent solution. We are sure that the paper will stimulate a deeper exploration of V-matrix-based methods for inference in high-dimensional problems in future research.


Author(s):  
Amin Ghorbanpour ◽  
Hanz Richter

Abstract: In this work, simultaneous energy regeneration and motion control for robot manipulators with brushless direct current (BLDC) motors is considered. All joints of the robot are connected to regenerative drives powered from a single ultra-capacitor. A new voltage-based control method is developed to individually command each phase of the BLDC motor. Three independent regenerative drives are interconnected in a wye configuration, and each drives one phase of the motor. The objective is to determine the control inputs for each drive that minimize energy consumption from the ultra-capacitor for a given motion task. To this end, the problem is formulated as a constrained quadratic optimization problem that gives the control inputs based on the desired torque generated by a virtual controller. An experimental evaluation is performed using a pendulum actuated by a BLDC motor. It is shown that the suggested control method can accomplish the motion task and is capable of energy regeneration. The results show a reduction of about 40% in energy consumption under the conditions of the study, relative to the non-regenerative case.


Author(s):  
Rahul Hans ◽  
Harjot Kaur

These days, a massive quantity of data is produced online and incorporated into a variety of datasets in the form of features; however, many of the features in these datasets may not be relevant to the problem. In this context, feature selection aims to improve classification accuracy with fewer features, and it can be treated as an optimization problem. In this paper, a hybrid Sine Cosine Ant Lion Optimizer (SCALO), which hybridizes the Sine Cosine Algorithm (SCA) with the Ant Lion Optimizer (ALO), is proposed. The proposed algorithm is mapped to its binary versions by using the concept of transfer functions, with the objective of eliminating irrelevant features and enhancing the accuracy of the classification algorithm (or at least keeping it the same). For experimentation, this research considers 18 diverse datasets, and the performance of the binary versions of SCALO is compared with some of the latest metaheuristic algorithms on the basis of various criteria. It can be observed that the binary versions of SCALO perform better than the other algorithms on various evaluation criteria for solving the feature selection problem.
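The mapping from a continuous optimizer to a binary feature mask through a transfer function can be sketched as below. The abstract does not say which transfer functions SCALO uses, so this example assumes the common S-shaped (sigmoid) form, in which each continuous coordinate becomes the probability that the corresponding feature is selected. The function names are hypothetical.

```python
import math
import random

def s_shaped(x):
    """Sigmoid transfer function: maps a real coordinate to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng, transfer=s_shaped):
    """Turn a continuous position vector into a 0/1 feature mask.

    Feature i is selected (bit 1) with probability transfer(position[i]).
    """
    return [1 if rng.random() < transfer(x) else 0 for x in position]

def selected_features(mask):
    """Indices of the features kept by a binary mask."""
    return [i for i, bit in enumerate(mask) if bit == 1]
```

A caller would pass a seeded generator, for example `binarize(position, random.Random(0))`, so that runs are reproducible. The resulting mask is then scored by training the classifier on the selected features only.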


2021 ◽  
pp. 1-23
Author(s):  
Moussa BARRO ◽  
Satafa SANOGO ◽  
Mohamed ZONGO ◽  
Sado TRAORÉ

Robust Optimization (RO) arises in two stages of optimization: a first level that maximizes over the uncertain data and a second level that minimizes over the feasible set. It is the most suitable mathematical optimization procedure to solve real-life problem models. In the present work, we characterize robust solutions for both homogeneous and non-homogeneous quadratically constrained quadratic optimization problems in which the constraint function and the cost function are uncertain. Moreover, we discuss the optimistic dual and strong robust duality of the considered uncertain quadratic optimization problem. Finally, we complete this work with an example to illustrate our solution method.
Mathematics Subject Classification (2010): 90C20, 90C26, 90C46, 90C47.
Keywords: Robust Optimization, Data Uncertainty, Quadratic Optimization, Strong Duality, Robust Solution, DPJ-Convex.


Author(s):  
Dilip Kumar Choubey ◽  
Sanchita Paul ◽  
Kanchan Bala ◽  
Manish Kumar ◽  
Uday Pratap Singh

This chapter presents a classification approach for diabetes. The proposed approach consists of two stages. In the first stage, the Pima Indian diabetes dataset is obtained from the UCI repository of machine learning databases. In the second stage, the authors perform classification using a fuzzy decision tree on the Pima Indian diabetes dataset. They then apply PSO_SVM as a feature selection technique, followed by classification using a fuzzy decision tree on the same dataset. In this chapter, the optimization of SVM using PSO reduces the number of attributes, and hence applying the fuzzy decision tree improves the accuracy of detecting diabetes. The hybrid combination of feature selection and classification is needed so that the resulting system can be used for the classification of diabetes.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yue Li ◽  
Zhiheng Sun ◽  
Xin Liu ◽  
Wei-Tung Chen ◽  
Der-Juinn Horng ◽  
...  

The feature selection problem is a fundamental issue in many research fields. In this paper, the feature selection problem is regarded as an optimization problem and addressed by utilizing a large-scale many-objective evolutionary algorithm. Considering the number of selected features, accuracy, relevance, redundancy, interclass distance, and intraclass distance, a large-scale many-objective feature selection model is constructed. It is difficult to optimize the large-scale many-objective feature selection problem by using traditional evolutionary algorithms. Therefore, this paper proposes a modified vector angle-based large-scale many-objective evolutionary algorithm (MALSMEA). The proposed algorithm uses polynomial mutation based on variable grouping instead of naive polynomial mutation to improve the efficiency of solving large-scale problems. In addition, a novel worst-case solution replacement strategy using shift-based density estimation is used to replace the poorer of two individuals with similar search directions, in order to enhance convergence. The experimental results show that MALSMEA is competitive and can effectively optimize the proposed model.
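The contrast between naive and grouped polynomial mutation can be illustrated with a short sketch. The abstract gives neither the grouping rule nor the mutation formula, so the code below makes two assumptions: it uses the basic form of polynomial mutation with distribution index `eta`, and it partitions the variables into groups at random. The names `polynomial_mutation` and `random_groups` are hypothetical.

```python
import random

def polynomial_mutation(x, lower, upper, indices, eta, rng):
    """Apply basic polynomial mutation to the variables listed in `indices`.

    Variables outside `indices` are copied unchanged, so passing a single
    group perturbs only that group; passing all indices gives the naive form.
    """
    child = list(x)
    for i in indices:
        u = rng.random()
        if u < 0.5:
            delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
        else:
            delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
        value = child[i] + delta * (upper[i] - lower[i])
        child[i] = min(max(value, lower[i]), upper[i])  # clip to the bounds
    return child

def random_groups(n_vars, n_groups, rng):
    """Assumed grouping rule: shuffle the variable indices and deal them out."""
    order = list(range(n_vars))
    rng.shuffle(order)
    return [order[g::n_groups] for g in range(n_groups)]
```

Mutating one group at a time keeps each perturbation low-dimensional, which is one plausible reason grouping helps on problems with many decision variables.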

