Quick and robust feature selection: the strength of energy-efficient sparse training for autoencoders

Machine Learning ◽

10.1007/s10994-021-06063-x ◽

2021 ◽

Author(s):

Zahra Atashgahi ◽

Ghada Sokar ◽

Tim van der Lee ◽

Elena Mocanu ◽

Decebal Constantin Mocanu ◽

...

Keyword(s):

Feature Selection ◽

High Energy ◽

High Dimensional ◽

Training Procedure ◽

Selection Methods ◽

Speed Increase ◽

Benchmark Datasets ◽

Memory Reduction ◽

Low Dimensional ◽

High Dimensional Datasets

AbstractMajor complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to this problem. Most of the existing feature selection methods are computationally inefficient; inefficient algorithms lead to high energy consumption, which is not desirable for devices with limited computational and energy resources. In this paper, a novel and flexible method for unsupervised feature selection is proposed. This method, named QuickSelection (The code is available at: https://github.com/zahraatashgahi/QuickSelection), introduces the strength of the neuron in sparse neural networks as a criterion to measure the feature importance. This criterion, blended with sparsely connected denoising autoencoders trained with the sparse evolutionary training procedure, derives the importance of all input features simultaneously. We implement QuickSelection in a purely sparse manner as opposed to the typical approach of using a binary mask over connections to simulate sparsity. It results in a considerable speed increase and memory reduction. When tested on several benchmark datasets, including five low-dimensional and three high-dimensional datasets, the proposed method is able to achieve the best trade-off of classification and clustering accuracy, running time, and maximum memory usage, among widely used approaches for feature selection. Besides, our proposed method requires the least amount of energy among the state-of-the-art autoencoder-based feature selection methods.

Download Full-text

Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04491-6_16 ◽

2018 ◽

pp. 205-218

Author(s):

Saúl Solorio-Fernández ◽

J. Ariel Carrasco-Ochoa ◽

José Fco. Martínez-Trinidad

Keyword(s):

Feature Selection ◽

Comparative Study ◽

High Dimensional ◽

Selection Methods ◽

Unsupervised Feature Selection ◽

High Dimensional Datasets

Download Full-text

Feature Selection with Neighborhood Entropy-Based Cooperative Game Theory

Computational Intelligence and Neuroscience ◽

10.1155/2014/479289 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Kai Zeng ◽

Kun She ◽

Xinzheng Niu

Keyword(s):

Machine Learning ◽

Game Theory ◽

Feature Selection ◽

Cooperative Game ◽

Cooperative Game Theory ◽

Theory Model ◽

High Dimensional ◽

Selection Methods ◽

Evaluative Criteria ◽

High Dimensional Datasets

Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods will ignore some features which have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game. The evaluative criteria of features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that neighborhood entropy-based cooperative game theory model (NECGT) yield better performance than classical ones.

Download Full-text

Opportunities and Challenges of Feature Selection Methods for High Dimensional Data: A Review

Ingénierie des systèmes d information ◽

10.18280/isi.260107 ◽

2021 ◽

Vol 26 (1) ◽

pp. 67-77

Author(s):

Siva Sankari Subbiah ◽

Jayakumar Chinnappan

Keyword(s):

Feature Selection ◽

Big Data ◽

Large Scale ◽

High Dimensional Data ◽

Research Work ◽

Basic Feature ◽

High Dimensional ◽

Selection Methods ◽

Fast Development ◽

Improved Accuracy

Now a day, all the organizations collecting huge volume of data without knowing its usefulness. The fast development of Internet helps the organizations to capture data in many different formats through Internet of Things (IoT), social media and from other disparate sources. The dimension of the dataset increases day by day at an extraordinary rate resulting in large scale dataset with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing the high dimensional data with reduced complexity and improved accuracy. In the modern big data world the feature selection has a significance in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed by researchers for obtaining more relevant features especially from the big datasets that helps to provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, challenges of feature selection and provides the summary of the related research work done by various researchers. As a result, the big data analysis with the feature selection improves the accuracy of the learning.

Download Full-text

BETTER ALTERNATIVES FOR STEPWISE DISCRIMINANT ANALYSIS

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.311.02 ◽

2015 ◽

Vol 1 (311) ◽

Author(s):

Katarzyna Stąpor

Keyword(s):

Feature Selection ◽

Discriminant Analysis ◽

Tabu Search ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Discrimination Power ◽

Statistical Software ◽

Software Packages ◽

Benchmark Datasets

Discriminant Analysis can best be defined as a technique which allows the classification of an individual into several dictinctive populations on the basis of a set of measurements. Stepwise discriminant analysis (SDA) is concerned with selecting the most important variables whilst retaining the highest discrimination power possible. The process of selecting a smaller number of variables is often necessary for a variety number of reasons. In the existing statistical software packages SDA is based on the classic feature selection methods. Many problems with such stepwise procedures have been identified. In this work the new method based on the metaheuristic strategy tabu search will be presented together with the experimental results conducted on the selected benchmark datasets. The results are promising.

Download Full-text

Hybrid feature selection methods for high-dimensional multi-class datasets

International Journal of Data Mining Modelling and Management ◽

10.1504/ijdmmm.2017.088411 ◽

2017 ◽

Vol 9 (4) ◽

pp. 315

Author(s):

Amit Kumar Saxena ◽

Vimal Kumar Dubey ◽

John Wang

Keyword(s):

Feature Selection ◽

High Dimensional ◽

Selection Methods

Download Full-text

Nested and Repeated Cross Validation for Classification Model With High-Dimensional Data

Revista Colombiana de Estadística ◽

10.15446/rce.v43n1.80000 ◽

2020 ◽

Vol 43 (1) ◽

pp. 103-125

Author(s):

Yi Zhong ◽

Jianghua He ◽

Prabhakar Chalise

Keyword(s):

Feature Selection ◽

Cross Validation ◽

Predictive Accuracy ◽

Simulated Data ◽

Classification Model ◽

High Dimensional ◽

Clinical Settings ◽

Feature Subset ◽

Validation Method ◽

High Dimensional Datasets

With the advent of high throughput technologies, the high-dimensional datasets are increasingly available. This has not only opened up new insight into biological systems but also posed analytical challenges. One important problem is the selection of informative feature-subset and prediction of the future outcome. It is crucial that models are not overfitted and give accurate results with new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interests in clinical settings. We propose a two-step framework for feature selection and classification model construction, which utilizes a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed comparatively better predictive accuracy for new cases than the standard cross-validation method.

Download Full-text

Overlapping tiling for fast random access of low-dimensional data from high-dimensional datasets

10.1117/12.805419 ◽

2009 ◽

Cited By ~ 1

Author(s):

Zihong Fan ◽

Antonio Ortega

Keyword(s):

Random Access ◽

High Dimensional ◽

Low Dimensional ◽

High Dimensional Datasets

Download Full-text

A Novel Algorithm for Clustering and Feature Selection of High Dimensional Datasets

Advances in Modelling and Analysis B ◽

10.18280/ama_b.600301 ◽

2017 ◽

Vol 60 (3) ◽

pp. 525-538

Author(s):

Thulasi Bikku ◽

Alapati Priya

Keyword(s):

Feature Selection ◽

High Dimensional ◽

High Dimensional Datasets ◽

Selection Of ◽

Novel Algorithm

Download Full-text

On the scalability of feature selection methods on high-dimensional data

Knowledge and Information Systems ◽

10.1007/s10115-017-1140-3 ◽

2017 ◽

Vol 56 (2) ◽

pp. 395-442 ◽

Cited By ~ 11

Author(s):

V. Bolón-Canedo ◽

D. Rego-Fernández ◽

D. Peteiro-Barral ◽

A. Alonso-Betanzos ◽

B. Guijarro-Berdiñas ◽

...

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Methods

Download Full-text

Robust Feature Selection on Incomplete Data

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/443 ◽

2018 ◽

Cited By ~ 1

Author(s):

Wei Zheng ◽

Xiaofeng Zhu ◽

Yonghua Zhu ◽

Shichao Zhang

Keyword(s):

Feature Selection ◽

Incomplete Data ◽

High Dimensional ◽

Data Sets ◽

Selection Methods ◽

Limited Ability ◽

Training Samples ◽

Indicator Matrix ◽

Selection Framework ◽

Incomplete Datasets

Feature selection is an indispensable preprocessing procedure for high-dimensional data analysis,but previous feature selection methods usually ignore sample diversity (i.e., every sample has individual contribution for the model construction) andhave limited ability to deal with incomplete datasets where a part of training samples have unobserved data. To address these issues, in this paper, we firstly propose a robust feature selectionframework to relieve the influence of outliers, andthen introduce an indicator matrix to avoid unobserved data to take participation in numerical computation of feature selection so that both our proposed feature selection framework and exiting feature selection frameworks are available to conductfeature selection on incomplete data sets. We further propose a new optimization algorithm to optimize the resulting objective function as well asprove our algorithm to converge fast. Experimental results on both real and artificial incompletedata sets demonstrated that our proposed methodoutperformed the feature selection methods undercomparison in terms of clustering performance.

Download Full-text