Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets

Unsupervised Nonlinear Feature Selection from High-Dimensional Signed Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5839 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4182-4189

Author(s):

Qiang Huang ◽

Tingyu Xia ◽

Huiyan Sun ◽

Makoto Yamada ◽

Yi Chang

Keyword(s):

Feature Selection ◽

Rapid Development ◽

Feature Selection Method ◽

High Dimensional ◽

Selection Methods ◽

Feature Selection Problem ◽

Unsupervised Feature Selection ◽

Signed Networks ◽

Signed Network ◽

Nonlinear Feature

With the rapid development of social media services in recent years, relational data are growing explosively. The signed network, which consists of a mixture of positive and negative links, is an effective way to represent the friendly and hostile relations among nodes, which can represent users or items. Because the features associated with a node of a signed network are usually incomplete, noisy, unlabeled, and high-dimensional, feature selection is an important procedure to eliminate irrelevant features. However, existing network-based feature selection methods are linear methods, which means they can only select features that have a linear dependency on the output values. Moreover, in many social datasets, most nodes are unlabeled; therefore, selecting features in an unsupervised manner is generally preferred. To this end, in this paper, we propose a nonlinear unsupervised feature selection method for signed networks, called SignedLasso. This method can select a small number of important features with nonlinear associations between inputs and output from high-dimensional data. More specifically, we formulate unsupervised feature selection as a nonlinear feature selection problem with the Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso), which can find a small number of features in a nonlinear manner. Then, we propose the use of a deep learning-based node embedding to represent node similarity without label information and incorporate the node embedding into the HSIC Lasso. Through experiments on two real-world datasets, we show that the proposed algorithm is superior to existing linear unsupervised feature selection methods.
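
The HSIC Lasso step named in this abstract can be illustrated with a minimal sketch. It is not the authors' SignedLasso implementation: the Gaussian kernels, the coordinate-descent solver, the function names, and the toy data are all assumptions made for illustration, and the output similarity matrix `L` here is built from a synthetic target rather than from a node embedding of a signed network.

```python
import numpy as np

def gaussian_gram(v, sigma=1.0):
    """Gaussian-kernel Gram matrix for samples v of shape (n,) or (n, p)."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq_dists = ((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H

def hsic_lasso(X, L, lam, n_sweeps=100):
    """Minimize 0.5 * ||L_c - sum_k alpha_k K_c^(k)||_F^2 + lam * ||alpha||_1
    subject to alpha >= 0, using cyclic coordinate descent.
    Assumes no feature is constant (otherwise its centered Gram matrix is zero)."""
    n, d = X.shape
    target = center(L).ravel()
    # One column per feature: the flattened, centered Gram matrix of that feature.
    A = np.column_stack(
        [center(gaussian_gram(X[:, k])).ravel() for k in range(d)]
    )
    col_sq = (A ** 2).sum(axis=0)
    alpha = np.zeros(d)
    residual = target.copy()
    for _ in range(n_sweeps):
        for k in range(d):
            residual += A[:, k] * alpha[k]   # take feature k out of the fit
            rho = A[:, k] @ residual         # its alignment with what is left
            alpha[k] = max(0.0, rho - lam) / col_sq[k]
            residual -= A[:, k] * alpha[k]   # put the updated term back
    return alpha

# Toy data (hypothetical): the target depends nonlinearly on feature 0 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 5))
y = X[:, 0] ** 2
L = gaussian_gram((y - y.mean()) / y.std())
alpha = hsic_lasso(X, L, lam=20.0)
print(np.round(alpha, 3))
```

Features with a nonzero coefficient are the selected ones. In the unsupervised setting the abstract describes, `L` would instead come from a similarity between learned node embeddings, which this sketch does not attempt.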


Comparison of unsupervised feature selection methods for high-dimensional regression problems in prediction of peptide binding affinity

2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) ◽

10.1109/embc.2015.7320291 ◽

2015 ◽

Cited By ~ 5

Author(s):

Ferdi Sarac ◽

Volkan Uslan ◽

Huseyin Seker ◽

Ahmed Bouridane

Keyword(s):

Feature Selection ◽

Binding Affinity ◽

Peptide Binding ◽

High Dimensional ◽

Selection Methods ◽

Unsupervised Feature Selection ◽

High Dimensional Regression ◽

Regression Problems ◽

Peptide Binding Affinity


Feature Selection with Neighborhood Entropy-Based Cooperative Game Theory

Computational Intelligence and Neuroscience ◽

10.1155/2014/479289 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Kai Zeng ◽

Kun She ◽

Xinzheng Niu

Keyword(s):

Machine Learning ◽

Game Theory ◽

Feature Selection ◽

Cooperative Game ◽

Cooperative Game Theory ◽

Theory Model ◽

High Dimensional ◽

Selection Methods ◽

Evaluative Criteria ◽

High Dimensional Datasets

Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative games. The evaluative criteria of features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.
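
To illustrate why a cooperative-game view can credit features that are weak individually but strong as a group, here is a minimal sketch using exact Shapley values. This is a swapped-in, standard solution concept, not the NECGT contribution measure: the abstract does not say which game-theoretic index the paper uses, and the neighborhood-entropy value function is not reproduced. The function names and the toy value function are hypothetical.

```python
from itertools import permutations

def shapley_values(n_features, value):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features can join the coalition."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = set()
        for f in order:
            before = value(frozenset(coalition))
            coalition.add(f)
            phi[f] += value(frozenset(coalition)) - before
    return [p / len(orderings) for p in phi]

def toy_value(subset):
    """Hypothetical coalition value: features 0 and 1 are only useful
    together (an XOR-like pair); feature 2 is irrelevant."""
    return 1.0 if {0, 1} <= subset else 0.0

phi = shapley_values(3, toy_value)
print(phi)  # [0.5, 0.5, 0.0]
```

A ranking by individual value would score all three features as 0, while the coalition-based score separates the jointly useful pair from the irrelevant feature. Enumerating all orderings is only feasible for a handful of features.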


A Comparative Study on Unsupervised Feature Selection Methods for Text Clustering

2005 International Conference on Natural Language Processing and Knowledge Engineering ◽

10.1109/nlpke.2005.1598807 ◽

2006 ◽

Cited By ~ 5

Author(s):

Luying Liu ◽

Jianchu Kang ◽

Jing Yu ◽

Zhongliang Wang

Keyword(s):

Feature Selection ◽

Comparative Study ◽

Text Clustering ◽

Selection Methods ◽

Unsupervised Feature Selection


Quick and robust feature selection: the strength of energy-efficient sparse training for autoencoders

Machine Learning ◽

10.1007/s10994-021-06063-x ◽

2021 ◽

Author(s):

Zahra Atashgahi ◽

Ghada Sokar ◽

Tim van der Lee ◽

Elena Mocanu ◽

Decebal Constantin Mocanu ◽

...

Keyword(s):

Feature Selection ◽

High Energy ◽

High Dimensional ◽

Training Procedure ◽

Selection Methods ◽

Speed Increase ◽

Benchmark Datasets ◽

Memory Reduction ◽

Low Dimensional ◽

High Dimensional Datasets

Major complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to this problem. Most of the existing feature selection methods are computationally inefficient; inefficient algorithms lead to high energy consumption, which is not desirable for devices with limited computational and energy resources. In this paper, a novel and flexible method for unsupervised feature selection is proposed. This method, named QuickSelection (The code is available at: https://github.com/zahraatashgahi/QuickSelection), introduces the strength of the neuron in sparse neural networks as a criterion to measure the feature importance. This criterion, blended with sparsely connected denoising autoencoders trained with the sparse evolutionary training procedure, derives the importance of all input features simultaneously. We implement QuickSelection in a purely sparse manner as opposed to the typical approach of using a binary mask over connections to simulate sparsity. It results in a considerable speed increase and memory reduction. When tested on several benchmark datasets, including five low-dimensional and three high-dimensional datasets, the proposed method is able to achieve the best trade-off of classification and clustering accuracy, running time, and maximum memory usage, among widely used approaches for feature selection. Besides, our proposed method requires the least amount of energy among the state-of-the-art autoencoder-based feature selection methods.
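
The neuron-strength criterion mentioned in this abstract can be sketched as follows. The sketch assumes that the strength of an input neuron is the sum of the absolute weights of its outgoing connections; the abstract itself does not spell out the definition. The weight matrix below is a hand-written placeholder, not the result of training a sparse denoising autoencoder, and the function names are hypothetical rather than taken from the QuickSelection repository.

```python
import numpy as np

def neuron_strength(W):
    """Strength of each input neuron, assumed here to be the sum of absolute
    weights on its outgoing connections. W has shape (n_inputs, n_hidden);
    a zero entry stands for an absent connection."""
    return np.abs(W).sum(axis=1)

def select_top_k(W, k):
    """Indices of the k input features with the largest strength."""
    strength = neuron_strength(W)
    return np.argsort(-strength, kind="stable")[:k]

# Placeholder sparse input-to-hidden weights for 3 features and 3 hidden units.
W = np.array([
    [0.5, 0.0, -0.2],
    [0.0, 0.0,  0.1],
    [-1.0, 0.3, 0.0],
])
print(neuron_strength(W))
print(select_top_k(W, 2))
```

A dense array with zeros is used only to keep the example short; the abstract stresses that the actual method stores connections in a truly sparse form instead of masking a dense matrix.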


Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance

Journal of Water Process Engineering ◽

10.1016/j.jwpe.2021.102033 ◽

2021 ◽

Vol 41 ◽

pp. 102033

Author(s):

Faramarz Bagherzadeh ◽

Mohamad-Javad Mehrani ◽

Milad Basirifard ◽

Javad Roostaei

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Wastewater Treatment ◽

Comparative Study ◽

Total Nitrogen ◽

Wastewater Treatment Plant ◽

Learning Algorithms ◽

Treatment Plant ◽

Machine Learning Algorithms ◽

Selection Methods


Opportunities and Challenges of Feature Selection Methods for High Dimensional Data: A Review

Ingénierie des systèmes d'information ◽

10.18280/isi.260107 ◽

2021 ◽

Vol 26 (1) ◽

pp. 67-77

Author(s):

Siva Sankari Subbiah ◽

Jayakumar Chinnappan

Keyword(s):

Feature Selection ◽

Big Data ◽

Large Scale ◽

High Dimensional Data ◽

Research Work ◽

Basic Feature ◽

High Dimensional ◽

Selection Methods ◽

Fast Development ◽

Improved Accuracy

Nowadays, organizations collect huge volumes of data without knowing their usefulness. The fast development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimension of datasets increases day by day at an extraordinary rate, resulting in large-scale datasets with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big data world, feature selection is significant in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed by researchers for obtaining more relevant features, especially from big datasets, which helps to provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, and the challenges of feature selection, and provides a summary of the related research work done by various researchers. As a result, big data analysis with feature selection improves the accuracy of learning.
