Unsupervised Feature Selection Based on Ultrametricity and Sparse Training Data: A Case Study for the Classification of High-Dimensional Hyperspectral Data

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks in which only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which rely only on the given feature vectors and not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data, where the topology is described with an ultrametricity index. For the latter, we take into account the Murtagh Ultrametricity Index (MUI), which is defined on the basis of triangles within the given data, and the Topological Ultrametricity Index (TUI), which is defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The achieved classification results reveal that supervised and unsupervised feature selection techniques lead to similar classification results, while the unsupervised techniques perform feature selection independently of the given classification task and thus deliver generally versatile features.
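To make the idea of a triangle-based ultrametricity index more concrete, here is a minimal Python sketch. It is not the MUI definition from the paper. It relies only on the general property that, in an ultrametric space, the two longest sides of every triangle are equal, and it reports the fraction of point triplets that satisfy this up to a tolerance. The function name `triangle_ultrametricity` and the parameter `rel_tol` are assumptions made for this illustration.

```python
from itertools import combinations
from math import dist


def triangle_ultrametricity(points, rel_tol=0.05):
    """Fraction of point triplets whose two longest sides are nearly equal.

    Simplified stand-in for a triangle-based ultrametricity index;
    rel_tol is a hypothetical tolerance, not a value from the paper.
    """
    total = 0
    ultrametric = 0
    for a, b, c in combinations(points, 3):
        # Side lengths of the triangle, sorted ascending.
        sides = sorted((dist(a, b), dist(b, c), dist(a, c)))
        second, longest = sides[1], sides[2]
        if longest == 0.0:
            continue  # three coincident points: skip the degenerate triangle
        total += 1
        # Ultrametric triangles are isosceles with a short base, so the
        # two longest sides must (approximately) coincide.
        if longest - second <= rel_tol * longest:
            ultrametric += 1
    return ultrametric / total if total else 0.0


if __name__ == "__main__":
    tall = [(0.0, 0.0), (1.0, 0.0), (0.5, 5.0)]   # isosceles, short base
    line = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]   # collinear, unequal sides
    print(triangle_ultrametricity(tall))
    print(triangle_ultrametricity(line))
```

An index of this kind could, for instance, be evaluated on the training data with individual features removed in order to compare feature subsets. That usage is an assumption here and is not taken from the abstract.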

CLASSIFICATION OF HIGH-DIMENSIONAL MICROARRAY DATA WITH A TWO-STEP PROCEDURE VIA A WILCOXON CRITERION AND MULTILAYER PERCEPTRON

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026811002969 ◽

2011 ◽

Vol 10 (01) ◽

pp. 1-14

Author(s):

VLADIMIR NIKULIN ◽

TIAN-HSIANG HUANG ◽

GEOFFREY J. MCLACHLAN

Keyword(s):

Data Mining ◽

Feature Selection ◽

High Dimensional ◽

Second Step ◽

Support Vector ◽

Step Procedure ◽

Leave One Out ◽

Natural Combination ◽

Feature Selection Techniques

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is a key element (first step) in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier such as linear regression, support vector machine or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation type Wilcoxon-based criterion for all final submissions. The method presented in this paper was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
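The first step of such a two-step procedure can be sketched as a rank-based feature filter. The Python example below scores each feature with a standardized Wilcoxon rank-sum statistic and keeps the top `k`. It is an illustration under stated assumptions, not the authors' exact "separation type" criterion: binary labels coded as 0 and 1, no tie correction in the variance, and the helper names `midranks`, `wilcoxon_score`, and `select_features` are all hypothetical.

```python
from math import sqrt


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_score(feature, labels):
    """Absolute standardized rank sum of class 1 (no tie correction)."""
    n1 = sum(1 for y in labels if y == 1)
    n2 = len(labels) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both classes must be present")
    ranks = midranks(feature)
    w = sum(r for r, y in zip(ranks, labels) if y == 1)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(w - mean) / sd


def select_features(rows, labels, k):
    """Indices of the k features with the largest Wilcoxon score."""
    n_features = len(rows[0])
    scores = [
        wilcoxon_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    rows = [[1, 5], [2, 3], [3, 6], [4, 2]]  # 4 samples, 2 features
    labels = [0, 0, 1, 1]
    print(select_features(rows, labels, k=1))
```

The selected columns would then be passed to whichever classifier is used in the second step.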

Unsupervised Feature Selection Using Recursive k-Means Silhouette Elimination (RkSE): A Two-Scenario Case Study for Fault Classification of High-Dimensional Sensor Data

10.20944/preprints202008.0254.v1 ◽

2020 ◽

Author(s):

Ahlam Mallak ◽

Madjid Fathi

Keyword(s):

Feature Selection ◽

Multivariate Time Series ◽

Sliding Window ◽

Classification Problem ◽

Sensor Data ◽

Fault Classification ◽

Hydraulic Test ◽

Unsupervised Feature Selection

Feature selection is a crucial step in overcoming the curse of dimensionality in data mining. This work proposes Recursive k-means Silhouette Elimination (RkSE), a new unsupervised feature selection algorithm for reducing dimensionality in univariate and multivariate time-series datasets. k-means clustering is applied recursively to select cluster-representative features, using the silhouette measure of each cluster together with a user-defined threshold as the feature selection or elimination criterion. The proposed method is evaluated on multi-sensor readings from a hydraulic test rig in two scenarios: (1) reducing the dimensionality in a multivariate classification problem using various classifiers of different functionalities, and (2) classifying univariate data in a sliding-window scenario, where RkSE serves as a window compression method that reduces the window dimensionality by selecting the best time points in a sliding window. The results are validated using 10-fold cross-validation and compared with the results obtained when classification is performed directly, with no feature selection applied. Additionally, a new taxonomy for k-means-based feature selection methods is proposed. The experimental results and observations from the two experiments demonstrate the capabilities and accuracy of the proposed method.
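The abstract does not spell out the selection rule, so the following Python sketch shows only one plausible reading of a single silhouette-threshold pass. It assumes the features have already been clustered (for example by k-means, which is omitted here), treats each feature as a vector over samples, keeps one representative from every cluster whose mean silhouette reaches the threshold, and returns the remaining features for a further clustering round. The names `silhouette_values` and `split_by_cluster_quality`, and the rule of choosing the member with the highest silhouette as representative, are assumptions rather than the published RkSE procedure.

```python
from math import dist


def group_by_label(labels):
    """Map each cluster label to the list of item indices it contains."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return clusters


def silhouette_values(vectors, labels):
    """Silhouette value per item; singleton clusters get 0.0."""
    clusters = group_by_label(labels)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            values.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def split_by_cluster_quality(vectors, labels, threshold):
    """Return (representatives, retry) index lists for one pass.

    Clusters with mean silhouette >= threshold contribute their
    highest-silhouette member; all other members go to `retry`.
    """
    sil = silhouette_values(vectors, labels)
    representatives, retry = [], []
    for members in group_by_label(labels).values():
        mean_sil = sum(sil[i] for i in members) / len(members)
        if mean_sil >= threshold:
            representatives.append(max(members, key=lambda i: sil[i]))
        else:
            retry.extend(members)
    return representatives, retry


if __name__ == "__main__":
    # Four features, each observed on two samples; labels are assumed given.
    features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(split_by_cluster_quality(features, [0, 0, 1, 1], threshold=0.5))
```

In a recursive scheme, the `retry` features would be re-clustered and passed through the same step until no features remain or a stopping rule is met.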

RHDSI: A Novel Dimensionality Reduction Based Algorithm on High Dimensional Feature Selection with Interactions

Information Sciences ◽

10.1016/j.ins.2021.06.096 ◽

2021 ◽

Author(s):

Rahi Jain ◽

Wei Xu

Keyword(s):

Feature Selection ◽

Dimensionality Reduction ◽

High Dimensional

Using class-based feature selection for the classification of hyperspectral data

International Journal of Remote Sensing ◽

10.1080/01431161.2010.486416 ◽

2011 ◽

Vol 32 (15) ◽

pp. 4311-4326 ◽

Cited By ~ 19

Author(s):

Yasser Maghsoudi ◽

Mohammad Javad Valadan Zoej ◽

Michael Collins

Keyword(s):

Feature Selection ◽

Hyperspectral Data ◽

Selection For

Classification of tree species based on longwave hyperspectral data from leaves, a case study for a tropical dry forest

International Journal of Applied Earth Observation and Geoinformation ◽

10.1016/j.jag.2017.11.009 ◽

2018 ◽

Vol 66 ◽

pp. 93-105 ◽

Cited By ~ 23

Author(s):

D. Harrison ◽

B. Rivard ◽

A. Sánchez-Azofeifa

Keyword(s):

Tree Species ◽

Tropical Dry Forest ◽

Hyperspectral Data ◽

Dry Forest

An efficient approach for dimensionality reduction and classification of high dimensional text documents

Proceedings of the First International Conference on Data Science, E-learning and Information Systems - DATA '18 ◽

10.1145/3279996.3281364 ◽

2018 ◽

Cited By ~ 2

Author(s):

Kotte Vinay Kumar ◽

R. Srinivasan ◽

E. B. Singh

Keyword(s):

Dimensionality Reduction ◽

High Dimensional ◽

Text Documents ◽

Efficient Approach

Feature Selection and Classification of High Dimensional Mass Spectrometry Data: A Genetic Programming Approach

Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - Lecture Notes in Computer Science ◽

10.1007/978-3-642-37189-9_5 ◽

2013 ◽

pp. 43-55 ◽

Cited By ~ 15

Author(s):

Soha Ahmed ◽

Mengjie Zhang ◽

Lifeng Peng

Keyword(s):

Mass Spectrometry ◽

Feature Selection ◽

Genetic Programming ◽

Mass Spectrometry Data ◽

High Dimensional ◽

Programming Approach

Zero-Shot Feature Selection via Transferring Supervised Knowledge

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2021040101 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-20

Author(s):

Zheng Wang ◽

Qiao Wang ◽

Tingzhang Zhao ◽

Chaokun Wang ◽

Xiaojun Ye

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Real World ◽

Rapid Growth ◽

Learning Systems ◽

Training Data ◽

Effective Technique ◽

Supervised Methods ◽

Real World Datasets

Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve the performance. However, faced with the rapid growth of newly emerging concepts, existing supervised methods might easily suffer from the scarcity and validity of labeled data for training. In this paper, the authors study the problem of zero-shot feature selection (i.e., building a feature selection model that generalizes well to “unseen” concepts with limited training data of “seen” concepts). Specifically, they adopt class-semantic descriptions (i.e., attributes) as supervision for feature selection, so as to utilize the supervised knowledge transferred from the seen concepts. For more reliable discriminative features, they further propose the center-characteristic loss which encourages the selected features to capture the central characteristics of seen concepts. Extensive experiments conducted on various real-world datasets demonstrate the effectiveness of the method.

Hybrid Ensemble Learning Methods for Classification of Microarray Data

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch038 ◽

2020 ◽

pp. 707-725

Author(s):

Sujata Dash

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Microarray Data ◽

Classification Model ◽

Rotation Forest ◽

Ensemble Technique ◽

Basic Characteristics ◽

Microarray Datasets ◽

Feature Selection Techniques

Efficient classification and feature extraction techniques pave an effective way for diagnosing cancers from microarray datasets. Conventional classification techniques have been observed to have major limitations in discriminating genes accurately. Such problems can largely be addressed by an ensemble technique. In this paper, a hybrid RotBagg ensemble framework is proposed to address these limitations. The technique integrates the Rotation Forest and Bagging ensembles, which preserves the basic characteristics of an ensemble architecture, i.e., diversity and accuracy. Three different feature selection techniques are employed to select subsets of genes in order to improve the effectiveness and generalization of the RotBagg ensemble. The efficiency is validated on five microarray datasets and compared with the results of the base learners. The experimental results show that the correlation-based FRFR combined with the PCA-based RotBagg ensemble forms a highly efficient classification model.

Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04491-6_16 ◽

2018 ◽

pp. 205-218

Author(s):

Saúl Solorio-Fernández ◽

J. Ariel Carrasco-Ochoa ◽

José Fco. Martínez-Trinidad

Keyword(s):

Feature Selection ◽

Comparative Study ◽

High Dimensional ◽

Selection Methods ◽

Unsupervised Feature Selection ◽

High Dimensional Datasets
