A Review of Feature Selection Techniques for Clustering High Dimensional Structured Data

2016 ◽  
Vol 6 (Special Issue) ◽  
pp. 176-179
Author(s):  
Bhagyashri A. Kelkar ◽  
Dr. S. F. Rodd


Author(s):  
Vladimir Nikulin ◽  
Tian-Hsiang Huang ◽  
Geoffrey J. McLachlan

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is the key first step in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, a support vector machine, or a neural network. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers and, based on the LOO evaluations, decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
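The paper does not spell out its exact selection criterion, but a rank-based separation score in the spirit of the Wilcoxon rank-sum test can be sketched as follows. All function names and the toy data here are illustrative, not taken from the challenge submission: each feature is scored by how far the rank-sum of one class deviates from its expectation under the null hypothesis of no class separation, and the top-k features are kept.

```python
from collections import defaultdict

def wilcoxon_rank_sum(a, b):
    """Rank-sum of sample `a` within the pooled sample (ties get average ranks)."""
    pooled = sorted(a + b)
    positions = defaultdict(list)
    for i, v in enumerate(pooled, 1):          # 1-based ranks
        positions[v].append(i)
    rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return sum(rank[v] for v in a)

def separation_score(a, b):
    """Deviation of the rank-sum from its null expectation; larger
    values mean the two classes are better separated on this feature."""
    n, m = len(a), len(b)
    expected = n * (n + m + 1) / 2.0
    return abs(wilcoxon_rank_sum(a, b) - expected)

def select_top_k(X, y, k):
    """Indices of the k features with the largest separation score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((separation_score(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

Because the score depends only on ranks, it is robust to outliers and monotone transformations of the expression values, which is one reason rank-based criteria are popular for microarray-style data.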


Author(s):  
Jason Van Hulse ◽  
Taghi M. Khoshgoftaar ◽  
Amri Napolitano ◽  
Randall Wald

2018 ◽  
Vol 10 (10) ◽  
pp. 1564 ◽  
Author(s):  
Patrick Bradley ◽  
Sina Keller ◽  
Martin Weinmann

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks where only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which rely only on the given feature vectors, not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data, where the topology is described with an ultrametricity index. For the latter, we consider the Murtagh Ultrametricity Index (MUI), which is defined on the basis of triangles within the given data, and the Topological Ultrametricity Index (TUI), which is defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The achieved classification results reveal that supervised feature selection techniques lead to classification results similar to those of unsupervised feature selection techniques, while the latter perform feature selection independently of the given classification task and thus deliver generally versatile features.
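The triangle-based idea behind the MUI can be illustrated with a small sketch. This is a simplified approximation, not the paper's exact definition: a triple of points is counted as ultrametric when its two largest pairwise distances are (approximately) equal, and the index is the fraction of such triples. Function names, the tolerance, and the exhaustive enumeration over all triples are assumptions for illustration; in practice triples would be sampled.

```python
import itertools
import math

def is_ultrametric_triangle(d1, d2, d3, tol=1e-9):
    """A triangle is ultrametric when its two largest sides are equal,
    so every side obeys d(x, z) <= max(d(x, y), d(y, z))."""
    a, b, c = sorted((d1, d2, d3))
    return abs(c - b) <= tol

def ultrametricity_index(points):
    """Fraction of point triples whose pairwise Euclidean distances
    form an ultrametric (isosceles-with-small-base) triangle."""
    triples = list(itertools.combinations(points, 3))
    hits = sum(
        is_ultrametric_triangle(math.dist(x, y), math.dist(y, z), math.dist(x, z))
        for x, y, z in triples
    )
    return hits / len(triples)
```

An index near 1 indicates hierarchical, tree-like structure in the data, which is the property the feature selection techniques above exploit.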


2013 ◽  
Vol 34 (12) ◽  
pp. 1446-1453 ◽  
Author(s):  
Laura Maria Cannas ◽  
Nicoletta Dessì ◽  
Barbara Pes

Information ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 286
Author(s):  
Barbara Pes

Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in the fields of bioinformatics, text mining, and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has thus far been conducted on which approaches are best suited to datasets that are class-imbalanced and high-dimensional at the same time (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, to reduce the data dimensionality, with methods that cope with the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods are considered). Extensive experiments have been carried out across datasets from different domains, leveraging a well-known classifier, the Random Forest, which has proven effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results give evidence of the benefits of such a hybrid approach compared to using feature selection or imbalance learning methods alone.
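A minimal sketch of such a hybrid pipeline, assuming scikit-learn and not reproducing the paper's actual experimental setup: univariate feature selection cuts the dimensionality, and a cost-sensitive Random Forest (`class_weight="balanced"` is one of the imbalance-handling options the abstract mentions) counters the skewed class distribution. The synthetic dataset, sizes, and `k` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# High-dimensional, imbalanced toy data: 200 features, 90/10 class split.
X, y = make_classification(n_samples=400, n_features=200, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(
    SelectKBest(f_classif, k=20),                              # step 1: reduce dimensionality
    RandomForestClassifier(class_weight="balanced",            # step 2: cost-sensitive learning
                           random_state=0),
)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```

Fitting the selector inside the pipeline ensures it is applied only to training folds, avoiding selection bias when the pipeline is cross-validated.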


Author(s):  
Bhanu Chander

High-dimensional data analysis is one of the major challenges for researchers and engineers in the domains of deep learning (DL), machine learning (ML), and data mining. Feature selection (FS) provides an efficient way to address these difficulties by eliminating irrelevant and redundant data, which can reduce computation time, improve learning accuracy, and facilitate a better understanding of the learning model or data. To eliminate an irrelevant feature, an FS criterion is required that can measure the relevance of each feature with respect to the output class/labels. Filter schemes use a variable ranking procedure as the principal criterion for variable selection by ordering. Ranking schemes are widely used because of their simplicity and good performance in practical applications. The goal of this chapter is to provide comprehensive information on FS approaches, their applications, and future research directions.
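A filter-style ranking scheme of the kind described above can be sketched in a few lines. This is a generic illustration, not the chapter's specific method: each feature is scored by the absolute Pearson correlation between its values and the class labels, and features are returned in descending score order.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_rank(X, y):
    """Rank feature indices by |correlation with the label|, best first."""
    scores = [(abs(pearson([row[j] for row in X], y)), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)]
```

Because each feature is scored independently of any learning algorithm, such filters are fast and classifier-agnostic, which is exactly the simplicity the chapter highlights; the trade-off is that they ignore feature interactions.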


Algorithms ◽  
2022 ◽  
Vol 15 (1) ◽  
pp. 21
Author(s):  
Consolata Gakii ◽  
Paul O. Mireji ◽  
Richard Rimiru

Analysis of high-dimensional data, with more features than observations, places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination (RFE) to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of the sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers on both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNAseq data.
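Of the three selectors compared above, recursive feature elimination is the most mechanical to illustrate: fit a model, drop the lowest-weighted features, and repeat until the target count remains. A minimal sketch assuming scikit-learn, with a synthetic matrix standing in for the RNAseq expression data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Toy stand-in for an expression matrix: 100 samples, 50 features,
# only 5 of which are informative for the class label.
X, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                           random_state=1)

# RFE repeatedly fits the linear model and eliminates the features
# with the smallest absolute coefficients.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
selected = [j for j, keep in enumerate(rfe.support_) if keep]
print(selected)
```

Unlike a filter, RFE is a wrapper: the retained subset depends on the chosen estimator, which is why the paper compares it against the estimator-independent graph-based and PCA approaches.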

