RHDSI: A Novel Dimensionality Reduction Based Algorithm on High Dimensional Feature Selection with Interactions

Author(s):  
Rahi Jain ◽  
Wei Xu
2021 ◽  
Vol 297 ◽  
pp. 01070
Author(s):  
Ez-Zarrad Ghizlane ◽  
Sabbar Wafae ◽  
Bekkhoucha Abdelkrim

Clustering of variables is the task of grouping similar variables into different groups. It can be useful in several situations, such as dimensionality reduction, feature selection, and detecting redundancies. In the present study, we combine two variable-clustering methods: the clustering of variables around latent variables (CLV) algorithm and the k-means based co-clustering algorithm (kCC). Indeed, classical CLV cannot be applied to high-dimensional data because the approach becomes tedious as the number of features increases.
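The core CLV idea of grouping variables around latent components can be illustrated with a rough sketch; this is not the authors' CLV or kCC implementation, just plain k-means applied to the columns of a standardized data matrix, with each cluster's mean profile standing in for its latent variable. The toy data and all parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: 100 observations x 12 variables, built as three correlated blocks.
base = rng.normal(size=(100, 3))
X = np.hstack([base[:, [j]] + 0.3 * rng.normal(size=(100, 4)) for j in range(3)])

# Cluster the variables (columns) by their standardized observation profiles.
Z = StandardScaler().fit_transform(X)            # observations x variables
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z.T)

# A simple surrogate for each cluster's latent variable: the mean profile
# of its member variables (CLV instead uses the first principal component).
latents = np.column_stack([Z[:, labels == k].mean(axis=1) for k in range(3)])
print(labels)
```

With the strong within-block correlation in this toy example, the three blocks of four variables each land in separate clusters.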


2018 ◽  
Vol 10 (10) ◽  
pp. 1564 ◽  
Author(s):  
Patrick Bradley ◽  
Sina Keller ◽  
Martin Weinmann

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks where only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which only rely on the given feature vectors and not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data, where the topology is described with an ultrametricity index. We consider the Murtagh Ultrametricity Index (MUI), which is defined on the basis of triangles within the given data, and the Topological Ultrametricity Index (TUI), which is defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The achieved classification results reveal that involving supervised feature selection techniques leads to similar classification results as involving unsupervised feature selection techniques, while the latter perform feature selection independently from the given classification task and thus deliver generally versatile features.
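The triangle-based flavor of ultrametricity can be sketched as follows. An ultrametric space satisfies the strong triangle inequality, so every triangle is isosceles with its two largest sides (nearly) equal; a sampled fraction of such triangles gives a rough stand-in for the MUI. The tolerance, sample size, and data below are illustrative assumptions, not the index definition from the paper.

```python
import numpy as np

def triangle_ultrametricity(X, n_triples=2000, tol=0.05, seed=0):
    """Fraction of sampled triangles that are approximately ultrametric,
    i.e. isosceles with the two largest side lengths nearly equal.
    A rough stand-in for the Murtagh Ultrametricity Index (MUI)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    hits = 0
    for _ in range(n_triples):
        i, j, k = rng.choice(n, size=3, replace=False)
        d = sorted([np.linalg.norm(X[i] - X[j]),
                    np.linalg.norm(X[j] - X[k]),
                    np.linalg.norm(X[i] - X[k])])
        # Strong triangle inequality: the two largest sides should coincide.
        hits += (d[2] - d[1]) <= tol * d[2]
    return hits / n_triples

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))          # toy "feature vectors"
print(round(triangle_ultrametricity(X), 3))
```

High-dimensional Gaussian clouds tend to look fairly ultrametric under this test, which is one reason such indices are informative for high-dimensional hyperspectral data.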


2021 ◽  
Author(s):  
Adel Mehrpooya ◽  
Farid Saberi-Movahed ◽  
Najmeh Azizizadeh ◽  
Mohammad Rezaei-Ravari ◽  
Mahdi Eftekhari ◽  
...  

The extraction of predictive features from complex high-dimensional multi-omic data is necessary for decoding therapeutic responses in systems pharmacology. Developing computational methods to reduce the high-dimensional feature space of in vitro, in vivo and clinical data is essential to discover the evolution and mechanisms of drug responses and drug resistance. In this paper, we utilize Matrix Factorization (MF) as a modality for high dimensionality reduction in systems pharmacology. In this respect, we propose three novel feature selection methods using the mathematical conception of a basis. We apply these techniques, as well as three other matrix factorization methods, to eight different gene expression datasets to investigate and compare their performance for feature selection. Our results show that these methods are capable of reducing the feature spaces and finding predictive features in terms of phenotype determination. The three proposed techniques outperform the other methods used and can extract a 2-gene signature predictive of a Tyrosine Kinase Inhibitor (TKI) treatment response in the Cancer Cell Line Encyclopedia (CCLE).
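The general pattern of matrix-factorization-based feature selection can be sketched with NMF: factor the expression matrix, then rank genes by their loadings in the basis matrix and keep the top-scoring ones. This is a generic loading-based heuristic on toy data, not the authors' basis-conception methods; the component count and scoring rule are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy "expression" matrix: 60 samples x 50 genes, non-negative values.
X = rng.random((60, 50))
X[:, :5] += 3 * rng.random((60, 1))   # first 5 genes carry a shared signal

# Factor X ~ W @ H with W (samples x components), H (components x genes).
model = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)
H = model.components_

# Score each gene by its largest loading across the basis components
# and keep the top-k genes as the selected feature subset.
scores = H.max(axis=0)
top_k = np.argsort(scores)[::-1][:5]
print(sorted(top_k.tolist()))
```

Selecting by loadings keeps original genes (interpretable features) rather than the transformed components themselves, which is what makes MF usable for feature *selection* and not only extraction.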


Dimensionality reduction is one of the pre-processing phases required when large amounts of data are available. Feature selection and feature extraction are two methods used to reduce dimensionality. Until now, these methods have been used separately, so the resulting features contain either original or transformed data. An efficient algorithm for Feature Selection and Extraction using a Feature Subset Technique in High Dimensional Data (FSEFST) has been proposed in order to select and extract efficient features using the feature subset method, so that the result contains both original and transformed data. The results show that the suggested method performs better than the existing algorithm.
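The idea of a reduced space that mixes original (selected) and transformed (extracted) features can be sketched with standard components; this is an illustrative pipeline under assumed thresholds, not the FSEFST algorithm itself.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
X[:, 10:] *= 0.05                       # make 20 columns nearly constant

# Step 1 (selection): keep original features above a variance threshold.
selector = VarianceThreshold(threshold=0.5)
X_sel = selector.fit_transform(X)

# Step 2 (extraction): project the selected features onto a few components.
X_ext = PCA(n_components=3, random_state=0).fit_transform(X_sel)

# Final space contains both original (selected) and transformed features.
X_reduced = np.hstack([X_sel, X_ext])
print(X_sel.shape, X_ext.shape, X_reduced.shape)
```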


Author(s):  
Iwan Syarif

Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, the classification problem becomes very complicated when the number of possible combinations of variables is high. In this research, we evaluate the performance of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA only reached 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original versions on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
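Evolutionary feature selection of the kind compared here treats each candidate feature subset as a bit string and evolves it against a classifier-based fitness. Below is a minimal GA sketch on synthetic data; the population size, mutation rate, parsimony penalty, and classifier are all assumed for illustration and do not reproduce the paper's experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of the masked feature subset,
    minus a small penalty favoring fewer features."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=500),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum() / len(mask)

# Minimal GA: tournament selection, uniform crossover, bit-flip mutation.
pop = (rng.random((20, 30)) < 0.5).astype(int)
for gen in range(10):
    fits = np.array([fitness(ind) for ind in pop])
    parents = pop[[max(rng.choice(20, 2), key=lambda i: fits[i])
                   for _ in range(20)]]
    cross = rng.random((20, 30)) < 0.5
    children = np.where(cross, parents, parents[::-1])
    flip = rng.random((20, 30)) < 0.02
    pop = np.abs(children - flip.astype(int))   # bit-flip mutation

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", int(best.sum()), "of 30")
```

A binary PSO replaces crossover/mutation with velocity updates over the same bit strings, which is why the two methods are directly comparable on reduction ratio and downstream accuracy.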


Author(s):  
Miguel García-Torres ◽  
Francisco Gómez-Vela ◽  
Federico Divina ◽  
Diego P. Pinto-Roa ◽  
José Luis Vázquez Noguera ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Abstract

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.
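The baseline that j-SNE improves on can be sketched directly: standardize each modality, weight it, concatenate, and embed with ordinary t-SNE. The key difference is that j-SNE learns the per-modality weights automatically, whereas this sketch fixes them by hand (equal weight per feature block); the toy data and weighting scheme are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 120
groups = np.repeat([0, 1, 2], n // 3)
# Two toy modalities (e.g. RNA counts and surface proteins) sharing cell groups.
rna = rng.normal(size=(n, 50)) + groups[:, None]
adt = rng.normal(size=(n, 10)) + 0.5 * groups[:, None]

# Fixed, equal per-modality weights (j-SNE learns these contributions).
blocks = [StandardScaler().fit_transform(m) / np.sqrt(m.shape[1])
          for m in (rna, adt)]
joint = np.hstack(blocks)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(joint)
print(emb.shape)
```

Dividing each block by the square root of its feature count keeps a wide modality (RNA) from drowning out a narrow one (protein) in the joint distance computation.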

