Recent Advances in Supervised Dimension Reduction: A Survey

2019 ◽  
Vol 1 (1) ◽  
pp. 341-358 ◽  
Author(s):  
Guoqing Chao ◽  
Yuan Luo ◽  
Weiping Ding

Recently, we have witnessed explosive growth in both the quantity and the dimensionality of the data generated, which aggravates the high-dimensionality challenge in tasks such as predictive modeling and decision support. To date, a large number of unsupervised dimension reduction methods have been proposed and studied, but there is no review focusing specifically on the supervised dimension reduction problem. Most studies perform classification or regression after an unsupervised dimension reduction step. However, learning the low-dimensional representation and the classification/regression model simultaneously offers two advantages: higher accuracy and a more effective representation. Taking classification or regression as the main goal of dimension reduction, this paper summarizes and organizes current developments in the field into three main classes: PCA-based, non-negative matrix factorization (NMF)-based, and manifold-based supervised dimension reduction methods, and provides detailed discussions of their advantages and disadvantages. Moreover, we outline a dozen open problems that can be further explored to advance the development of this topic.
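As a concrete illustration of using labels during dimension reduction, the sketch below implements Fisher's linear discriminant in NumPy, a classical supervised linear method (my choice of example; the survey itself covers many methods). The data and class layout are synthetic.

```python
import numpy as np

def fisher_lda(X, y, n_components=1):
    """Fisher's linear discriminant: supervised linear dimension
    reduction that maximises between-class over within-class scatter,
    so the class labels shape the low-dimensional representation."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                     # within-class scatter
    Sb = np.zeros((d, d))                     # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]      # strongest directions first
    W = evecs[:, order[:n_components]].real
    return X @ W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, (50, 2)),
               rng.normal([2.0, 2.0], 0.3, (50, 2))])
y = np.repeat([0, 1], 50)
Z = fisher_lda(X, y)   # the two classes separate along a single axis
```

Because the labels enter the objective directly, the projection axis is chosen for class separation, unlike a purely unsupervised method such as PCA.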

2002 ◽  
Vol 14 (1) ◽  
pp. 191-215 ◽  
Author(s):  
Nikos Vlassis ◽  
Yoichi Motomura ◽  
Ben Kröse

High-dimensional data generated by a system with limited degrees of freedom are often constrained to low-dimensional manifolds in the original space. In this article, we investigate dimension reduction methods for such intrinsically low-dimensional data through linear projections that preserve the manifold structure of the data. For intrinsically one-dimensional data, this means projecting to a curve in the plane with as few self-intersections as possible. We propose a supervised projection pursuit method that can be regarded as an extension of the single-index model for nonparametric regression. We show results from a toy problem and two robotic applications.
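To illustrate the setting, a minimal NumPy sketch (my construction, not the authors' algorithm): intrinsically one-dimensional data embedded in three dimensions, projected linearly to the plane with plain PCA as an unsupervised baseline. A good projection keeps points that are neighbours along the intrinsic coordinate close to each other in the plane.

```python
import numpy as np

# Intrinsically one-dimensional data, parameterised by t, embedded in
# three dimensions.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, 200))          # intrinsic coordinate
X = np.column_stack([t, np.sin(4.0 * t), 0.01 * rng.normal(size=200)])

# PCA projection as an unsupervised baseline linear map to the plane.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                                # a curve in the plane

# Consecutive points in t should remain close after projection.
step = np.linalg.norm(np.diff(Z, axis=0), axis=1)
```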


BMC Genomics ◽  
2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Mona Rams ◽  
Tim O.F. Conrad

Abstract Background Pseudotime estimation from dynamic single-cell transcriptomic data enables characterisation and understanding of the underlying processes, for example developmental processes. Various pseudotime estimation methods have been proposed in recent years. Typically, these methods start with a dimension reduction step, because the low-dimensional representation is usually easier to analyse. Approaches such as PCA, ICA and t-SNE are among the most widely used dimension reduction methods in pseudotime estimation. However, these methods usually make assumptions about the derived dimensions, which can cause important dataset properties to be missed. In this paper, we propose a new dictionary learning based approach, dynDLT, for dimension reduction and pseudotime estimation of dynamic transcriptomic data. Dictionary learning is a matrix factorisation approach that does not restrict the dependence of the derived dimensions. To evaluate the performance, we conduct a large simulation study and analyse eight real-world datasets. Results The simulation studies reveal that, firstly, dynDLT preserves the simulated patterns in low dimension, and the pseudotimes can be derived from the low-dimensional representation. Secondly, the results show that dynDLT is suitable for detecting genes exhibiting the simulated dynamic patterns, thereby facilitating the interpretation of the compressed representation and thus of the dynamic processes. For the real-world data analysis, we select datasets with samples taken at different time points throughout an experiment. The pseudotimes found by dynDLT correlate highly with the experimental times. We compare the results to other approaches used in pseudotime estimation, or methods closely related to dictionary learning: ICA, NMF, PCA, t-SNE, and UMAP. DynDLT has the best overall performance on the simulated and real-world datasets.
Conclusions We introduce dynDLT, a method that is suitable for pseudotime estimation. Its main advantages are: (1) it is a model-free approach, meaning that it does not restrict the dependence of the derived dimensions; (2) genes that are relevant to the detected dynamic processes can be identified from the dictionary matrix; (3) by restricting the dictionary entries to positive values, the dictionary atoms become highly interpretable.
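A rough sense of pseudotime extraction from a positivity-constrained factorisation can be given with a small sketch. Below, multiplicative-update NMF stands in for dictionary learning with positive entries (my simplification, not the dynDLT algorithm), applied to a synthetic "cells x genes" matrix with one smooth dynamic pattern; a pseudotime is read off the dominant low-dimensional coordinate.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0):
    """Multiplicative-update NMF, X ~= W @ H with W, H >= 0 -- a
    stand-in for dictionary learning with positive atoms."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, (X.shape[0], k))
    H = rng.uniform(0.1, 1.0, (k, X.shape[1]))
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "cells x genes" matrix whose rows scale with time t.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 60)                     # true experimental time
X = np.outer(t, np.ones(30)) + 0.01 * np.abs(rng.normal(size=(60, 30)))
W, H = nmf(X, k=2)

# Read a pseudotime off the atom that contributes most to the data.
contrib = W.sum(axis=0) * H.sum(axis=1)
pseudotime = W[:, np.argmax(contrib)]
```

On this toy matrix the recovered pseudotime correlates strongly with the true time, mirroring the correlation result reported for the real datasets.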


Author(s):  
Yashita Jain ◽  
Shanshan Ding ◽  
Jing Qiu

Abstract Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Together, these data may reveal new biological insights compared with analyzing a single genomic data type. This study proposes a novel use of a supervised dimension reduction method, sliced inverse regression (SIR), in multi-omics data analysis to improve prediction over single-data-type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for simultaneous analysis of multiple omics data types of cancer samples, including miRNA, mRNA and proteomics data, to achieve integrative dimension reduction and further improve prediction performance. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single-data-source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised ones in integrative data analysis in terms of classification and prediction.
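Sliced inverse regression itself is a standard estimator; a minimal NumPy sketch of the basic (non-integrative) version, on synthetic data where the response depends on a single linear direction, might look like this (the slice count and data are my illustrative choices):

```python
import numpy as np

def sir(X, y, n_slices=5, n_components=1):
    """Basic sliced inverse regression (SIR).

    Whitens X, slices the sorted response y, and takes the top
    eigenvectors of the weighted covariance of the slice means of the
    whitened data as the estimated projection directions."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T   # cov^{-1/2}
    Z = Xc @ whiten
    M = np.zeros((d, d))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)            # weighted slice means
    _, vecs = np.linalg.eigh(M)                         # ascending order
    return whiten @ vecs[:, ::-1][:, :n_components]     # top directions

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1]) ** 3 + 0.1 * rng.normal(size=500)
B = sir(X, y)          # should recover the direction (1, 1, 0, 0) up to scale
Z_low = X @ B          # the estimated one-dimensional sufficient predictor
```

The integrative variant proposed in the paper extends this idea across multiple omics data types; the sketch only shows the single-source building block.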


2015 ◽  
Vol 731 ◽  
pp. 120-123
Author(s):  
Song Hua He ◽  
Qiao Chen ◽  
Gang Zhang ◽  
Jiang Duan

Two new metameric-black spectral dimension reduction methods based on color difference optimization are presented, and their dimension reduction effects are compared in terms of colorimetric and spectral accuracy. The first method decomposes the original spectrum into the fundamental spectrum and the metameric black spectrum using R-matrix theory, and then determines the basis vectors that linearly express the fundamental spectrum and the metameric black spectrum, respectively. The second method first applies principal component analysis to the original spectra to obtain the first three eigenvectors as basis vectors of the fundamental spectrum, and then calculates the fundamental spectrum from the tristimulus values and the basis vectors of the original spectra. Experimental results show that the low-dimensional linear model built by the second method improves spectral and colorimetric accuracy and satisfies the requirements of spectral color reproduction.
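The fundamental/metameric-black split underlying both methods can be written down directly. Below is a small NumPy sketch using Cohen's R-matrix projector, with a made-up 3 x n matching matrix standing in for real observer data (the paper's actual basis vectors and measurements are not reproduced here).

```python
import numpy as np

# Fundamental / metameric-black decomposition via Cohen's R-matrix.
# A is a hypothetical 3 x n "color matching" matrix; s is an
# arbitrary reflectance spectrum to decompose.
rng = np.random.default_rng(4)
n = 31                                    # e.g. 400-700 nm in 10 nm steps
A = np.abs(rng.normal(size=(3, n)))       # stand-in matching functions
s = np.abs(rng.normal(size=n))            # a spectrum to decompose

R = A.T @ np.linalg.inv(A @ A.T) @ A      # projector onto the fundamental space
s_fund = R @ s                            # carries all the tristimulus information
s_black = s - s_fund                      # metameric black: tristimulus values are zero
```

By construction the metameric black component is invisible to the observer model (its tristimulus values vanish), which is why dimension reduction can treat the two components separately.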


2019 ◽  
Vol 13 (4) ◽  
pp. 334-347
Author(s):  
Liyan Zhao ◽  
Huan Wang ◽  
Jing Wang

Background: Subspace-learning-based dimensionality reduction algorithms are important and have been widely applied in data mining, pattern recognition and computer vision. They reduce dimensionality successfully when data points are evenly distributed in the high-dimensional space. However, some of them may distort the local geometric structure of the original dataset and produce a poor low-dimensional embedding when the data samples are unevenly distributed in the original space. Methods: In this paper, we propose a supervised dimension reduction method based on local neighborhood optimization, named LNOLLE, to handle unevenly distributed high-dimensional data. It extends the widely used locally linear embedding (LLE) framework. The method uses the class labels of the data to optimize local neighborhoods, which yields better inter-class separability in the low-dimensional space by preventing samples of different classes from being held together when mapping unevenly distributed data. This effectively preserves the geometric and topological structure of the original data points. Results: We apply the presented LNOLLE method to image classification and face recognition, where it achieves good classification results and higher face recognition accuracy than existing manifold learning methods, including popular supervised algorithms. In addition, we use the reconstruction step of the method for noise suppression in seismic images. To the best of our knowledge, this is the first manifold learning approach applied to high-dimensional nonlinear seismic data for noise suppression. Conclusion: Experimental results on a forward model and real seismic data show that LNOLLE improves the signal-to-noise ratio of seismic images compared with the widely used singular value decomposition (SVD) filtering method.
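The core idea of using labels to shape LLE neighborhoods can be sketched as follows; the distance penalty alpha is my illustrative device, not the paper's LNOLLE formulation.

```python
import numpy as np

def supervised_neighbors(X, y, k, alpha=1.0):
    """Label-aware k-nearest-neighbor selection.

    Core idea behind supervised LLE variants: add a penalty to
    distances between points of different classes, so neighborhoods
    are drawn preferentially from the same class.  alpha scales the
    penalty relative to the largest pairwise distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D_sup = D + alpha * D.max() * (y[:, None] != y[None, :])
    np.fill_diagonal(D_sup, np.inf)       # exclude self-neighbors
    return np.argsort(D_sup, axis=1)[:, :k]

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)),   # two overlapping clouds
               rng.normal(1.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)
nbrs = supervised_neighbors(X, y, k=5)
same = (y[nbrs] == y[:, None]).mean()     # fraction of same-class neighbors
```

The standard LLE weight-fitting and embedding steps would then run on these label-aware neighborhoods instead of plain Euclidean ones.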


Biometrika ◽  
2021 ◽  
Author(s):  
Junlong Zhao ◽  
Xiumin Liu ◽  
Hansheng Wang ◽  
Chenlei Leng

Summary A problem of major interest in network data analysis is to explain the strength of connections using context information. To this end, we introduce a novel approach, named network-supervised dimension reduction, that projects covariates onto low-dimensional spaces to reveal the linkage pattern, without assuming a model. We propose a new loss function for estimating the parameters of the resulting linear projection, based on the notion that closer proximity in the low-dimensional projection corresponds to stronger connections. Interestingly, the convergence rate of our estimator is shown to depend on a network effect factor, the smallest number that can partition a graph in a manner similar to the graph coloring problem. Our methodology has interesting connections to principal component analysis and linear discriminant analysis, which we exploit for clustering and community detection. The methodology is further illustrated by numerical experiments and the analysis of pulsar candidate data from astronomy.
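The stated notion, that closer proximity in the projection corresponds to stronger connections, can be illustrated with a toy logistic link (my construction; the paper's actual loss function is not reproduced here):

```python
import numpy as np

# Toy illustration (not the paper's estimator): connection strength
# decreases with distance between projected covariates, through a
# logistic link.
rng = np.random.default_rng(6)
n, p, k = 30, 5, 2
X = rng.normal(size=(n, p))                 # node covariates
B = rng.normal(size=(p, k))                 # a candidate linear projection
Z = X @ B

dist2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
logits = np.clip(dist2.mean() - dist2, -50, 50)
P = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-12, 1 - 1e-12)  # closer -> larger P

A = (rng.uniform(size=(n, n)) < P).astype(float)  # sample edges
A = np.triu(A, 1)
A = A + A.T                                       # undirected, no self-loops

# Negative log-likelihood of the observed edges given the projection;
# estimating B would amount to minimising a loss of this kind over B.
iu = np.triu_indices(n, 1)
loss = -(A * np.log(P) + (1 - A) * np.log(1 - P))[iu].sum()
```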


2021 ◽  
Vol 25 (2) ◽  
pp. 339-357
Author(s):  
Guowang Du ◽  
Lihua Zhou ◽  
Kevin Lü ◽  
Haiyan Ding

Multi-view clustering aims to group similar samples into the same clusters and dissimilar samples into different clusters by integrating heterogeneous information from multi-view data. Non-negative matrix factorization (NMF) has been widely applied to multi-view clustering owing to its interpretability. However, most NMF-based algorithms factorize multi-view data with only a shallow structure, neglecting the complex hierarchical and heterogeneous information in multi-view data. In this paper, we propose a deep multiple non-negative matrix factorization (DMNMF) framework based on autoencoders for multi-view clustering. DMNMF consists of multiple encoder and decoder components with deep structures. Each pair of encoder and decoder components hierarchically factorizes the input data from one view to capture the hierarchical information, and all encoder and decoder components are integrated at an abstract level to learn a common low-dimensional representation that combines the heterogeneous information across the views. Furthermore, graph regularizers are introduced to preserve the local geometric information of each view. An iterative updating scheme is developed to optimize the proposed framework, and the corresponding algorithm, called MVC-DMNMF, is implemented. Extensive experiments on six benchmark datasets demonstrate the superior performance of the proposed MVC-DMNMF for multi-view clustering compared with other baseline algorithms.
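The hierarchical factorization idea can be sketched for a single view by stacking two NMF layers, X ≈ W1 W2 H2, so that H2 is a deeper, more abstract code. This is a heavy simplification of DMNMF (no autoencoder structure, no graph regularizers, single view, and the layer sizes are my choice):

```python
import numpy as np

def nmf(X, k, n_iter=300, seed=0):
    """One shallow NMF layer, X ~= W @ H, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, (X.shape[0], k))
    H = rng.uniform(0.1, 1.0, (k, X.shape[1]))
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Deep (hierarchical) factorization: X ~= W1 @ W2 @ H2, where H2 is a
# deeper, more abstract representation of the samples.
rng = np.random.default_rng(7)
X = np.abs(rng.normal(size=(40, 25)))   # one non-negative data view
W1, H1 = nmf(X, k=10)                   # first layer
W2, H2 = nmf(H1, k=4)                   # second layer factorizes the codes
X_hat = W1 @ W2 @ H2
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

In the full framework, one such stack per view would be tied together at the deepest level to learn a common representation across views.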

