supervised dimension reduction
Recently Published Documents

Total documents: 47 (five years: 15)
H-index: 8 (five years: 1)

2021
Author(s): Tara Chari, Joeyta Banerjee, Lior Pachter

Dimensionality reduction is standard practice for filtering noise and identifying relevant dimensions in large-scale data analyses. In biology, single-cell expression studies almost always begin with reduction to two or three dimensions to produce 'all-in-one' visuals of the data that are amenable to the human eye, and these are subsequently used for qualitative and quantitative analysis of cell relationships. However, there is little theoretical support for this practice. We examine the theoretical and practical implications of low-dimensional embedding of single-cell data, and find extensive distortions incurred on the global and local properties of biological patterns relative to the high-dimensional, ambient space. In lieu of this, we propose semi-supervised dimension reduction to higher dimension, and show that such targeted reduction guided by the metadata associated with single-cell experiments provides useful latent space representations for hypothesis-driven biological discovery.
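The targeted reduction the authors describe is guided by labels, not variance alone. As a minimal numpy sketch of that idea (the data, cell-type labels, and chosen dimension below are illustrative, not from the paper), a Fisher-discriminant projection uses label metadata to pick separating directions while keeping more than two latent dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "expression" matrix: 300 cells x 50 genes, 5 labeled cell types.
n_per, n_types, n_genes = 60, 5, 50
labels = np.repeat(np.arange(n_types), n_per)
centers = rng.normal(0, 3, size=(n_types, n_genes))
X = centers[labels] + rng.normal(size=(n_per * n_types, n_genes))

# Supervised linear reduction (Fisher discriminant): project onto the
# directions that separate the labeled groups, keeping d > 2 dimensions.
mu = X.mean(axis=0)
Sw = np.zeros((n_genes, n_genes))   # within-class scatter
Sb = np.zeros((n_genes, n_genes))   # between-class scatter
for c in range(n_types):
    Xc = X[labels == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    Sb += len(Xc) * np.outer(mc - mu, mc - mu)

# Leading eigenvectors of Sw^{-1} Sb (small ridge keeps Sw invertible).
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(n_genes), Sb))
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:4]].real      # keep 4 latent dimensions, not 2
Z = X @ W
print(Z.shape)  # (300, 4)
```

The point of the sketch is the shape of the output: a label-guided latent space with d greater than 2, rather than an all-in-one 2-D picture.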


2021
Vol 23 (1)
Author(s): Marianne A. Messelink, Nadia M. T. Roodenrijs, Bram van Es, Cornelia A. R. Hulsbergen-Veelken, Sebastiaan Jong, ...

Abstract Background The new concept of difficult-to-treat rheumatoid arthritis (D2T RA) refers to RA patients who remain symptomatic after several lines of treatment, resulting in a high patient and economic burden. During a hackathon, we aimed to identify and predict D2T RA patients in structured and unstructured routine care data. Methods Routine care data of 1873 RA patients were extracted from the Utrecht Patient Oriented Database. Data from a previous cross-sectional study, in which 152 RA patients were clinically classified as either D2T or non-D2T, served as a validation set. Machine learning techniques, text mining, and feature importance analyses were performed to identify and predict D2T RA patients based on structured and unstructured routine care data. Results We identified 123 potentially new D2T RA patients by applying the D2T RA definition to structured and unstructured routine care data. Additionally, we developed a D2T RA identification model derived from a feature importance analysis of all available structured data (AUC-ROC 0.88 (95% CI 0.82–0.94)), and we demonstrated the potential of longitudinal hematological data to differentiate D2T from non-D2T RA patients using supervised dimension reduction. Lastly, using data up to the time of starting the first biological treatment, we predicted future development of D2T RA (AUC-ROC 0.73 (95% CI 0.71–0.75)). Conclusions During this hackathon, we demonstrated the potential of different techniques for the identification and prediction of D2T RA patients in structured as well as unstructured routine care data. The results are promising and should be optimized and validated in future research.
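Both models above are reported via AUC-ROC. As a small self-contained sketch (the labels and scores below are simulated stand-ins, not the study's data or model), the AUC can be computed directly from ranks via the Mann-Whitney formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the task: y = 1 marks a (hypothetical) D2T patient,
# score = a classifier's output; here a noisy but informative signal.
y = rng.integers(0, 2, size=500)
score = y * 1.0 + rng.normal(0, 1.2, size=500)

# AUC-ROC via ranks (Mann-Whitney U): the probability that a randomly
# chosen positive case outranks a randomly chosen negative case.
order = np.argsort(score)
ranks = np.empty(len(score), dtype=float)
ranks[order] = np.arange(1, len(score) + 1)
n_pos = y.sum()
n_neg = len(y) - n_pos
auc = (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(round(auc, 2))
```

An AUC near 0.5 would mean the score is uninformative; values like the study's 0.88 and 0.73 indicate progressively weaker but still useful discrimination.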


Biometrika
2021
Author(s): Junlong Zhao, Xiumin Liu, Hansheng Wang, Chenlei Leng

Summary A problem of major interest in network data analysis is to explain the strength of connections using context information. To achieve this, we introduce a novel approach named network-supervised dimension reduction, which projects covariates onto low-dimensional spaces to reveal the linkage pattern, without assuming a model. We propose a new loss function for estimating the parameters of the resulting linear projection, based on the notion that closer proximity in the low-dimensional projection corresponds to stronger connections. Interestingly, the convergence rate of our estimator is shown to depend on a network effect factor, the smallest number of groups that can partition a graph in a way similar to the graph colouring problem. Our methodology has interesting connections to principal component analysis and linear discriminant analysis, which we exploit for clustering and community detection. The methodology developed is further illustrated by numerical experiments and the analysis of a pulsar candidate data set in astronomy.
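One concrete (and deliberately simplified) instance of "closer projections correspond to stronger connections" is a Locality-Preserving-Projection-style generalized eigenproblem: choose the linear map of the covariates so that connected nodes land near each other. This numpy sketch is an illustration of the idea, not the paper's estimator or loss; the network and covariates are simulated:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy network: 100 nodes with 5-dim covariates; nodes connect more often
# when their (hidden) first two covariates are close. Illustrative only.
n, p, k = 100, 5, 2
X = rng.normal(size=(n, p))
d2 = ((X[:, None, :2] - X[None, :, :2]) ** 2).sum(-1)
A = (rng.random((n, n)) < np.exp(-d2)).astype(float)
A = np.triu(A, 1)
A = A + A.T                                 # symmetric adjacency, no loops

# Pick projection B so connected pairs are close after projection:
# minimize sum_ij A_ij ||B'x_i - B'x_j||^2, i.e. a Laplacian quadratic
# form over the projected covariates, with a degree-based normalization.
D = np.diag(A.sum(1))
L = D - A                                   # graph Laplacian
M1 = X.T @ L @ X
M2 = X.T @ D @ X + 1e-6 * np.eye(p)         # small ridge for stability
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(M2, M1))
order = np.argsort(eigvals.real)
B = eigvecs[:, order[:k]].real              # p x k projection matrix
Z = X @ B
print(Z.shape)  # (100, 2)
```

The smallest-eigenvalue directions are the ones along which edges pull projected nodes together, which is the geometric intuition behind the paper's supervision-by-network idea.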


Author(s): Yichen Cheng, Xinlei Wang, Yusen Xia

We propose a novel supervised dimension-reduction method called supervised t-distributed stochastic neighbor embedding (St-SNE) that achieves dimension reduction by preserving the similarities of data points in both feature and outcome spaces. The proposed method can be used for both prediction and visualization tasks with the ability to handle high-dimensional data. We show through a variety of data sets that when compared with a comprehensive list of existing methods, St-SNE has superior prediction performance in the ultrahigh-dimensional setting in which the number of features p exceeds the sample size n and has competitive performance in the p ≤ n setting. We also show that St-SNE is a competitive visualization tool that is capable of capturing within-cluster variations. In addition, we propose a penalized Kullback–Leibler divergence criterion to automatically select the reduced-dimension size k for St-SNE.

Summary of Contribution: With the fast development of data collection and data processing technologies, high-dimensional data have now become ubiquitous. Examples of such data include those collected from environmental sensors, personal mobile devices, and wearable electronics. High-dimensionality poses great challenges for data analytics routines, both methodologically and computationally. Many machine learning algorithms may fail to work for ultrahigh-dimensional data, where the number of the features p is (much) larger than the sample size n. We propose a novel method for dimension reduction that can (i) aid the understanding of high-dimensional data through visualization and (ii) create a small set of good predictors, which is especially useful for prediction using ultrahigh-dimensional data.
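The core move in St-SNE is to build pairwise similarities in the feature space and in the outcome space and blend them before embedding. This numpy sketch shows only that similarity-construction step under assumed choices (Gaussian kernels, equal blending weights, simulated data); the actual kernels, weights, and the downstream t-SNE optimization follow the paper, not this sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Ultrahigh-dimensional toy setting: p >> n, with a continuous outcome y.
n, p = 50, 200
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def gauss_sim(D2, sigma2):
    """Gaussian similarities from squared distances, normalized to sum 1."""
    P = np.exp(-D2 / (2 * sigma2))
    np.fill_diagonal(P, 0.0)          # no self-similarity
    return P / P.sum()

# Pairwise squared distances in feature space and in outcome space.
Dx = ((X[:, None] - X[None, :]) ** 2).sum(-1)
Dy = (y[:, None] - y[None, :]) ** 2

# Blend the two similarity matrices (equal weights assumed here);
# a t-SNE-style embedding would then match low-dimensional similarities
# to this joint P via a KL-divergence objective.
P = 0.5 * gauss_sim(Dx, Dx.mean()) + 0.5 * gauss_sim(Dy, Dy.mean())
print(P.shape, np.isclose(P.sum(), 1.0))
```

Because the outcome enters P directly, two points with similar outcomes attract each other in the embedding even when their features alone would not place them together, which is what makes the reduction supervised.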


2019
Vol 13 (4)
pp. 334-347
Author(s): Liyan Zhao, Huan Wang, Jing Wang

Background: Subspace-learning-based dimensionality reduction algorithms are important and widely applied in data mining, pattern recognition, and computer vision. They reduce dimension successfully when data points are evenly distributed in the high-dimensional space; however, some distort the local geometric structure of the original dataset and produce a poor low-dimensional embedding when the samples are unevenly distributed in the original space. Methods: In this paper, we propose a supervised dimension reduction method based on local neighborhood optimization to handle unevenly distributed high-dimensional data. It extends the widely used Locally Linear Embedding (LLE) framework and is named LNOLLE. The method uses the class labels of the data to optimize each local neighborhood, achieving better inter-class separation in the low-dimensional space by avoiding the mixing of samples from different classes when mapping unevenly distributed data. This effectively preserves the geometric and topological structure of the original data points. Results: We apply the LNOLLE method to image classification and face recognition, where it achieves good classification results and higher face recognition accuracy than existing manifold learning methods, including popular supervised algorithms. In addition, we use the reconstruction step of the method for noise suppression in seismic images. To the best of our knowledge, this is the first manifold learning approach applied to noise suppression in high-dimensional nonlinear seismic data. Conclusion: Experimental results on a forward model and real seismic data show that LNOLLE improves the signal-to-noise ratio of seismic images compared with the widely used Singular Value Decomposition (SVD) filtering method.
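The label-aware neighborhood step can be sketched concretely: restrict each point's k nearest neighbors to its own class, then solve the standard LLE reconstruction weights on that neighborhood. This numpy sketch illustrates that step only (data, class structure, and regularization constant are assumed for the example, and the subsequent embedding step of LLE is omitted):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy labeled data: 3 classes of 40 points in 10 dimensions.
n_per, n_cls, p, k = 40, 3, 10, 5
y = np.repeat(np.arange(n_cls), n_per)
X = rng.normal(size=(n_per * n_cls, p)) + 4 * y[:, None]

def lle_weights_same_class(X, y, k):
    """LLE reconstruction weights with neighbors restricted by class label."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        # Candidate neighbors: same class only, excluding the point itself.
        same = np.where((y == y[i]) & (np.arange(n) != i))[0]
        d = np.linalg.norm(X[same] - X[i], axis=1)
        nb = same[np.argsort(d)[:k]]
        # Solve for weights reconstructing x_i from its neighbors.
        G = (X[nb] - X[i]) @ (X[nb] - X[i]).T   # local Gram matrix
        G += 1e-3 * np.trace(G) * np.eye(k)     # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nb] = w / w.sum()                  # weights sum to 1
    return W

W = lle_weights_same_class(X, y, k)
print(np.allclose(W.sum(1), 1.0))  # True: each row reconstructs its point
```

Because every nonzero weight links a point only to same-class neighbors, points from different classes cannot be "held together" by the reconstruction, which is the separability property the abstract describes.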


2019
Vol 11 (24)
pp. 2892
Author(s): Ying-Nong Chen

In this study, a novel multiple kernel FLE (MKFLE) based on general nearest feature line embedding (FLE) transformation is proposed and applied to hyperspectral image (HSI) classification, taking advantage of multiple kernel learning. FLE has successfully shown its discriminative capability in many applications. However, since the conventional linear principal component analysis (PCA) pre-processing step in FLE cannot effectively extract nonlinear information, a multiple kernel PCA (MKPCA) based on the proposed multiple kernel method is introduced to alleviate this problem. The proposed MKFLE dimension reduction framework proceeds in two stages. In the first (MKPCA) stage, a multiple kernel learning method based on between-class distance and a support vector machine (SVM) is used to find the kernel weights. Based on these weights, a new weighted kernel function is constructed as a linear combination of valid kernels. In the second (FLE) stage, the FLE method, which preserves the nonlinear manifold structure, performs supervised dimension reduction using the kernel obtained in the first stage. The effectiveness of the proposed MKFLE algorithm is assessed by comparison with various previous state-of-the-art works on three benchmark data sets. The experimental results show that MKFLE outperforms the other methods, achieving accuracies of 83.58%, 91.61%, and 97.68% on the Indian Pines, Pavia University, and Pavia City datasets, respectively.
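The first stage above combines kernels linearly and then applies kernel PCA. As a minimal numpy sketch of that combined-kernel step (the two kernels and the fixed weights below are assumed for illustration; the paper learns the weights from between-class distance and an SVM criterion, and the data here is simulated):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in for spectral feature vectors: 80 samples, 12 bands.
n, p = 80, 12
X = rng.normal(size=(n, p))

# Two valid kernels combined with (assumed, not learned) weights.
D2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K_rbf = np.exp(-D2 / (2 * D2.mean()))       # RBF kernel
K_lin = X @ X.T                             # linear kernel
w = np.array([0.7, 0.3])
K = w[0] * K_rbf + w[1] * K_lin             # weighted combined kernel

# Kernel PCA on the combined kernel: double-centre, then take the
# leading eigenvectors scaled by the square roots of the eigenvalues.
H = np.eye(n) - np.ones((n, n)) / n         # centring matrix
Kc = H @ K @ H
eigvals, eigvecs = np.linalg.eigh(Kc)
top = eigvals[::-1][:3]
Z = eigvecs[:, ::-1][:, :3] * np.sqrt(np.maximum(top, 0))
print(Z.shape)  # (80, 3)
```

A convex combination of positive semi-definite kernels is itself a valid kernel, which is why the weighted sum can be fed directly into the centring-and-eigendecomposition routine of kernel PCA before the FLE stage.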

