Functional Dimension Reduction in Predictive Modeling

2021 ◽  
Vol 66 (6) ◽  
pp. 745-753
Author(s):  
E. V. Burnaev ◽  
A. V. Bernstein
2017 ◽  
Vol 16 ◽  
pp. 117693511771851 ◽  
Author(s):  
Adam Kaplan ◽  
Eric F Lock

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal component analysis (PCA). However, the application of PCA is not straightforward for multisource data, wherein multiple sources of ‘omics data measure different but related biological components. In this article, we use recent advances in the dimension reduction of multisource data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multisource data, for prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for patients with glioblastoma multiforme from 3 data sources measuring messenger RNA expression, microRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function jive.predict.
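The general workflow this abstract describes (reduce dimension, fit a predictive model on the scores, then estimate scores for samples that were not part of the initial reduction) can be sketched with ordinary single-source PCA. This is only an illustrative analogue, not JIVE and not the jive.predict function of R.JIVE: the data are simulated, and the helper names `fit_pca` and `pca_scores` are hypothetical.

```python
import numpy as np

def fit_pca(X, rank):
    """Return the feature means and top-`rank` loadings of X (samples x features)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:rank].T  # loadings have shape (features, rank)

def pca_scores(X, mean, loadings):
    """Project samples onto loadings estimated from the training data only."""
    return (X - mean) @ loadings

# Simulated data (an assumption for illustration): 3 latent factors drive
# 200 features and a continuous response.
rng = np.random.default_rng(0)
n_train, n_new, n_features, rank = 60, 5, 200, 3
latent = rng.normal(size=(n_train + n_new, rank))
weights = rng.normal(size=(rank, n_features))
X = latent @ weights + 0.1 * rng.normal(size=(n_train + n_new, n_features))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n_train + n_new)

X_train, X_new = X[:n_train], X[n_train:]
y_train = y[:n_train]

# Step 1: dimension reduction on the training samples.
mean, loadings = fit_pca(X_train, rank)
train_scores = pca_scores(X_train, mean, loadings)

# Step 2: least-squares regression of the response on the scores.
design = np.column_stack([np.ones(n_train), train_scores])
coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

# Step 3: scores and predictions for samples not used in the reduction.
new_scores = pca_scores(X_new, mean, loadings)
y_pred = np.column_stack([np.ones(n_new), new_scores]) @ coef
print(new_scores.shape, y_pred.shape)
```

The key design point is that the new samples are centered and projected with the training means and loadings, so their scores live in the same coordinate system as the scores the model was fitted on.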


2019 ◽  
Author(s):  
Alexander Murray-Watters

This poster presents results from applying a new dimension reduction technique (UMAP) to a wide variety of data types, ranging from online text to social networks, for the purpose of creating useful, but anonymized, data. As the dimension reduction procedure produces meaningful distances and supports arbitrary distance measures, it can be applied to a variety of problems, and produces data that is useful for both visualization and predictive modeling. Included is a description of the dimension reduction procedure, the results of its application, and a discussion of planned future use.


Author(s):  
Sylvain Lespinats ◽  
Michaël Aupetit ◽  
Anke Meyer-Baese

Multidimensional scaling techniques are unsupervised Dimension Reduction (DR) techniques that use pairwise similarities of multidimensional data to map the data onto a plane, enabling visual exploratory analysis. For labeled data, DR techniques face two objectives with potentially different priorities: one is to account for the similarities between data points, the other for the structure of the data classes. Unsupervised DR techniques attempt to preserve the original data similarities, but they do not consider class labels, hence they can map originally separated classes as overlapping ones. Conversely, the state-of-the-art so-called supervised DR techniques naturally handle labeled data, but they do so in a predictive modeling framework where they attempt to separate the classes in order to improve a classification accuracy measure in the low-dimensional space, hence they can map even originally overlapping classes as separated. We propose ClassiMap, a DR technique which optimizes a new objective function enabling Exploratory Data Analysis (EDA) of labeled data. Mapping distortions known as tears and false neighborhoods cannot be avoided in general due to the reduction of the data dimension. ClassiMap intends primarily to preserve data similarities but tends to distribute unavoidable tears preferentially among different-label data and unavoidable false neighbors among same-label data. Standard quality measures for unsupervised mappings say nothing about the preservation of within-class or between-class structures, while classification accuracy used to evaluate supervised mappings is only relevant to the framework of predictive modeling. We propose two measures better suited to the evaluation of DR of labeled data in an EDA framework. We use these two label-aware indices and four other standard unsupervised indices to compare ClassiMap to other state-of-the-art supervised and unsupervised DR techniques on synthetic and real datasets. ClassiMap appears to provide a better tradeoff between pairwise similarities and class structure preservation according to these new measures.
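To make the notions of tears and false neighborhoods concrete, here is a minimal sketch that counts both kinds of distortion in a k-nearest-neighbor sense and splits them by whether the two points share a label. It is not the pair of measures proposed in the abstract, which are not specified here; the function names, the k-nearest-neighbor definition, and the toy data are all assumptions made for illustration.

```python
from math import dist

def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean distance)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return set(others[:k])

def label_aware_distortions(high, low, labels, k):
    """Count false neighbors and tears, split by same or different label.

    false neighbor: j is among the k nearest neighbors of i in the map
                    but not in the original space.
    tear:           j is among the k nearest neighbors of i in the original
                    space but not in the map.
    """
    counts = {"false_same": 0, "false_diff": 0, "tear_same": 0, "tear_diff": 0}
    for i in range(len(high)):
        near_high, near_low = knn(high, i, k), knn(low, i, k)
        for j in near_low - near_high:
            kind = "same" if labels[i] == labels[j] else "diff"
            counts["false_" + kind] += 1
        for j in near_high - near_low:
            kind = "same" if labels[i] == labels[j] else "diff"
            counts["tear_" + kind] += 1
    return counts

# Toy data (an assumption): two well-separated classes in 2D, and a 1D map
# that pushes one point of each class toward the other class.
high = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
low = [(0.0,), (1.0,), (1.2,), (2.2,)]
labels = ["a", "a", "b", "b"]

print(label_aware_distortions(high, low, labels, k=1))
# {'false_same': 0, 'false_diff': 2, 'tear_same': 2, 'tear_diff': 0}
```

In this toy map every false neighbor joins points with different labels and every tear separates points with the same label, which is the opposite of the distribution the abstract says ClassiMap favors.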


Author(s):  
P.A. Lykhin ◽  
◽  
E.V. Usov ◽  
V.N. Ulyanov ◽  
N.K. Kayurov ◽  
...  
