A robust nonlinear low-dimensional manifold for single cell RNA-seq data

Abstract: Modern developments in single cell sequencing technologies enable broad insights into cellular state. Single cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden understanding of cell heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single cell data. However, methods have yet to be developed for unfiltered and unnormalized count data. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student’s t-distribution to estimate a manifold that is robust to technical and biological noise. We compare our approach to common dimension reduction tools to highlight our model’s ability to enable important downstream tasks, including clustering and inferring cell developmental trajectories, on available experimental data. We show that our robust nonlinear manifold is well suited for raw, unfiltered gene counts from high throughput sequencing technologies for visualization and exploration of cell states.
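
The generative story in this abstract (latent positions, a Gaussian process mapping to gene space, heavy-tailed residuals) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the RBF kernel, the function names `rbf_kernel` and `sample_t_gplvm`, and all parameter values are hypothetical choices, and the adaptive kernel learning and inference steps mentioned in the abstract are not shown.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between all rows of X (an assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_t_gplvm(n_cells=50, n_genes=20, latent_dim=2,
                   df=3.0, noise_scale=0.1, seed=0):
    """Simulate data from a GPLVM-style model with Student's t residuals.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Low-dimensional latent position for each cell.
    X = rng.standard_normal((n_cells, latent_dim))
    # GP covariance across cells, with jitter for numerical stability.
    K = rbf_kernel(X) + 1e-6 * np.eye(n_cells)
    L = np.linalg.cholesky(K)
    # One independent GP draw per gene: column j is f_j ~ N(0, K).
    F = L @ rng.standard_normal((n_cells, n_genes))
    # Heavy-tailed residuals in place of the usual Gaussian noise.
    noise = noise_scale * rng.standard_t(df, size=(n_cells, n_genes))
    return X, F, F + noise

X, F, Y = sample_t_gplvm()
print(X.shape, Y.shape)
```

With a small `df`, occasional entries of `Y` sit far from `F`, which is the kind of outlier a Gaussian noise model would pull the manifold toward and a Student's t model down-weights.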

Covariate dimension reduction for survival data via the Gaussian process latent variable model

Statistics in Medicine ◽

10.1002/sim.6784 ◽

2015 ◽

Vol 35 (8) ◽

pp. 1340-1353

Author(s):

James E. Barrett ◽

Anthony C. C. Coolen

Keyword(s):

Gaussian Process ◽

Dimension Reduction ◽

Survival Data ◽

Latent Variable ◽

Latent Variable Model ◽

Variable Model

Discriminative geodesic Gaussian process latent variable model for structure preserving dimension reduction in clustering and classification problems

Neural Computing and Applications ◽

10.1007/s00521-017-3273-4 ◽

2017 ◽

Vol 31 (8) ◽

pp. 3265-3278

Author(s):

Mahdi Heidari ◽

Mohammad Hossein Moattar

Keyword(s):

Gaussian Process ◽

Dimension Reduction ◽

Latent Variable ◽

Latent Variable Model ◽

Classification Problems ◽

Variable Model ◽

Structure Preserving ◽

Clustering And Classification

Shared Linear Encoder-based Gaussian Process Latent Variable Model for Visual Classification

2018 ACM Multimedia Conference on Multimedia Conference - MM '18 ◽

10.1145/3240508.3240520 ◽

2018 ◽

Cited By ~ 3

Author(s):

Jinxing Li ◽

Bob Zhang ◽

Guangming Lu ◽

David Zhang

Keyword(s):

Gaussian Process ◽

Latent Variable ◽

Latent Variable Model ◽

Variable Model ◽

Visual Classification ◽

Linear Encoder

Efficient Dimensionality Reduction Methods in Reservoir History Matching

Energies ◽

10.3390/en14113137 ◽

2021 ◽

Vol 14 (11) ◽

pp. 3137

Author(s):

Amine Tadjer ◽

Reider B. Bratvold ◽

Remus G. Hanea

Keyword(s):

Data Assimilation ◽

Dimensionality Reduction ◽

Gaussian Process ◽

Latent Variable ◽

History Matching ◽

Production Performance ◽

Latent Variable Model ◽

Variable Model ◽

Multiple Data ◽

Ensemble Smoother

Production forecasting is the basis for decision making in the oil and gas industry, and can be quite challenging, especially in terms of complex geological modeling of the subsurface. To help solve this problem, assisted history matching built on ensemble-based analysis such as the ensemble smoother and ensemble Kalman filter is useful in estimating models that preserve geological realism and have predictive capabilities. These methods tend, however, to be computationally demanding, as they require a large ensemble size for stable convergence. In this paper, we propose a novel method of uncertainty quantification and reservoir model calibration with much-reduced computation time. This approach is based on a sequential combination of nonlinear dimensionality reduction techniques: t-distributed stochastic neighbor embedding or the Gaussian process latent variable model and clustering K-means, along with the data assimilation method ensemble smoother with multiple data assimilation. The cluster analysis with t-distributed stochastic neighbor embedding and Gaussian process latent variable model is used to reduce the number of initial geostatistical realizations and select a set of optimal reservoir models that have similar production performance to the reference model. We then apply ensemble smoother with multiple data assimilation for providing reliable assimilation results. Experimental results based on the Brugge field case data verify the efficiency of the proposed approach.
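
The reduction step described in this abstract (cluster the low-dimensional embedding of the realizations, then keep those resembling the reference) might be sketched as follows. This is a minimal illustration under loud assumptions: the embedding is taken as already computed (by t-SNE or a GPLVM, neither shown), "similar to the reference" is interpreted here as "in the cluster whose centroid is nearest the reference point", and the helper names `farthest_first_init`, `kmeans`, and `select_near_reference` are hypothetical. The ensemble smoother step is not shown.

```python
import numpy as np

def farthest_first_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    for _ in range(1, k):
        diffs = points[:, None, :] - np.array(centroids)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centroids.append(points[nearest.argmax()])
    return np.array(centroids, dtype=float)

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def select_near_reference(embedding, reference, k=3):
    """Indices of realizations in the non-empty cluster nearest the reference."""
    labels, centroids = kmeans(embedding, k)
    dist_to_ref = np.linalg.norm(centroids - reference, axis=1)
    counts = np.bincount(labels, minlength=k)
    dist_to_ref[counts == 0] = np.inf  # never pick an empty cluster
    return np.flatnonzero(labels == dist_to_ref.argmin())

# Toy usage: three well-separated groups of 10 embedded realizations each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
embedding = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centers])
selected = select_near_reference(embedding, reference=np.array([0.0, 0.0]))
print(selected)
```

Only the selected realizations would then be passed on to the data assimilation step, which is where the reduction in computation time would come from.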

Harmonization Shared Autoencoder Gaussian Process Latent Variable Model With Relaxed Hamming Distance

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3026876 ◽

2020 ◽

pp. 1-15

Author(s):

Jinxing Li ◽

Bob Zhang ◽

Guangming Lu ◽

Yong Xu ◽

Feng Wu ◽

...

Keyword(s):

Gaussian Process ◽

Latent Variable ◽

Hamming Distance ◽

Latent Variable Model ◽

Variable Model

Semi-supervised Gaussian process latent variable model with pairwise constraints

Neurocomputing ◽

10.1016/j.neucom.2010.01.021 ◽

2010 ◽

Vol 73 (10-12) ◽

pp. 2186-2195 ◽

Cited By ~ 19

Author(s):

Xiumei Wang ◽

Xinbo Gao ◽

Yuan Yuan ◽

Dacheng Tao ◽

Jie Li

Keyword(s):

Gaussian Process ◽

Latent Variable ◽

Latent Variable Model ◽

Variable Model ◽

Pairwise Constraints

A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments

10.1101/2020.01.14.906313 ◽

2020 ◽

Author(s):

Archit Verma ◽

Barbara Engelhardt

Keyword(s):

Single Cell ◽

Latent Variable ◽

Environmental Variability ◽

Simulated Data ◽

Joint Analysis ◽

Variable Model ◽

Manifold Alignment ◽

Multiple Data Sets ◽

Sequencing Platforms ◽

Low Dimensional

Joint analysis of multiple single cell RNA-sequencing (scRNA-seq) data sets is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single cell experiments with minimal assumptions. We show the efficacy of the model as compared with state-of-the-art methods for single cell data integration on simulated data, pancreas cells from four sequencing technologies, induced pluripotent stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq. Code and data are available at https://github.com/architverma1/sc-manifold-alignment

Similarity Gaussian Process Latent Variable Model for Multi-modal Data Analysis

2015 IEEE International Conference on Computer Vision (ICCV) ◽

10.1109/iccv.2015.461 ◽

2015 ◽

Cited By ~ 7

Author(s):

Guoli Song ◽

Shuhui Wang ◽

Qingming Huang ◽

Qi Tian

Keyword(s):

Data Analysis ◽

Gaussian Process ◽

Latent Variable ◽

Latent Variable Model ◽

Variable Model ◽

Modal Data

A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement

10.36227/techrxiv.12375284 ◽

2020 ◽

Author(s):

Aditya Arie Nugraha ◽

Kouhei Sekiguchi ◽

Kazuyoshi Yoshii

Keyword(s):

Speech Enhancement ◽

Latent Variables ◽

Latent Variable ◽

Generative Models ◽

Latent Variable Model ◽

Variable Model ◽

Variational Autoencoder ◽

Latent Representations ◽

Low Dimensional ◽

Better Than

This paper describes a deep latent variable model of speech power spectrograms and its application to semi-supervised speech enhancement with a deep speech prior. By integrating two major deep generative models, a variational autoencoder (VAE) and a normalizing flow (NF), in a mutually-beneficial manner, we formulate a flexible latent variable model called the NF-VAE that can extract low-dimensional latent representations from high-dimensional observations, akin to the VAE, and does not need to explicitly represent the distribution of the observations, akin to the NF. In this paper, we consider a variant of NF called the generative flow (GF a.k.a. Glow) and formulate a latent variable model called the GF-VAE. We experimentally show that the proposed GF-VAE is better than the standard VAE at capturing fine-structured harmonics of speech spectrograms, especially in the high-frequency range. A similar finding is also obtained when the GF-VAE and the VAE are used to generate speech spectrograms from latent variables randomly sampled from the standard Gaussian distribution. Lastly, when these models are used as speech priors for statistical multichannel speech enhancement, the GF-VAE outperforms the VAE and the GF.
