A robust nonlinear low-dimensional manifold for single cell RNA-seq data

2018 ◽  
Author(s):  
Archit Verma ◽  
Barbara E. Engelhardt

Modern developments in single cell sequencing technologies enable broad insights into cellular state. Single cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden understanding of cell heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single cell data. However, methods have yet to be developed for unfiltered and unnormalized count data. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student’s t-distribution to estimate a manifold that is robust to technical and biological noise. We compare our approach to common dimension reduction tools to highlight our model’s ability to enable important downstream tasks, including clustering and inferring cell developmental trajectories, on available experimental data. We show that our robust nonlinear manifold is well suited for raw, unfiltered gene counts from high throughput sequencing technologies for visualization and exploration of cell states.
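To illustrate why a heavy-tailed residual model can make a fit more robust, the sketch below compares the log-density that a Gaussian and a Student's t-distribution assign to the same residuals. It is not the authors' model or code: the residual values, the degrees of freedom `nu`, and the scale `sigma` are made-up assumptions chosen only for illustration.

```python
import math


def gaussian_logpdf(r, sigma=1.0):
    """Log-density of a zero-mean Gaussian residual r with scale sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * (r / sigma) ** 2


def student_t_logpdf(r, nu=4.0, sigma=1.0):
    """Log-density of a zero-mean Student's t residual r.

    nu is the degrees of freedom and sigma the scale; both are
    illustrative values, not parameters taken from the paper.
    """
    z = r / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


# Hypothetical residuals: three small ones and one large outlier.
residuals = [0.1, -0.3, 0.2, 8.0]

gauss_total = sum(gaussian_logpdf(r) for r in residuals)
t_total = sum(student_t_logpdf(r) for r in residuals)

print(f"Gaussian log-likelihood:  {gauss_total:.2f}")
print(f"Student-t log-likelihood: {t_total:.2f}")
```

Under the Gaussian, the single outlier dominates the total log-likelihood because its penalty grows quadratically. Under the Student's t it grows only logarithmically, so an optimizer has less incentive to distort the fit to accommodate that one point.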

Energies ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 3137
Author(s):  
Amine Tadjer ◽  
Reider B. Bratvold ◽  
Remus G. Hanea

Production forecasting is the basis for decision making in the oil and gas industry, and can be quite challenging, especially in terms of complex geological modeling of the subsurface. To help solve this problem, assisted history matching built on ensemble-based analysis such as the ensemble smoother and ensemble Kalman filter is useful in estimating models that preserve geological realism and have predictive capabilities. These methods tend, however, to be computationally demanding, as they require a large ensemble size for stable convergence. In this paper, we propose a novel method of uncertainty quantification and reservoir model calibration with much-reduced computation time. This approach is based on a sequential combination of a nonlinear dimensionality reduction technique (t-distributed stochastic neighbor embedding or the Gaussian process latent variable model), K-means clustering, and the data assimilation method ensemble smoother with multiple data assimilation. The cluster analysis with t-distributed stochastic neighbor embedding and the Gaussian process latent variable model is used to reduce the number of initial geostatistical realizations and select a set of optimal reservoir models that have similar production performance to the reference model. We then apply ensemble smoother with multiple data assimilation to provide reliable assimilation results. Experimental results based on the Brugge field case data verify the efficiency of the proposed approach.
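The ensemble-reduction step can be pictured as clustering realizations in a low-dimensional embedding and keeping one representative per cluster. The following is a minimal sketch of that idea only, not the paper's workflow: the 2-D coordinates stand in for an embedding that t-SNE or a GPLVM would produce, and the function names `kmeans` and `select_representatives` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns (centroids, clusters of point indices)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(idx)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
                continue
            dim = len(points[0])
            new_centroids.append(
                tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))
            )
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


def select_representatives(points, k):
    """Index of the point closest to each non-empty cluster's centroid."""
    centroids, clusters = kmeans(points, k)
    return [
        min(members, key=lambda i: sq_dist(points[i], centroids[j]))
        for j, members in enumerate(clusters)
        if members
    ]


# Made-up 2-D embedding of six realizations forming two well-separated groups.
embedding = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
representatives = select_representatives(embedding, k=2)
print("Selected realization indices:", representatives)
```

Only the selected realizations would then be passed on to the data assimilation stage, which is where the reduction in computation would come from. How the paper scores clusters against the reference model's production performance is not shown here.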


2010 ◽  
Vol 73 (10-12) ◽  
pp. 2186-2195 ◽  
Author(s):  
Xiumei Wang ◽  
Xinbo Gao ◽  
Yuan Yuan ◽  
Dacheng Tao ◽  
Jie Li

2020 ◽  
Author(s):  
Archit Verma ◽  
Barbara Engelhardt

Joint analysis of multiple single cell RNA-sequencing (scRNA-seq) data is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single cell experiments with minimal assumptions. We show the efficacy of the model as compared with state-of-the-art methods for single cell data integration on simulated data, pancreas cells from four sequencing technologies, induced pluripotent stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq. Code and data are available at https://github.com/architverma1/sc-manifold-alignment


2020 ◽  
Author(s):  
Aditya Arie Nugraha ◽  
Kouhei Sekiguchi ◽  
Kazuyoshi Yoshii

This paper describes a deep latent variable model of speech power spectrograms and its application to semi-supervised speech enhancement with a deep speech prior. By integrating two major deep generative models, a variational autoencoder (VAE) and a normalizing flow (NF), in a mutually-beneficial manner, we formulate a flexible latent variable model called the NF-VAE that can extract low-dimensional latent representations from high-dimensional observations, akin to the VAE, and does not need to explicitly represent the distribution of the observations, akin to the NF. In this paper, we consider a variant of NF called the generative flow (GF a.k.a. Glow) and formulate a latent variable model called the GF-VAE. We experimentally show that the proposed GF-VAE is better than the standard VAE at capturing fine-structured harmonics of speech spectrograms, especially in the high-frequency range. A similar finding is also obtained when the GF-VAE and the VAE are used to generate speech spectrograms from latent variables randomly sampled from the standard Gaussian distribution. Lastly, when these models are used as speech priors for statistical multichannel speech enhancement, the GF-VAE outperforms the VAE and the GF.

