SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data

2019 ◽  
Author(s):  
Trung Ngo Trong ◽  
Roger Kramer ◽  
Juha Mehtonen ◽  
Gerardo González ◽  
Ville Hautamäki ◽  
...  

ABSTRACT Single-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single-cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification, available as CITE-seq counts from the same cells, is used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture.
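The semi-supervised idea in this abstract can be illustrated with a small objective function. The following is a minimal sketch, not the SISUA implementation: the Poisson likelihood is a stand-in assumption for whatever count distribution the model uses, and the names `semi_supervised_elbo` and `protein_weight` are hypothetical. It shows how a protein term can be added to a standard VAE evidence lower bound when protein counts are available for a cell.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def poisson_log_lik(counts, rates):
    """Log-likelihood of observed counts under independent Poisson rates (assumed likelihood)."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1) for k, r in zip(counts, rates))


def semi_supervised_elbo(rna, rna_rate, protein, protein_rate, mu, log_var,
                         protein_weight=1.0):
    """ELBO for one cell; the protein term is added only when protein counts exist."""
    reconstruction = poisson_log_lik(rna, rna_rate)
    supervision = 0.0
    if protein is not None:
        supervision = poisson_log_lik(protein, protein_rate)
    return reconstruction + protein_weight * supervision - gaussian_kl(mu, log_var)


# Usage: the same cell scored with and without its protein measurement.
unlabeled = semi_supervised_elbo([1, 0], [1.0, 0.5], None, None, [0.1], [0.0])
labeled = semi_supervised_elbo([1, 0], [1.0, 0.5], [3], [2.5], [0.1], [0.0])
```

In a trained model the rates and the latent mean and log-variance would come from decoder and encoder networks; here they are passed in directly to keep the sketch self-contained.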

2016 ◽  
Author(s):  
Caleb Weinreb ◽  
Samuel Wolock ◽  
Allon Klein

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies. Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web-tool to visualize single-cell data using force-directed graph layouts, called SPRING. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool. Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/ Contact: [email protected], [email protected]
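The general technique named in this abstract, a force-directed layout of a k-nearest-neighbor graph, can be sketched in a few lines. This is not SPRING's code: the function names `knn_edges` and `force_layout` are hypothetical, Euclidean distance is an assumption, and the layout is a simplified Fruchterman-Reingold-style scheme chosen for brevity.

```python
import math
import random


def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbours (Euclidean)."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def force_layout(n_nodes, edges, iterations=200, seed=0):
    """2-D positions from pairwise repulsion plus attraction along graph edges."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_nodes)]
    ideal = 1.0 / math.sqrt(n_nodes)  # preferred edge length
    for step in range(iterations):
        temperature = 0.1 * (1.0 - step / iterations)  # caps the move per step
        disp = [[0.0, 0.0] for _ in range(n_nodes)]
        for i in range(n_nodes):  # every pair of nodes repels
            for j in range(i + 1, n_nodes):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = ideal * ideal / dist
                disp[i][0] += dx / dist * push
                disp[i][1] += dy / dist * push
                disp[j][0] -= dx / dist * push
                disp[j][1] -= dy / dist * push
        for i, j in edges:  # connected nodes attract
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / ideal
            disp[i][0] -= dx / dist * pull
            disp[i][1] -= dy / dist * pull
            disp[j][0] += dx / dist * pull
            disp[j][1] += dy / dist * pull
        for i in range(n_nodes):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            move = min(length, temperature)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move
    return pos


# Usage: two small groups of "cells" in expression space.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = knn_edges(cells, k=2)
layout = force_layout(len(cells), graph, iterations=50)
```

Fixing the random seed makes this toy layout repeatable; how SPRING itself achieves reproducibility is not described in the abstract.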


2016 ◽  
Author(s):  
Gregory Giecold ◽  
Eugenio Marco ◽  
Lorenzo Trippa ◽  
Guo-Cheng Yuan

Single-cell gene expression data provide invaluable resources for systematic characterization of cellular hierarchy in multi-cellular organisms. However, cell lineage reconstruction is still often associated with significant uncertainty due to technological constraints. Such uncertainties have not been taken into account in current methods. We present ECLAIR, a novel computational method for the statistical inference of cell lineage relationships from single-cell gene expression data. ECLAIR uses an ensemble approach to improve the robustness of lineage predictions, and provides a quantitative estimate of the uncertainty of lineage branchings. We show that the application of ECLAIR to published datasets successfully reconstructs known lineage relationships and significantly improves the robustness of predictions. In conclusion, ECLAIR is a powerful bioinformatics tool for single-cell data analysis. It can be used for robust lineage reconstruction with a quantitative estimate of prediction accuracy.
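One generic way to turn an ensemble of analyses into an uncertainty estimate is a co-association matrix: the fraction of runs in which two cells receive the same label. The sketch below illustrates that idea only. It is not ECLAIR's algorithm, and the function name `co_association` and the toy label lists are hypothetical.

```python
def co_association(labelings):
    """Fraction of runs in which each pair of cells shares a cluster label.

    labelings: list of runs, each a list with one cluster label per cell.
    """
    n_cells = len(labelings[0])
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_runs = len(labelings)
    return [[c / n_runs for c in row] for row in counts]


# Usage: three clustering runs over four cells.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
agreement = co_association(runs)
# agreement[0][1] is 1.0 (always together); agreement[1][2] is 1/3 (rarely together).
```

Values near 0 or 1 indicate pairs the ensemble agrees on, while intermediate values flag relationships that change from run to run.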


2018 ◽  
Vol 19 (3) ◽  
pp. 291-301 ◽  
Author(s):  
David Zemmour ◽  
Rapolas Zilionis ◽  
Evgeny Kiner ◽  
Allon M. Klein ◽  
Diane Mathis ◽  
...  

2014 ◽  
Vol 42 (15) ◽  
pp. 9880-9891 ◽  
Author(s):  
Arne H. Smits ◽  
Rik G.H. Lindeboom ◽  
Matteo Perino ◽  
Simon J. van Heeringen ◽  
Gert Jan C. Veenstra ◽  
...  

Abstract While recent developments in genomic sequencing technology have enabled comprehensive transcriptome analyses of single cells, single cell proteomics has thus far been restricted to targeted studies. Here, we perform global absolute protein quantification of fertilized Xenopus laevis eggs using mass spectrometry-based proteomics, quantifying over 5800 proteins in the largest single cell proteome characterized to date. Absolute protein amounts in single eggs are highly consistent, thus indicating a tight regulation of global protein abundance. Protein copy numbers in single eggs range from tens of thousands to ten trillion copies per cell. Comparison between the single-cell proteome and transcriptome reveals poor expression correlation. Finally, we identify 439 proteins that significantly change in abundance during early embryogenesis. Downregulated proteins include ribosomal proteins and upregulated proteins include basal transcription factors, among others. Many of these proteins do not show regulation at the transcript level. Altogether, our data reveal that the transcriptome is a poor indicator of the proteome and that protein levels are tightly controlled in X. laevis eggs.


2018 ◽  
Vol 19 (6) ◽  
pp. 645-645 ◽  
Author(s):  
David Zemmour ◽  
Rapolas Zilionis ◽  
Evgeny Kiner ◽  
Allon M Klein ◽  
Diane Mathis ◽  
...  

2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Ana Rita Silva Moreira ◽  
Alexandra N Lagasse ◽  
Anessa C Haney ◽  
Nathan Avaritt ◽  
Stephanie Byrum ◽  
...  

Abstract Sufficient nutrition is critical for reproduction. We have previously shown that leptin, a circulating indicator of fat stores, signals to pituitary gonadotropes to maintain gonadotropin releasing hormone receptor (GnRHR) protein levels in female mice. We hypothesized that this process is post-transcriptional, happening primarily through regulation of the RNA-binding protein Musashi (MSI). We showed that MSI binds to Gnrhr and inhibits translation, and a gonadotrope-specific deletion of Msi1 and Msi2 (Gon-Msi1/2-null) leads to increased GnRHR protein levels. This culminates in dysregulated luteinizing hormone (LH) and follicle-stimulating hormone (FSH). We have recently identified other gonadotrope and pituitary targets of MSI. We therefore suspected that MSI plays a role in both the maturation of gonadotropes and the normal cyclic regulation of gonadotropes. We hypothesized that the deletion of MSI would lead to downstream effects on (1) the composition of the gonadotrope population and (2) the molecular landscape of these cells. Using our adult, diestrous Gon-Msi1/2-null females, we performed single-cell RNA-sequencing on methanol-fixed dispersed pituitary cells. Libraries were made from two control pools and two mutant pools (n=3 pituitaries/pool) using 10x Genomics v3.1 Single-Cell Gene Expression technology and initially sequenced on an Illumina NextSeq mid-output flow cell, yielding 5,000 reads/cell. Subsequent high-output sequencing obtained 25,000 reads/cell. We recovered single-cell mRNA transcript information from 18,206 control pituitary cells and 16,255 Gon-Msi1/2-null cells. Our analyses revealed that the Gon-Msi1/2-null pools had a higher percentage of cells expressing Fshb, as well as an expected significant drop in Msi2-expressing gonadotropes and no change in Lhb-expressing cells. We have recently identified Fshb as an MSI target in silico, and qRT-PCR of female pituitary lysate immunoprecipitated with anti-MSI1 shows a 7-fold enrichment in Fshb mRNA. We identified differentially expressed genes comparing the control and Gon-Msi1/2-null gonadotrope clusters. Using Gene Ontology analyses, the Gon-Msi1/2-null gonadotrope cluster appears to have aberrant expression of mRNAs involved in protein folding and cellular responses to nutrients. Our high-output sequencing has allowed us to achieve 25,000 reads/cell and will provide greater resolution of the role of Musashi in control of gonadotrope function. Taken together, our data indicate that Musashi influences the molecular landscape and subsequent physiology of the female gonadotrope. We have identified potential gonadotrope-specific MSI targets, including pathways that may underlie the dysregulated gonadotropin production and secretion seen in our Gon-Msi1/2-null females. Future studies will compare pubertal and adult females, as well as females from different estrous cycle stages.


2021 ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract Single-cell measurements of different cellular features or modalities from cells from the same system allow for a comprehensive understanding of a biological process. While the most common single-cell sequencing technologies require separate input cells for different modalities, there are a growing number of platforms that allow for measuring several modalities on a single cell. We present a novel method, Cobolt, for analyzing such multi-modality single-cell sequencing datasets. Cobolt jointly models the multiple modalities via a novel application of Multimodal Variational Autoencoder (MVAE) to a hierarchical generative model. We first demonstrate its performance on data from the multi-modality platform SNARE-seq, consisting of measurements of gene expression and chromatin accessibility on the same cells. We then illustrate the ability of Cobolt to integrate multi-modality platforms with single-modality platforms by jointly analyzing a SNARE-seq dataset, a single-cell gene expression dataset, and a single-cell chromatin accessibility dataset. We compared Cobolt with current options for analyzing such datasets and show that Cobolt provides robust and flexible results for integration of single-cell data on multiple modalities.
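Multimodal VAEs are often described as combining per-modality Gaussian posteriors with a product of experts, which handles cells where a modality is missing by simply leaving that expert out. The sketch below shows that combination rule for a one-dimensional latent variable. It assumes a standard normal prior expert, the names `product_of_gaussians` and `joint_posterior` are hypothetical, and the abstract does not state whether Cobolt uses exactly this rule.

```python
def product_of_gaussians(means, variances):
    """Mean and variance of the normalized product of 1-D Gaussian densities."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


def joint_posterior(experts):
    """Combine available modality posteriors with a N(0, 1) prior expert.

    experts: dict mapping modality name -> (mean, variance), or None if the
    modality was not measured for this cell.
    """
    means, variances = [0.0], [1.0]  # prior expert (assumed standard normal)
    for params in experts.values():
        if params is not None:
            means.append(params[0])
            variances.append(params[1])
    return product_of_gaussians(means, variances)


# Usage: a cell with expression only, and a cell with both modalities measured.
rna_only = joint_posterior({"rna": (2.0, 1.0), "atac": None})
both = joint_posterior({"rna": (2.0, 1.0), "atac": (2.0, 1.0)})
```

Each added expert increases the total precision, so a cell measured on more modalities gets a tighter posterior under this rule.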


2016 ◽  
Author(s):  
Thalia E. Chan ◽  
Michael P.H. Stumpf ◽  
Ann C. Babtie

Abstract While single-cell gene expression experiments present new challenges for data processing, the cell-to-cell variability observed also reveals statistical relationships that can be used by information theory. Here, we use multivariate information theory to explore the statistical dependencies between triplets of genes in single-cell gene expression datasets. We develop PIDC, a fast, efficient algorithm that uses partial information decomposition (PID) to identify regulatory relationships between genes. We thoroughly evaluate the performance of our algorithm and demonstrate that the higher order information captured by PIDC allows it to outperform pairwise mutual information-based algorithms when recovering true relationships present in simulated data. We also infer gene regulatory networks from three experimental single-cell data sets and illustrate how network context, choices made during analysis, and sources of variability affect network inference. PIDC tutorials and open-source software for estimating PID are available here: https://github.com/Tchanders/network_inference_tutorials. PIDC should facilitate the identification of putative functional relationships and mechanistic hypotheses from single-cell transcriptomic data.
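The pairwise baseline that this abstract compares against, mutual information between two genes, can be estimated from discretized expression values with a plug-in estimator. The sketch below covers only that baseline, not partial information decomposition or PIDC itself. The function name `mutual_information` is hypothetical, and the inputs are assumed to be already binned into discrete levels.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits from two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )


# Usage: expression of three genes across four cells, binned into low (0) / high (1).
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # moves with gene_a
gene_c = [0, 1, 0, 1]  # unrelated to gene_a in this toy sample
dependent = mutual_information(gene_a, gene_b)
independent = mutual_information(gene_a, gene_c)
```

With real data the estimate depends on the binning scheme and the number of cells, which is one of the analysis choices the abstract says can affect network inference.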


2017 ◽  
Author(s):  
Tao Peng ◽  
Qing Nie

Abstract Measurement of gene expression levels for multiple genes in single cells provides a powerful approach to study heterogeneity of cell populations and cellular plasticity. While the expression levels of multiple genes in each cell are available in such data, the potential connections among the cells (e.g. the cellular state transition relationship) are not directly evident from the measurement. Classifying the cellular states, identifying the transitions among those states, and extracting the pseudotime ordering of cells are challenging due to the noise in the data and the high dimensionality in the number of genes in the data. In this paper we adapt the classical self-organizing-map (SOM) approach for single-cell gene expression data (SOMSC), such as those based on single cell qPCR and single cell RNA-seq. In SOMSC, a cellular state map (CSM) is derived and employed to identify cellular states inherited in the population of the measured single cells. Cells located in the same basin of the CSM are considered to be in one cellular state, while barriers among the basins in the CSM provide information on transitions among the cellular states. A cellular state transition path (e.g. differentiation) and a temporal ordering of the measured single cells are consequently obtained. In addition, SOMSC could estimate the cellular state replication probability and transition probabilities. Applied to a set of synthetic data, one single-cell qPCR data set on mouse early embryonic development and two single-cell RNA-seq data sets, SOMSC shows effectiveness in capturing cellular states and their transitions presented in the high-dimensional single-cell data. This approach will have broader applications to analyzing cellular fate specification and cell lineages using single cell gene expression data.
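The classical self-organizing map that this abstract builds on can be written compactly. The sketch below is a generic SOM trainer under stated assumptions, not the SOMSC method: the linear decay schedules, the Gaussian neighbourhood, and the names `train_som` and `best_matching_unit` are all illustrative choices, and the cellular state map and basin analysis are not reproduced.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, grid_w, grid_h, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(grid_w, grid_h) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(grid_h)
        for c in range(grid_w)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
            step += 1
    return weights


# Usage: four "cells" with two normalized expression values each.
cells = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(cells, grid_w=2, grid_h=2, epochs=5)
assigned_nodes = [best_matching_unit(som, cell) for cell in cells]
```

Cells that map to the same or neighbouring nodes are similar in expression space, which is the property a downstream state map would build on.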

