SCRIBE: a new approach to dropout imputation and batch effects correction for single-cell RNA-seq data

2019
Author(s): Yiliang Zhang, Kexuan Liang, Molei Liu, Yue Li, Hao Ge, ...

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have been widely used in recent years as a powerful tool for observing gene expression at single-cell resolution. Two of the major challenges in scRNA-seq data analysis are dropout events and batch effects. The degree of zero inflation (the dropout rate) varies substantially across single cells, and evidence has shown that technical noise, including batch effects, explains a notable proportion of this cell-to-cell variation. To capture biological variation, it is therefore necessary to quantify and remove technical variation. Here, we introduce SCRIBE (Single-Cell Recovery Imputation with Batch Effects), a principled framework that imputes dropout events and corrects batch effects simultaneously. Using real examples, we demonstrate that SCRIBE outperforms existing scRNA-seq data analysis tools in recovering cell-specific gene expression patterns, removing batch effects, and retaining biological variation across cells. Our software is freely available online at https://github.com/YiliangTracyZhang/SCRIBE.
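The observation that per-cell dropout rates vary, and that batch explains part of that variation, can be checked directly on a count matrix. The sketch below is an illustrative diagnostic only, not part of SCRIBE. It computes each cell's fraction of zero counts and then the share of that variance explained by batch labels (one-way ANOVA eta squared). The function names and the genes × cells layout are assumptions made for this example.

```python
import numpy as np

def per_cell_zero_rate(counts):
    """Fraction of zero entries in each cell (column) of a genes x cells count matrix."""
    return (np.asarray(counts) == 0).mean(axis=0)

def variance_explained_by_batch(values, batch):
    """Share of the variance in per-cell values explained by batch labels:
    between-batch sum of squares / total sum of squares (one-way ANOVA eta^2)."""
    values = np.asarray(values, dtype=float)
    batch = np.asarray(batch)
    grand = values.mean()
    total = ((values - grand) ** 2).sum()
    between = sum(
        (batch == b).sum() * (values[batch == b].mean() - grand) ** 2
        for b in np.unique(batch)
    )
    return float(between / total)
```

A value near 1 means that differences in dropout rate are almost entirely aligned with batch. A value near 0 means that batch does not account for them.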

2018
Author(s): Krishan Gupta, Manan Lalit, Aditya Biswas, Ujjwal Maulik, Sanghamitra Bandyopadhyay, ...

Abstract Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. Zero-inflated versions of the Poisson, negative binomial, and log-normal distributions have emerged as the most popular choices because they accommodate the high dropout rates commonly observed in single-cell data. While most parametric approaches model expression estimates directly, we explore the potential of modeling expression ranks as robust surrogates for transcript abundance. Here, we examined the performance of the Discrete Generalized Beta Distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some existing best-practice approaches. Besides striking a reasonable balance between Type 1 and Type 2 errors, we concluded that ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform.
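The two ingredients named above, a DGBD over ranks and a Wald-type comparison between groups, can be sketched as follows. This sketch assumes the standard DGBD form P(r) = A (N + 1 - r)^b / r^a over ranks r = 1..N. It also assumes that the test compares estimated parameters θ = (a, b) from two independent groups together with their covariance matrices. How ROSeq derives ranks and estimates (a, b) is not reproduced here. The function names are hypothetical, and this is not the ROSeq package API.

```python
import numpy as np

def dgbd_pmf(N, a, b):
    """Discrete Generalized Beta Distribution over ranks r = 1..N:
    P(r) = A * (N + 1 - r)**b / r**a, with A chosen so probabilities sum to 1."""
    r = np.arange(1, N + 1, dtype=float)
    weights = (N + 1 - r) ** b / r ** a
    return weights / weights.sum()

def wald_test(theta1, cov1, theta2, cov2):
    """Wald-type test of H0: theta1 == theta2 for two independent estimates of
    theta = (a, b). Returns (W, p_value)."""
    diff = np.asarray(theta1, float) - np.asarray(theta2, float)
    if diff.shape != (2,):
        raise ValueError("expected two parameters (a, b) per group")
    pooled = np.asarray(cov1, float) + np.asarray(cov2, float)
    w = float(diff @ np.linalg.solve(pooled, diff))
    # Under H0, W is approximately chi-square with 2 degrees of freedom, whose
    # survival function has the closed form exp(-W / 2).
    return w, float(np.exp(-w / 2.0))
```

Setting a = b = 0 gives a uniform distribution over ranks. Larger a concentrates mass on the top ranks, and larger b keeps mass away from the bottom ranks.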


Author(s): Kenneth H. Hu, John P. Eichorst, Chris S. McGinnis, David M. Patterson, Eric D. Chow, ...

Abstract Spatial transcriptomics seeks to integrate single-cell transcriptomic data within the three-dimensional space of multicellular biology. Current methods use glass substrates pre-seeded with matrices of barcodes, or fluorescence hybridization of a limited number of probes. We developed an alternative approach, called ‘ZipSeq’, that uses patterned illumination and photocaged oligonucleotides to serially print barcodes (Zipcodes) onto live cells within intact tissues, in real time and with on-the-fly selection of patterns. Using ZipSeq, we mapped gene expression in three settings: in vitro wound healing, live lymph node sections, and a live tumor microenvironment (TME). In all cases, we discovered new gene expression patterns associated with histological structures. In the TME, ZipSeq revealed a trajectory of myeloid and T cell differentiation from the periphery inward. A variation of ZipSeq scales efficiently to the level of single cells, providing a pathway for complete mapping of live tissues following real-time imaging or perturbation.


2019
Author(s): Sabina Kanton, Michael James Boyle, Zhisong He, Malgorzata Santel, Anne Weigert, ...

Abstract The human brain has changed dramatically since humans diverged from our closest living relatives, chimpanzees and the other great apes [1–5]. However, the genetic and developmental programs underlying this divergence are not fully understood [6–8]. Here, we have analyzed stem cell-derived cerebral organoids using single-cell transcriptomics (scRNA-seq) and accessible chromatin profiling (scATAC-seq) to explore gene regulatory changes that are specific to humans. We first analyze cell composition and reconstruct differentiation trajectories over the entire course of human cerebral organoid development from pluripotency, through neuroectoderm and neuroepithelial stages, followed by divergence into neuronal fates within the dorsal and ventral forebrain, midbrain and hindbrain regions. We find that brain region composition varies in organoids from different iPSC lines, yet regional gene expression patterns are largely reproducible across individuals. We then analyze chimpanzee and macaque cerebral organoids and find that human neuronal development proceeds at a delayed pace relative to the other two primates. Through pseudotemporal alignment of differentiation paths, we identify human-specific gene expression resolved to distinct cell states along progenitor-to-neuron lineages in the cortex. We find that chromatin accessibility is dynamic during cortex development, and identify instances of accessibility divergence between human and chimpanzee that correlate with human-specific gene expression and genetic change. Finally, we map human-specific expression in adult prefrontal cortex using single-nucleus RNA-seq and find developmental differences that persist into adulthood, as well as cell state-specific changes that occur exclusively in the adult brain. Our data provide a temporal cell atlas of great ape forebrain development, and illuminate dynamic gene regulatory features that are unique to humans.
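The abstract does not name the algorithm used for pseudotemporal alignment. Classic dynamic time warping (DTW) is one standard way to align two pseudotime-ordered expression profiles, for example a gene along a human path and a chimpanzee path. The sketch below uses DTW as an assumption and is not the authors' pipeline. `dtw_align` is a hypothetical name, and the absolute-difference cost is an illustrative choice.

```python
import numpy as np

def dtw_align(a, b):
    """Classic dynamic time warping between two 1-D profiles ordered by pseudotime.
    Returns (total_cost, path), where path lists aligned (index_in_a, index_in_b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match both, advance a only, advance b only.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(D[n, m]), path[::-1]
```

A slowed-down copy of a profile, such as `[0, 0, 1, 2, 2, 3]` against `[0, 1, 2, 3]`, aligns at zero cost. The path then shows which points in one lineage correspond to stretched intervals in the other, which is the kind of evidence that can indicate a delayed pace of development.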


2021
Author(s): Chaohao Gu, Zhandong Liu

Abstract Spatial gene expression is a crucial determinant of cell fate and behavior. Recent advances in imaging and sequencing technology have enabled scientists to develop new tools that use spatial information to measure gene expression at close to single-cell resolution. Yet, while Fluorescence In-situ Hybridization (FISH) can quantify transcript numbers at single-cell resolution, it is limited to a small number of genes. Similarly, Slide-seq was designed to measure spatial expression profiles at the single-cell level but has a relatively low gene-capture rate. And although single-cell RNA-seq enables deep cellular gene expression profiling, it loses spatial information during sample collection. These major limitations have stymied the broader application of these methods in the field. To overcome the limitations of spatial-omics technologies and better understand spatial patterns at single-cell resolution, we designed glmSMA, a computational algorithm that predicts cell locations by integrating scRNA-seq data with a spatial-omics reference atlas. We treated cell mapping as a convex optimization problem that minimizes the differences between cellular expression profiles and location expression profiles, with L1 regularization and graph-Laplacian-based L2 regularization to ensure a sparse and smooth mapping. We validated the mapping results by reconstructing the spatial expression patterns of well-known marker genes in complex tissues such as the mouse cerebellum and hippocampus, and we used the biological literature to verify that the reconstructed patterns recapitulate cell-type and anatomical structures. Together, these results show that glmSMA can accurately assign single cells to their original reference-atlas locations.
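The optimization described above can be sketched for one cell. Let y be the cell's expression vector (genes), G the reference atlas (genes × locations), w the nonnegative weights over locations, and L the graph Laplacian of the spatial neighbour graph. The objective is then 0.5·||y − Gw||² + λ1·||w||₁ + 0.5·λ2·wᵀLw. This exact form, the nonnegativity constraint, the grid neighbourhood and the proximal-gradient (ISTA) solver are all assumptions for illustration. The published glmSMA model may differ. `grid_laplacian` and `map_cell` are hypothetical names.

```python
import numpy as np

def grid_laplacian(n_rows, n_cols):
    """Unnormalized graph Laplacian L = D - A of a 4-neighbour grid of locations,
    with location index k = row * n_cols + col."""
    n = n_rows * n_cols
    A = np.zeros((n, n))
    for i in range(n_rows):
        for j in range(n_cols):
            k = i * n_cols + j
            if j + 1 < n_cols:
                A[k, k + 1] = A[k + 1, k] = 1.0
            if i + 1 < n_rows:
                A[k, k + n_cols] = A[k + n_cols, k] = 1.0
    return np.diag(A.sum(axis=1)) - A

def map_cell(y, G, L, lam1=0.1, lam2=0.1, n_iter=2000):
    """Proximal-gradient (ISTA) solver for
    min_w 0.5*||y - G w||^2 + lam1*||w||_1 + 0.5*lam2*w^T L w  subject to w >= 0."""
    w = np.zeros(G.shape[1])
    # Step size = 1 / Lipschitz constant of the gradient of the smooth terms.
    step = 1.0 / (np.linalg.norm(G, 2) ** 2 + lam2 * np.linalg.norm(L, 2))
    for _ in range(n_iter):
        grad = G.T @ (G @ w - y) + lam2 * (L @ w)
        # Prox of lam1*||w||_1 plus nonnegativity: shift down by step*lam1, clip at 0.
        w = np.maximum(w - step * (grad + lam1), 0.0)
    return w
```

The L1 term drives most location weights to exactly zero (sparsity). The Laplacian term penalizes differences between neighbouring locations (smoothness). For a cell that matches one location of a toy atlas, the largest weight lands on that location.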


2019
Author(s): Jerome Samir, Simone Rizzetto, Money Gupta, Fabio Luciani

Abstract Background: Single-cell RNA sequencing provides an unprecedented opportunity to simultaneously explore the transcriptomic and immune receptor diversity of T and B cells. However, few tools are available that simultaneously analyse large multi-omics datasets integrated with metadata such as patient and clinical information.
Results: We developed VDJView, which permits the simultaneous or independent analysis and visualisation of gene expression, immune receptors, and clinical metadata of both T and B cells. This tool is implemented as an easy-to-use R Shiny web application that integrates numerous gene expression and TCR analysis tools and accepts data from plate-based sorted or high-throughput single-cell platforms. We used VDJView to analyse several 10X scRNA-seq datasets, including a recent dataset of 150,000 CD8+ T cells with available gene expression, TCR sequences, quantification of 15 surface proteins, and 44 antigen specificities (across viruses, cancer, and self-antigens). We performed quality control, filtering of tetramer non-specific cells, clustering, random sampling, and hypothesis testing to discover antigen-specific gene signatures that were associated with immune cell differentiation states and clonal expansion across the pathogen-specific T cells. We also analysed 563 single cells (plate-based sorted) obtained from 11 subjects, revealing clonally expanded T and B cells across primary cancer tissues and metastatic lymph node. These immune cells clustered with distinct gene signatures according to the breast cancer molecular subtype. VDJView has been tested in lab meetings and peer-to-peer discussions, showing effective data generation and discussion without the need to consult bioinformaticians.
Conclusions: VDJView enables researchers without profound bioinformatics skills to analyse immune scRNA-seq data, integrating and visualising it with clonality and metadata profiles, thus accelerating hypothesis testing, data interpretation, and the discovery of cellular heterogeneity. VDJView is freely available at https://bitbucket.org/kirbyvisp/vdjview.
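Identifying "clonally expanded" cells amounts to grouping cells by receptor clonotype and keeping clonotypes shared by more than one cell. The minimal sketch below assumes that a clonotype is defined by an identical CDR3 sequence string. Real pipelines, including those integrated in VDJView, may also use paired chains and V/J gene calls. `expanded_clonotypes` is a hypothetical helper, not a VDJView function.

```python
from collections import Counter

def expanded_clonotypes(cell_cdr3, min_cells=2):
    """Given a mapping cell_id -> CDR3 sequence, return {clonotype: n_cells}
    for clonotypes observed in at least min_cells cells."""
    counts = Counter(cell_cdr3.values())
    return {clone: n for clone, n in counts.items() if n >= min_cells}
```

The resulting clone sizes are the quantity that would then be related to gene signatures or clinical metadata.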


2020
Author(s): Jerome Samir, Simone Rizzetto, Money Gupta, Fabio Luciani



2021
Author(s): Fang Ye, Guodong Zhang, Weigao E, Haide Chen, Chengxuan Yu, ...

Abstract The Mexican axolotl (Ambystoma mexicanum) is a promising tetrapod model for regeneration and developmental studies. Remarkably, neotenic axolotls may undergo metamorphosis, during which their regeneration capacity and lifespan gradually decline. However, a system-level single-cell analysis of molecular characteristics in neotenic and metamorphosed axolotls is still lacking. Here, we developed a single-cell RNA-seq method based on combinatorial hybridization to generate a tissue-based transcriptomic atlas of the adult axolotl. We performed gene expression profiling of over 1 million single cells across 19 tissues to construct the first adult axolotl cell atlas. Comparison of single-cell transcriptomes between the tissues of neotenic and metamorphosed axolotls revealed the heterogeneity of structural cells in different tissues and established their regulatory network. Furthermore, we described dynamic gene expression patterns during limb development in neotenic axolotls. These data serve as a resource to explore the molecular identity of the axolotl as well as its metamorphosis.


2021
Author(s): Zi-Hang Wen, Jeremy L. Langsam, Lu Zhang, Wenjun Shen, Xin Zhou

Abstract Single-cell RNA-seq (scRNA-seq) offers opportunities to study the gene expression of tens of thousands of single cells simultaneously, to investigate cell-to-cell variation, and to reconstruct cell-type-specific gene regulatory networks. Recovering dropout events in a sparse gene expression matrix from scRNA-seq data is a long-standing matrix completion problem. We introduce Bfimpute, a Bayesian factorization imputation algorithm that reconstructs two latent matrices, one for genes and one for cells, to impute the final gene expression matrix within each cell group, with or without the aid of cell type labels or bulk data. Bfimpute achieves better accuracy than six other notable, publicly available scRNA-seq imputation methods on simulated and real scRNA-seq data, as measured by several evaluation metrics. Bfimpute can also flexibly integrate any gene- or cell-related information that users provide to improve performance. Availability: Bfimpute is implemented in R and is freely available at https://github.com/maiziezhoulab/Bfimpute.
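The core idea of reconstructing latent gene and cell matrices to fill dropouts can be illustrated with a simplified stand-in. Bfimpute itself samples a Bayesian factorization within cell groups. The sketch below fits a single low-rank point estimate with alternating least squares on the observed (non-zero) entries. Its L2 penalties correspond to the MAP estimate under independent Gaussian priors on the factors. Treating every zero as a dropout, the rank, and the penalty weight are assumptions. `factorize_impute` is a hypothetical name, not Bfimpute's R interface.

```python
import numpy as np

def factorize_impute(X, rank=2, lam=0.1, n_iter=50, seed=0):
    """Impute zeros in a genes x cells matrix via X ~= U @ V.T, fitted only on
    observed (non-zero) entries with L2 penalties (Gaussian-prior MAP estimate)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_genes, n_cells = X.shape
    observed = X != 0
    U = rng.normal(scale=0.1, size=(n_genes, rank))   # latent gene matrix
    V = rng.normal(scale=0.1, size=(n_cells, rank))   # latent cell matrix
    ridge = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each gene's factors given the cell factors (ridge regression).
        for g in range(n_genes):
            m = observed[g]
            Vm = V[m]
            U[g] = np.linalg.solve(Vm.T @ Vm + ridge, Vm.T @ X[g, m])
        # Update each cell's factors given the gene factors.
        for c in range(n_cells):
            m = observed[:, c]
            Um = U[m]
            V[c] = np.linalg.solve(Um.T @ Um + ridge, Um.T @ X[m, c])
    X_hat = U @ V.T
    # Keep observed values; replace only the zero (dropout) entries.
    return np.where(observed, X, X_hat)
```

On an exactly low-rank toy matrix with a few entries zeroed out, the imputed values are close to the hidden originals, while observed entries are returned unchanged.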


2020
Author(s): Mojtaba Bahrami, Malosree Maitra, Corina Nagy, Gustavo Turecki, Hamid R. Rabiee, ...

Abstract Motivation: Single-cell RNA sequencing (scRNA-seq) has opened opportunities to dissect heterogeneous cellular composition and to interrogate cell-type-specific gene expression patterns across diverse conditions. However, batch effects arising from laboratory conditions and individual variability hinder its use in cross-condition designs. Results: We present the single-cell Generative Adversarial Network (scGAN). Our main contribution is an adversarial network that predicts batch effects from the embeddings of a variational autoencoder network, which must not only maximize the negative binomial likelihood of the raw scRNA-seq counts but also minimize the correlation between the latent embeddings and the batch effects. We demonstrate scGAN on three public scRNA-seq datasets and show that our method outperforms state-of-the-art methods in forming clusters of known cell types and in identifying known psychiatric genes associated with major depressive disorder. Availability: The code is available at https://github.com/li-lab-mcgill/…
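The two competing terms described above can be written out explicitly: a negative binomial (NB) log-likelihood of raw counts and a penalty on the association between latent embeddings and batch. In scGAN the decorrelation is enforced by an adversarial batch predictor. The sketch below replaces that predictor with a direct squared-Pearson-correlation penalty, purely to show what is being minimized. The mean/inverse-dispersion NB parameterization, the binary batch indicator, and all function names are assumptions.

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def nb_log_likelihood(x, mu, theta):
    """Summed NB log-likelihood of counts x with mean mu and inverse dispersion theta."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    theta = np.broadcast_to(np.asarray(theta, dtype=float), x.shape)
    log_p = (_lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
             + theta * np.log(theta / (theta + mu))
             + x * np.log(mu / (theta + mu)))
    return float(log_p.sum())

def batch_correlation_penalty(z, batch):
    """Sum over latent dimensions of the squared Pearson correlation between z
    (cells x dims) and a binary batch indicator (cells,). Constant columns give NaN."""
    z = np.asarray(z, dtype=float)
    b = np.asarray(batch, dtype=float)
    zc = z - z.mean(axis=0)
    bc = b - b.mean()
    corr = (zc.T @ bc) / (np.sqrt((zc ** 2).sum(axis=0)) * np.sqrt((bc ** 2).sum()))
    return float((corr ** 2).sum())

def combined_objective(x, mu, theta, z, batch, weight=1.0):
    """Loss to minimize: negative NB log-likelihood plus weighted batch penalty."""
    return -nb_log_likelihood(x, mu, theta) + weight * batch_correlation_penalty(z, batch)
```

The `weight` parameter trades reconstruction quality against batch removal. An adversarial predictor plays the same role but can also detect nonlinear dependence on batch, which a correlation penalty cannot.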


2018
Author(s): Sarthak Sharma, Wei Wang, Alberto Stolfi

Abstract The tadpole-type larva of Ciona has emerged as an intriguing model system for the study of neurodevelopment. The Ciona intestinalis connectome has recently been mapped, revealing the smallest central nervous system (CNS) known in any chordate, with only 177 neurons. This minimal CNS is highly reminiscent of the larger CNSs of vertebrates, sharing many conserved developmental processes, anatomical compartments, neuron subtypes, and even specific neural circuits. Thus, the Ciona tadpole offers a unique opportunity to understand the development and wiring of a chordate CNS at single-cell resolution. Here we report the use of single-cell RNA-seq to profile the transcriptomes of single cells isolated by fluorescence-activated cell sorting (FACS) from the whole brain of Ciona robusta (formerly intestinalis Type A) larvae. We have also compared these profiles to bulk RNA-seq data from specific subsets of brain cells isolated by FACS using cell-type-specific reporter plasmid expression. Taken together, these datasets have begun to reveal the compartment- and cell-specific gene expression patterns that define the organization of the Ciona larval brain.

