Recovery and analysis of transcriptome subsets from pooled single-cell RNA-seq libraries

2018 ◽  
Author(s):  
Kent A. Riemondy ◽  
Monica Ransom ◽  
Christopher Alderman ◽  
Austin E. Gillen ◽  
Rui Fu ◽  
...  

ABSTRACT Single-cell RNA sequencing (scRNA-seq) methods generate sparse gene expression profiles for thousands of single cells in a single experiment. The information in these profiles is sufficient to classify cell types by distinct expression patterns, but the high complexity of scRNA-seq libraries often prevents full characterization of transcriptomes from individual cells. To extract more focused gene expression information from scRNA-seq libraries, we developed a strategy to physically recover the DNA molecules comprising transcriptome subsets, enabling deeper interrogation of the isolated molecules by another round of DNA sequencing. We applied the method in cell-centric and gene-centric modes to isolate cDNA fragments from scRNA-seq libraries. First, we resampled the transcriptomes of rare, single megakaryocytes from a complex mixture of lymphocytes and analyzed them in a second round of DNA sequencing, yielding up to 20-fold greater sequencing depth per cell and increasing the number of genes detected per cell from a median of 1,313 to 2,002. We similarly isolated mRNAs from targeted T cells to improve the reconstruction of their VDJ-rearranged immune receptor mRNAs. Second, we isolated CD3D mRNA fragments expressed across cells in a scRNA-seq library prepared from a clonal T cell line, increasing the number of cells with detected CD3D expression from 59.7% to 100%. Transcriptome resampling is a general approach to recover targeted gene expression information from single-cell RNA sequencing libraries that enhances the utility of these costly experiments, and may be applicable to the targeted recovery of molecules from other single-cell assays.

2020 ◽  
Vol 36 (13) ◽  
pp. 4021-4029
Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract Summary Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise. Availability and implementation The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME. Supplementary information Supplementary data are available at Bioinformatics online.
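The idea of adjusting a cell's expression estimates from similar cells can be illustrated with a minimal sketch. This is not the PRIME algorithm: it is a generic nearest-neighbour averaging scheme, and the function name `knn_impute`, the Euclidean distance, the choice of `k`, and the toy matrix are all assumptions made for illustration.

```python
import math


def knn_impute(expr, k=2):
    """Fill zero entries with the mean of the same gene in the k nearest cells.

    expr: list of cells, each a list of non-negative expression values.
    Hypothetical helper for illustration; not the PRIME method.
    """
    n_cells = len(expr)

    def distance(a, b):
        # Euclidean distance between two cell profiles.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Work on a copy so that the observed matrix is left untouched.
    imputed = [list(cell) for cell in expr]
    for i, cell in enumerate(expr):
        # Rank all other cells by similarity and keep the k closest.
        others = sorted((j for j in range(n_cells) if j != i),
                        key=lambda j: distance(cell, expr[j]))
        neighbours = others[:k]
        for g, value in enumerate(cell):
            if value == 0 and neighbours:
                # Treat the zero as a candidate dropout and borrow the
                # neighbours' average for that gene.
                imputed[i][g] = sum(expr[j][g] for j in neighbours) / len(neighbours)
    return imputed


# Toy data: rows are cells, columns are genes (values are made up).
toy = [
    [5.0, 0.0, 1.0],   # type A, gene 1 looks like a dropout
    [5.0, 4.0, 1.0],   # type A
    [6.0, 4.0, 0.0],   # type A, gene 2 looks like a dropout
    [0.0, 0.0, 9.0],   # type B, zeros are genuine
    [0.0, 0.0, 8.0],   # type B
    [0.0, 0.0, 10.0],  # type B
]
result = knn_impute(toy, k=2)
```

In this toy case the zeros in the type A cells are filled from their type A neighbours, while the zeros in the type B cells stay at zero because their nearest neighbours also show no expression. A real method would need a more careful neighbourhood definition and a way to distinguish dropouts from true zeros.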


2021 ◽  
Author(s):  
Moonyoung Kang ◽  
Yuri Choi ◽  
Hyeonjin Kim ◽  
Sang-Gyu Kim

High-throughput single-cell RNA sequencing (scRNA-seq) identifies distinct cell populations based on cell-to-cell heterogeneity in gene expression. By examining the distribution of the density of gene expression profiles, the metabolic features of each cell population can be observed. Here, we employ the scRNA-seq technique to reveal the entire biosynthetic pathway of a flower volatile. The corolla (petals) of the wild tobacco Nicotiana attenuata emits a bouquet of scents that are composed mainly of benzylacetone (BA), a rare floral volatile. Protoplasts from the N. attenuata corolla were isolated at three different time points, and the transcript levels of >16,000 genes were analyzed in 3,756 single cells. We performed unsupervised clustering analysis to determine which cell clusters were involved in BA biosynthesis. The biosynthetic pathway of BA was uncovered by analyzing gene co-expression in scRNA-seq datasets and by silencing candidate genes in the corolla. In conclusion, the high-resolution spatiotemporal atlas of gene expression provided by scRNA-seq reveals the molecular features underlying cell-type-specific metabolism in a plant.


iScience ◽  
2021 ◽  
Vol 24 (4) ◽  
pp. 102357
Author(s):  
Brenda Morsey ◽  
Meng Niu ◽  
Shetty Ravi Dyavar ◽  
Courtney V. Fletcher ◽  
Benjamin G. Lamberty ◽  
...  

Science ◽  
2020 ◽  
Vol 371 (6531) ◽  
pp. eaba5257 ◽  
Author(s):  
Anna Kuchina ◽  
Leandra M. Brettner ◽  
Luana Paleologu ◽  
Charles M. Roco ◽  
Alexander B. Rosenberg ◽  
...  

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


2021 ◽  
Author(s):  
Chaohao Gu ◽  
Zhandong Liu

Abstract Spatial gene-expression is a crucial determinant of cell fate and behavior. Recent imaging and sequencing-technology advancements have enabled scientists to develop new tools that use spatial information to measure gene-expression at close to single-cell levels. Yet, while Fluorescence In-situ Hybridization (FISH) can quantify transcript numbers at single-cell resolution, it is limited to a small number of genes. Similarly, slide-seq was designed to measure spatial-expression profiles at the single-cell level but has a relatively low gene-capture rate. And although single-cell RNA-seq enables deep cellular gene-expression profiling, it loses spatial information during sample-collection. These major limitations have stymied these methods’ broader application in the field. To overcome spatio-omics technology’s limitations and better understand spatial patterns at single-cell resolution, we designed a computational algorithm, glmSMA, to predict cell locations by integrating scRNA-seq data with a spatial-omics reference atlas. We treated cell-mapping as a convex optimization problem by minimizing the differences between cellular-expression profiles and location-expression profiles with an L1 regularization and graph Laplacian based L2 regularization to ensure a sparse and smooth mapping. We validated the mapping results by reconstructing spatial-expression patterns of well-known marker genes in complex tissues, like the mouse cerebellum and hippocampus. We used the biological literature to verify that the reconstructed patterns can recapitulate cell-type and anatomy structures. Our work thus far shows that, together, we can use glmSMA to accurately assign single cells to their original reference-atlas locations.
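One plausible way to write the optimization described in this abstract is shown below. The abstract gives no formula, so the notation is an assumption: y is the expression vector of one cell over g genes, X is the g × m matrix of atlas expression at m locations, w is the vector of mapping weights over locations, L is the Laplacian of a graph connecting neighbouring locations, and λ1, λ2 are tuning parameters. The published objective may differ in scaling or constraints.

```latex
\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^{m}} \;
\tfrac{1}{2}\,\lVert y - X w \rVert_2^2
\;+\; \lambda_1 \lVert w \rVert_1
\;+\; \lambda_2\, w^{\top} L\, w
```

Under these assumptions the first term measures the mismatch between the cell profile and a weighted combination of location profiles, the L1 term favours assigning a cell to few locations, and the quadratic Laplacian term favours weights that vary smoothly across neighbouring locations. Because L is positive semidefinite, all three terms are convex, which is consistent with the abstract's description of the problem as convex.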


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 2756-2756
Author(s):  
Erin Guest ◽  
Byunggil Yoo ◽  
Rumen Kostadinov ◽  
Midhat S. Farooqi ◽  
Emily Farrow ◽  
...  

Introduction Infant acute lymphoblastic leukemia (ALL) with KMT2A rearrangement (KMT2A-r) is associated with a very poor prognosis. Disease free survival from the date of diagnosis is approximately 20% to 40%, depending on age, white blood cell count, and response to induction therapy. Refractory and relapsed infant ALL is often resistant to attempts at re-induction, and second remission is difficult to both achieve and maintain. Genomic sequencing studies of infant KMT2A-r ALL clinical samples have demonstrated an average of fewer than 3 additional non-silent somatic mutations per case at diagnosis, most commonly sub-clonal variants in RAS pathway genes. We previously reported relapse-associated gains in somatic variants associated with signaling, adhesion, and B-cell development pathways (Blood 2016 128:1735). We hypothesized that relapsed infant ALL is characterized by recurrent, altered patterns of gene expression. In this analysis, we utilized single cell RNA sequencing (scRNAseq) to identify candidate genes with differential expression in diagnostic vs. relapse leukemia specimens from 3 infants with KMT2A-r ALL. Methods Cryopreserved blood or bone marrow specimens from 3 infants enrolled in the Children's Oncology Group AALL0631 trial were selected for analysis. Samples from both diagnosis (DX) and relapse (RL) time points were thawed and checked for viability (>90% of cells viable) using trypan blue staining. Samples were multiplexed and processed for single cell RNA sequencing using the Chromium Single Cell 3' Library Kit (v2) and 10x Genomics Chromium controller per manufacturer's instructions (10x Genomics, Pleasanton, CA). Single cell libraries were converted to cDNA, amplified, and sequenced on an Illumina NovaSeq instrument. Two technical replicates were performed. Samples were de-multiplexed using genotype information acquired from previous whole exome sequencing (WES) and demuxlet software. 
Transcript alignment and counting were performed using the Cell Ranger pipeline (10x Genomics, default settings, Version 2.2.0, GRCh37 reference). Quality control, normalization, gene expression analysis, and unsupervised clustering were performed using the Seurat R package (Version 3.0). Dimensionality reduction and visualization were performed with the UMAP algorithm. Analyses were restricted to leukemia blasts with CD19 expression by scRNAseq. Results The clinical features for each case are shown in Table 1. Cells from the 3 infant ALL samples clustered together, distinct from cells of non-infant B-ALL, T-ALL, and mixed lineage acute leukemia biospecimens in the Children's Mercy scRNAseq database, but largely did not overlap with one another. For each of the 3 infant cases, cells from DX and RL time points could be distinguished by differential patterns of gene expression (Figure 1). Individual genes with statistically significant (p<0.05) log-fold change values were examined. Figure 2 summarizes the number of genes with up-regulation of expression by scRNAseq at RL compared to DX. Only 6 genes, DYNLL1, HMGB2, HMGN2, JUN, STMN1, and TUBA1B, were significantly increased at RL across all 3 cases. We repeated this analysis, restricting to leukemia blasts with CD79A expression, and identified these same 6 genes, and 4 additional genes: H2AFZ, NUCKS1, PRDX1, and TUBB, as consistently up-regulated in RL clusters. We examined the expression of candidate genes of interest, including clinically targetable genes, to compare the distribution of expression at DX and RL (Table 2). Conclusion Genomic factors underlying the aggressive, refractory clinical phenotype of relapsed infant ALL have yet to be defined. Each of these 3 cases demonstrates unique expression patterns at relapse, readily distinguishable from both the paired diagnostic sample and the other 2 relapse samples. 
Thus, scRNAseq is a powerful tool to identify heterogeneity in gene expression, with the potential to discover recurrent genomic drivers within resistant disease sub-clones. Ongoing analyses include scRNAseq in additional infant ALL samples, relative quantification of transcript expression in single cells, and comparison with bulk RNAseq data. Disclosures No relevant conflicts of interest to declare.


2019 ◽  
Vol 28 (21) ◽  
pp. 3569-3583 ◽  
Author(s):  
Patricia M Schnepp ◽  
Mengjie Chen ◽  
Evan T Keller ◽  
Xiang Zhou

Abstract Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type-specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information for the detection of functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single-cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar resulted in better quality SNVs even though none of the pipelines analyzed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doors for novel ways to leverage the use of scRNA-seq for the future investigation of SNV function.


Author(s):  
Meichen Dong ◽  
Aatish Thennavan ◽  
Eugene Urrutia ◽  
Yun Li ◽  
Charles M Perou ◽  
...  

Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
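The basic deconvolution problem behind this abstract can be sketched as a non-negative least-squares fit: bulk expression is modelled as a signature matrix of cell-type profiles times a vector of proportions. The code below is plain non-negative least squares solved by projected gradient descent, not SCDC itself. It omits the ENSEMBLE step and any gene weighting, and the function name `deconvolve`, the signature values, and the iteration count are assumptions for illustration.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell-type proportions p with signature * p ~ bulk.

    signature: genes x cell-types matrix (list of rows) of mean expression.
    bulk: list of bulk expression values, one per gene.
    Hypothetical helper for illustration; not the SCDC implementation.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Precompute A^T A and A^T b for the least-squares gradient.
    gram = [[sum(signature[g][a] * signature[g][b] for g in range(n_genes))
             for b in range(n_types)] for a in range(n_types)]
    atb = [sum(signature[g][a] * bulk[g] for g in range(n_genes))
           for a in range(n_types)]
    # The trace bounds the largest eigenvalue of A^T A, so this step is safe.
    step = 1.0 / sum(gram[a][a] for a in range(n_types))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        grad = [sum(gram[a][b] * p[b] for b in range(n_types)) - atb[a]
                for a in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[a] - step * grad[a]) for a in range(n_types)]
    total = sum(p)
    # Rescale so that the estimates read as proportions.
    return [x / total for x in p] if total > 0 else p


# Toy signature matrix: 4 genes x 3 cell types (values are made up).
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
true_props = [0.5, 0.3, 0.2]
# Noise-free pseudo-bulk sample built from the known proportions.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]
estimated = deconvolve(signature, bulk)
```

With a noise-free pseudo-bulk sample and a well-conditioned signature matrix, the estimate recovers the known proportions, which mirrors the abstract's use of mixtures with known composition as ground truth. Real data would add noise, batch effects between references, and collinear cell types, which is where methods such as the one described become relevant.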

