PRIME: a probabilistic imputation method to reduce dropout effects in single-cell RNA sequencing

2020 ◽  
Vol 36 (13) ◽  
pp. 4021-4029
Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract

Summary: Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise.

Availability and implementation: The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME.

Supplementary information: Supplementary data are available at Bioinformatics online.
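
The idea of adjusting a cell's expression estimates from a local neighbourhood of similar cells can be illustrated with a minimal sketch. This is not the PRIME algorithm itself: the cosine similarity, the fixed neighbourhood size `k`, and the function name `impute_zeros` are all assumptions made for illustration, and the sketch treats every zero as a dropout, which a probabilistic method would not do.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def impute_zeros(cells, k=2):
    """Fill zero entries with a similarity-weighted mean over the k most similar cells.

    cells: list of per-cell expression vectors sharing one gene order.
    Hypothetical helper; the neighbourhood rule is an assumption, not PRIME's.
    """
    imputed = []
    for i, cell in enumerate(cells):
        # Rank all other cells by similarity and keep the k closest ones.
        neighbours = sorted(
            ((cosine(cell, other), j) for j, other in enumerate(cells) if j != i),
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbours)
        new_cell = list(cell)
        if total > 0:
            for g, value in enumerate(cell):
                if value == 0:  # only observed zeros are adjusted
                    new_cell[g] = sum(sim * cells[j][g] for sim, j in neighbours) / total
        imputed.append(new_cell)
    return imputed


cells = [[5, 0, 1], [4, 2, 1], [5, 2, 0], [0, 0, 9]]
result = impute_zeros(cells, k=2)
```

In this toy input the first three cells resemble each other, so the zero in the first cell is filled from the second and third cells, while nonzero measurements are left untouched.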

Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract: Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data therefore need to be carefully processed before in-depth analysis. Here we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local community of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single cell sequencing), on six datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise.


2018 ◽  
Author(s):  
Kent A. Riemondy ◽  
Monica Ransom ◽  
Christopher Alderman ◽  
Austin E. Gillen ◽  
Rui Fu ◽  
...  

Abstract: Single-cell RNA sequencing (scRNA-seq) methods generate sparse gene expression profiles for thousands of single cells in a single experiment. The information in these profiles is sufficient to classify cell types by distinct expression patterns but the high complexity of scRNA-seq libraries often prevents full characterization of transcriptomes from individual cells. To extract more focused gene expression information from scRNA-seq libraries, we developed a strategy to physically recover the DNA molecules comprising transcriptome subsets, enabling deeper interrogation of the isolated molecules by another round of DNA sequencing. We applied the method in cell-centric and gene-centric modes to isolate cDNA fragments from scRNA-seq libraries. First, we resampled the transcriptomes of rare, single megakaryocytes from a complex mixture of lymphocytes and analyzed them in a second round of DNA sequencing, yielding up to 20-fold greater sequencing depth per cell and increasing the number of genes detected per cell from a median of 1,313 to 2,002. We similarly isolated mRNAs from targeted T cells to improve the reconstruction of their VDJ-rearranged immune receptor mRNAs. Second, we isolated CD3D mRNA fragments expressed across cells in a scRNA-seq library prepared from a clonal T cell line, increasing the number of cells with detected CD3D expression from 59.7% to 100%. Transcriptome resampling is a general approach to recover targeted gene expression information from single-cell RNA sequencing libraries that enhances the utility of these costly experiments, and may be applicable to the targeted recovery of molecules from other single-cell assays.


2020 ◽  
Vol 36 (10) ◽  
pp. 3131-3138
Author(s):  
Ke Jin ◽  
Le Ou-Yang ◽  
Xing-Ming Zhao ◽  
Hong Yan ◽  
Xiao-Fei Zhang

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) methods make it possible to reveal gene expression patterns at single-cell resolution. Due to technical defects, dropout events in scRNA-seq add noise to the gene-cell expression matrix and hinder downstream analysis. It is therefore important to recover the true gene expression levels before carrying out downstream analysis.

Results: In this article, we develop an imputation method, called scTSSR, to recover gene expression for scRNA-seq. Unlike most existing methods that impute dropout events by borrowing information across only genes or cells, scTSSR simultaneously leverages information from both similar genes and similar cells using a two-side sparse self-representation model. We demonstrate that scTSSR can effectively capture the Gini coefficients of genes and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization (smRNA FISH). Down-sampling experiments indicate that scTSSR performs better than existing methods in recovering the true gene expression levels. We also show that scTSSR has a competitive performance in differential expression analysis, cell clustering and cell trajectory inference.

Availability and implementation: The R package is available at https://github.com/Zhangxf-ccnu/scTSSR.

Supplementary information: Supplementary data are available at Bioinformatics online.
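
The Gini coefficient used in this evaluation measures how unevenly a gene's expression is spread across cells: 0 means every cell expresses the gene equally, and values near 1 mean expression is concentrated in a few cells. A minimal sketch of the standard sorted-data formula follows; the function name `gini` and the choice to return 0.0 for an all-zero gene are assumptions for illustration, not taken from the scTSSR package.

```python
def gini(values):
    """Gini coefficient of non-negative values, e.g. one gene's expression across cells.

    Uses G = sum_i (2i - n - 1) * x_i / (n * sum(x)) with x sorted ascending, i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # assumption: treat empty or all-zero input as perfectly even
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


even_gene = gini([1, 1, 1, 1])      # expressed equally in every cell
sparse_gene = gini([0, 0, 0, 10])   # expressed in a single cell
```

Dropouts push observed Gini coefficients upward by adding zeros, which is why agreement with an independent measurement such as smRNA FISH is a useful check on an imputation method.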


iScience ◽  
2021 ◽  
Vol 24 (4) ◽  
pp. 102357
Author(s):  
Brenda Morsey ◽  
Meng Niu ◽  
Shetty Ravi Dyavar ◽  
Courtney V. Fletcher ◽  
Benjamin G. Lamberty ◽  
...  

Science ◽  
2020 ◽  
Vol 371 (6531) ◽  
pp. eaba5257 ◽  
Author(s):  
Anna Kuchina ◽  
Leandra M. Brettner ◽  
Luana Paleologu ◽  
Charles M. Roco ◽  
Alexander B. Rosenberg ◽  
...  

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at bioRxiv online.


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract: Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.
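
Borrowing information from adjacent genes in a graph can be pictured as one smoothing step over that graph. The sketch below assumes the gene graph has already been learned and is supplied as a non-negative adjacency matrix; the self-loop weight of 1.0, the single smoothing step, and the name `smooth_over_gene_graph` are illustrative assumptions rather than the G2S3 procedure.

```python
def smooth_over_gene_graph(expr, adjacency):
    """Replace each gene's profile with a weighted mean over itself and its graph neighbours.

    expr[g][c]: expression of gene g in cell c.
    adjacency[g][h]: non-negative edge weight between genes g and h (assumed given).
    """
    n_genes = len(expr)
    n_cells = len(expr[0])
    smoothed = []
    for g in range(n_genes):
        # Add a self-loop so each gene retains part of its own measured signal.
        weights = [adjacency[g][h] + (1.0 if h == g else 0.0) for h in range(n_genes)]
        norm = sum(weights)
        smoothed.append([
            sum(weights[h] * expr[h][c] for h in range(n_genes)) / norm
            for c in range(n_cells)
        ])
    return smoothed


expr = [[4, 0], [4, 4], [0, 8]]                 # 3 genes x 2 cells
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # genes 0 and 1 are linked
smoothed = smooth_over_gene_graph(expr, adjacency)
```

Here the zero for gene 0 in the second cell is pulled toward the value of its linked gene 1, while the unconnected gene 2 is returned unchanged.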


Author(s):  
Meichen Dong ◽  
Aatish Thennavan ◽  
Eugene Urrutia ◽  
Yun Li ◽  
Charles M Perou ◽  
...  

Abstract Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
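
Integrating deconvolution results from several reference datasets can be sketched as a weighted combination of the per-reference cell-type proportion estimates. The example below assumes the weights are already known; how SCDC actually selects them is not described in this abstract, and the name `ensemble_proportions` is hypothetical.

```python
def ensemble_proportions(estimates, weights):
    """Combine cell-type proportion estimates from several references.

    estimates: list of dicts mapping cell type -> estimated proportion, one per reference.
    weights: one non-negative weight per reference (assumed given, not derived here).
    """
    total_weight = sum(weights)
    cell_types = list(estimates[0].keys())
    combined = {
        ct: sum(w * est[ct] for w, est in zip(weights, estimates)) / total_weight
        for ct in cell_types
    }
    # Renormalise so the combined proportions sum to one.
    scale = sum(combined.values())
    return {ct: value / scale for ct, value in combined.items()}


estimates = [
    {"alpha": 0.6, "beta": 0.4},   # estimate using reference dataset 1
    {"alpha": 0.2, "beta": 0.8},   # estimate using reference dataset 2
]
combined = ensemble_proportions(estimates, weights=[3.0, 1.0])
```

Giving the first reference three times the weight of the second moves the combined estimate toward it, which is the basic behaviour an ensemble over references relies on.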


2020 ◽  
Author(s):  
Min Feng ◽  
Junming Xia ◽  
Shigang Fei ◽  
Xiong Wang ◽  
Yaohong Zhou ◽  
...  

Abstract: A wide range of hemocyte types exist in insects but a full definition of the different subclasses is not yet established. The current knowledge of the classification of silkworm hemocytes mainly comes from morphology rather than specific markers, so our understanding of the detailed classification, hemocyte lineage and functions of silkworm hemocytes is very incomplete. Bombyx mori nucleopolyhedrovirus (BmNPV) is a representative member of the baculoviruses, which are major pathogens that specifically infect silkworms and cause serious losses in the sericulture industry. Here, we performed single-cell RNA sequencing (scRNA-seq) of silkworm hemocytes in BmNPV and mock-infected larvae to comprehensively identify silkworm hemocyte subsets and determine specific molecular and cellular characteristics in each hemocyte subset before and after viral infection. A total of 19 cell clusters and their potential marker genes were identified in silkworm hemocytes. Among these hemocyte clusters, clusters 0, 1, 2, 5 and 9 might be granulocytes (GR); clusters 14 and 17 were predicted as plasmatocytes (PL); cluster 18 was tentatively identified as spherulocytes (SP); and clusters 7 and 11 could possibly correspond to oenocytoids (OE). In addition, all of the hemocyte clusters were infected by BmNPV and some infected cells carried a high viral load in silkworm larvae at 3 days post infection (dpi). Interestingly, BmNPV infection can cause severe and diverse changes in gene expression in hemocytes. Cells belonging to the infection group were mainly located at the early stage of the pseudotime trajectories. Furthermore, we found that BmNPV infection suppresses the immune response in the major hemocyte types. In summary, our scRNA-seq analysis revealed the diversity of silkworm hemocytes and provided a rich resource of gene expression profiles for a systems-level understanding of their functions in the uninfected condition and as a response to BmNPV.

