SHERRY2: A method for rapid and sensitive single cell RNA-seq

2021 ◽  
Author(s):  
Lin Di ◽  
Bo Liu ◽  
Yuzhu Lyu ◽  
Shihui Zhao ◽  
Yuhong Pang ◽  
...  

Many single cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most methods still struggle to accurately quantify low-abundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage than prevalent scRNA-seq methods. With a throughput of a few thousand cells per batch, SHERRY2 can reveal subtle transcriptomic differences between cells and facilitate important biological discoveries.

Author(s):  
Jérémie Breda ◽  
Mihaela Zavolan ◽  
Erik van Nimwegen

In spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements, such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide the least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.
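The point about Poisson noise making the lowest expressed genes look most variable can be illustrated with a small simulation. This is only a sketch and not the Sanity procedure: it assumes pure Poisson sampling with no biological variability at all, and the helper names (`sample_poisson`, `squared_cv`), the gene means, and the cell count are hypothetical choices made for illustration. For a Poisson variable the squared coefficient of variation equals 1/mean, so apparent variability rises as expression falls even though every simulated cell has the same underlying state.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small means used here; not meant for large means.
    """
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def squared_cv(values):
    """Squared coefficient of variation: sample variance / mean^2."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return variance / mean ** 2


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_cells = 5000           # hypothetical number of cells

# Three hypothetical genes with identical "biology" (no true variability)
# but different mean UMI counts per cell.
cv2_by_mean = {}
for true_mean in (0.5, 5.0, 50.0):
    counts = [sample_poisson(true_mean, rng) for _ in range(n_cells)]
    cv2_by_mean[true_mean] = squared_cv(counts)

for true_mean, cv2 in cv2_by_mean.items():
    print(f"mean={true_mean:5.1f}  observed CV^2={cv2:.3f}  Poisson 1/mean={1 / true_mean:.3f}")
```

A method that ranks genes by raw variability on such data would place the lowest expressed gene first purely because of sampling noise, which is the distortion the abstract describes.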


2019 ◽  
Author(s):  
Weida Wang ◽  
Jinyuan Xu ◽  
Shuyuan Wang ◽  
Peng Xia ◽  
Li Zhang ◽  
...  

Understanding subclonal architecture and its biological functions is one of the key challenges in characterizing and investigating the causes of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity by characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the functional annotation, C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we found that high C5 proportions would lead to poor survival outcomes: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636], respectively. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with the subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease. Highlights: We applied a deconvolution algorithm on a subclonal basis gene expression matrix to link single cells and bulk tissue together.
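As a quick arithmetic check on the reported hazard ratio, the sketch below tests whether 2.557 [1.160-5.636] is internally consistent. It assumes the interval is a Wald interval that is symmetric on the log scale, which is the usual form for Cox-model hazard ratios but is not stated in the abstract. The variable names are hypothetical. Under that assumption the point estimate should equal the geometric mean of the two bounds, and the standard error of log(HR) can be recovered from the interval width.

```python
import math

# Values quoted in the abstract.
hazard_ratio = 2.557
ci_lower, ci_upper = 1.160, 5.636

# Assumption: 95% Wald interval, exp(log(HR) +/- z * SE).
z_95 = 1.959964  # two-sided 95% standard normal quantile

# If the interval is symmetric in log space, the geometric mean of the
# bounds reproduces the point estimate.
midpoint_hr = math.sqrt(ci_lower * ci_upper)

# Standard error of log(HR) implied by the interval width.
se_log_hr = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_95)

# Implied Wald z statistic for HR != 1 (a different test from the
# log-rank test reported in the abstract, so the p-values need not match).
wald_z = math.log(hazard_ratio) / se_log_hr

print(f"geometric mean of CI bounds: {midpoint_hr:.3f}")
print(f"implied SE of log(HR):       {se_log_hr:.3f}")
print(f"implied Wald z:              {wald_z:.2f}")
```

The geometric mean of the bounds agrees with the reported point estimate to three decimals, so the numbers are at least mutually consistent under the stated assumption.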


2016 ◽  
Author(s):  
Olivier Poirion ◽  
Xun Zhu ◽  
Travers Ching ◽  
Lana X. Garmire

Despite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We developed a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship. The method SSrGE is available at https://github.com/lanagarmire/SSrGE.


2019 ◽  
Author(s):  
Nicholas Bernstein ◽  
Nicole Fong ◽  
Irene Lam ◽  
Margaret Roy ◽  
David G. Hendrickson ◽  
...  

Single cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these “doublets” violate the fundamental premise of single cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells beyond any previous approach.


2018 ◽  
Author(s):  
Kedar Nath Natarajan ◽  
Zhichao Miao ◽  
Miaomiao Jiang ◽  
Xiaoyun Huang ◽  
Hongpo Zhou ◽  
...  

All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare its sensitivity and accuracy to Illumina sequencing. We generate a scRNA-seq resource of 468 unique single cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs and metabolism of mRNA and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cell entities, representing a gene’s diverse expression states; meanwhile, the dropouts and low expressions are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has significantly better goodness of fit on an extensive number of single-cell data sets, compared to three other state-of-the-art models. In addition, our systems kinetic approach of handling the low and zero expressions and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG in extracting varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes compared with five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
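The idea of treating dropouts and low expressions as left truncated can be sketched as a likelihood function. This is a minimal illustration under stated assumptions, not the LTMGSCA implementation: it assumes that values at or below a cutoff are only known to lie below that cutoff, so each contributes the mixture's cumulative probability at the cutoff, while values above the cutoff contribute the ordinary Gaussian mixture density. The function names, the cutoff, and the toy log-expression values are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def left_truncated_mixture_loglik(values, cutoff, weights, means, sds):
    """Log-likelihood of a Gaussian mixture with low values collapsed.

    Assumption for this sketch: observations <= cutoff carry no
    information beyond "below the cutoff".
    """
    components = list(zip(weights, means, sds))
    total = 0.0
    for v in values:
        if v <= cutoff:
            lik = sum(w * normal_cdf(cutoff, m, s) for w, m, s in components)
        else:
            lik = sum(w * normal_pdf(v, m, s) for w, m, s in components)
        total += math.log(lik)
    return total


# Hypothetical log-expression values for one gene: three dropouts and
# four cells in an "active" state around 2.
log_expr = [0.0, 0.0, 0.0, 2.1, 1.9, 2.3, 2.0]
cutoff = 0.5

# A two-state model (suppressed + active) versus a single broad state.
loglik_two_state = left_truncated_mixture_loglik(
    log_expr, cutoff, weights=(0.4, 0.6), means=(-1.0, 2.0), sds=(1.0, 0.3)
)
loglik_one_state = left_truncated_mixture_loglik(
    log_expr, cutoff, weights=(0.5, 0.5), means=(1.0, 1.0), sds=(1.0, 1.0)
)
print(loglik_two_state, loglik_one_state)
```

In this toy example the two-state parameters give the higher log-likelihood, matching the intuition that one component sitting below the cutoff can stand for a suppressed expression state. A real fit would estimate the weights, means, and standard deviations, for example with an EM procedure, which this sketch leaves out.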


2020 ◽  
Author(s):  
Lin Li ◽  
Hao Dai ◽  
Zhaoyuan Fang ◽  
Luonan Chen

The rapid advancement of single cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which brings new computational difficulties. Based on statistical independence, the cell-specific network (CSN) is able to quantify the overall associations between genes for each cell, yet it suffers from overestimation related to indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which can measure the direct associations between genes by eliminating the indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as the transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: (1) one direct association network for one cell; (2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.


2018 ◽  
Author(s):  
Luyi Tian ◽  
Jaring Schreuder ◽  
Daniela Zalcenstein ◽  
Jessica Tran ◽  
Nikolce Kocovski ◽  
...  

Conventional single cell RNA-seq methods are destructive, such that a given cell cannot also then be tested for fate and function, without a time machine. Here, we develop a clonal method SIS-seq, whereby single cells are allowed to divide, and progeny cells are assayed separately in SISter conditions; some for fate, others by RNA-seq. By cross-correlating progenitor gene expression with mature cell fate within a clone, and doing this for many clones, we can identify the earliest gene expression signatures of dendritic cell subset development. SIS-seq could be used to study other populations harboring clonal heterogeneity, including stem, reprogrammed and cancer cells to reveal the transcriptional origins of fate decisions.


Blood ◽  
2012 ◽  
Vol 120 (21) ◽  
pp. 1231-1231
Author(s):  
Chih Long Liu ◽  
Bo Dai ◽  
Aaron M. Newman ◽  
Ravi Majeti ◽  
Ash A Alizadeh

Abstract 1231. Background: Current methods for defining and isolating human hematopoietic stem and progenitor cells using surface markers enrich for unique functional properties of these populations. However, significant functional heterogeneity in these compartments remains, with important implications for understanding normal and altered hematopoiesis. Using flow sorting to enrich >10,000 cells as progenitor subpopulations, we previously characterized the gene expression signature of normal human HSC (Majeti et al 2009 PNAS 106(9):3396–3401). We hypothesized that interrogation of the transcriptomes of single cells from this compartment could resolve remaining heterogeneity and help identify and better define features of progenitor cells and hematopoietic stem cells (HSCs). Methods: Using normal human bone marrow aspirates and a FACS Aria II instrument equipped with a specialized single-cell sorting apparatus, we sorted cells enriched for HSCs based on expression of Lin-CD34+CD38-CD90+CD45RA− into 1-cell, 10-cell, 100-cell, and 40000-cell (bulk) representations. We used at least 5 replicates per group and verified single cell deposition by direct visualization. We amplified cDNA from these corresponding inputs using an exponential whole transcriptome amplification (WTA) scheme (Miltenyi SuperAmp), and evaluated gene expression profiles by two microarray platforms (Agilent/GE Healthcare 60K, and Affymetrix U133 plus 2.0), and by RNA-Seq (Illumina). We used gene expression correlation between replicates within and between microarrays as a means of assessing methodological reproducibility and estimating population heterogeneity. Results: Whole transcriptome amplification yielded cDNA ranging from 0.2–1 kb for 10 and 100 cells, with significantly lower size distribution of amplified cDNA observed for single cells. 
Gene expression profiles had significantly better replicate reproducibility and array coverage with the Agilent microarray platform when compared with the Affymetrix U133 Plus 2.0 platform (gene coverage of 84% for 100 cells, 73% for 10 cells and 50% for 1 cell for Agilent vs 24% for 100 cells, 11% for 10 cells and 5.7% for 1 cell for Affymetrix). RNA-Seq profiling of the same populations is ongoing with major technical optimizations focused on reducing amplification of non-human templates while maintaining library complexity and representation. Using biological replicates for each input size, we observed high inter-replicate correlation levels for expression profiles obtained for bulk sorted HSCs from 8 healthy donors (∼40000-cells, average r=0.97) and for 100-cell and 10-cell inputs from a single donor (r=0.96–0.99, respectively). While intra-array concordance of replicate measurements (n=14642) was high (r>0.91) within each of 5 single cells from a single donor, comparison of the 5 single cells from the same donor identified significant heterogeneity, when compared to the 10-cell and 100-cell sub-clusters (Figure 1). Individual genes characteristically expressed by these heterogeneous single cell populations are currently being investigated by FACS and Fluidigm arrays. A larger experiment characterizing 192 single progenitor cells, employing Agilent microarrays and RNA-Seq, is currently in progress. Conclusions: Single cell transcriptome profiling is feasible, with best performance on 60-mer microarrays. Single cell transcriptomes exhibit lower, but reasonable levels of reproducibility (r>0.7) and precision as compared with higher cell numbers. Gene expression profiles of single cells capture gene expression heterogeneity in HSCs. Disclosures: No relevant conflicts of interest to declare.
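The r values above are correlations between replicate expression profiles. The abstract does not say which correlation coefficient was used, so the sketch below assumes Pearson's r. The function name and the two toy replicate profiles are hypothetical; real profiles would hold thousands of probe-level measurements.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log-intensity values for six genes in two replicates.
replicate_a = [8.1, 5.4, 10.2, 3.3, 7.7, 6.0]
replicate_b = [7.9, 5.9, 9.8, 3.9, 7.2, 6.4]

r_replicates = pearson_r(replicate_a, replicate_b)
print(f"inter-replicate r = {r_replicates:.3f}")
```

Computing this for every pair of replicates at a given input size and averaging the results gives a summary comparable to the "average r" figures quoted in the abstract.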


Author(s):  
Irene Papatheodorou ◽  
Pablo Moreno ◽  
Jonathan Manning ◽  
Alfonso Muñoz-Pomer Fuentes ◽  
Nancy George ◽  
...  

Expression Atlas is EMBL-EBI’s resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.

