Comparative analysis of sequencing technologies platforms for single-cell transcriptomics

2018 ◽  
Author(s):  
Kedar Nath Natarajan ◽  
Zhichao Miao ◽  
Miaomiao Jiang ◽  
Xiaoyun Huang ◽  
Hongpo Zhou ◽  
...  

Abstract: All single-cell RNA-seq protocols and technologies require library preparation prior to sequencing on a platform such as Illumina. Here, we present the first report to utilize the BGISEQ-500 platform for scRNA-seq, and compare the sensitivity and accuracy to Illumina sequencing. We generate a scRNA-seq resource of 468 unique single-cells and 1,297 matched single cDNA samples, performing SMARTer and Smart-seq2 protocols on mESCs and K562 cells with RNA spike-ins. We sequence these libraries on both BGISEQ-500 and Illumina HiSeq platforms using single- and paired-end reads. The two platforms have comparable sensitivity and accuracy in terms of quantification of gene expression, and low technical variability. Our study provides a standardised scRNA-seq resource to benchmark new scRNA-seq library preparation protocols and sequencing platforms.


2021 ◽  
Author(s):  
Lin Di ◽  
Bo Liu ◽  
Yuzhu Lyu ◽  
Shihui Zhao ◽  
Yuhong Pang ◽  
...  

Many single cell RNA-seq applications aim to probe a wide dynamic range of gene expression, but most of them still struggle to accurately quantify low-abundance transcripts. Based on our previous finding that Tn5 transposase can directly cut-and-tag DNA/RNA hetero-duplexes, we present SHERRY2, an optimized protocol for sequencing transcriptomes of single cells or single nuclei. SHERRY2 is robust and scalable, and it has higher sensitivity and more uniform coverage in comparison with prevalent scRNA-seq methods. With throughput of a few thousand cells per batch, SHERRY2 can reveal the subtle transcriptomic differences between cells and facilitate important biological discoveries.



Author(s):  
Jérémie Breda ◽  
Mihaela Zavolan ◽  
Erik van Nimwegen

Abstract: In spite of a large investment in the development of methodologies for analysis of single-cell RNA-seq data, there is still little agreement on how to best normalize such data, i.e. how to quantify gene expression states of single cells from such data. Starting from a few basic requirements such as that inferred expression states should correct for both intrinsic biological fluctuations and measurement noise, and that changes in expression state should be measured in terms of fold-changes rather than changes in absolute levels, we here derive a unique Bayesian procedure for normalizing single-cell RNA-seq data from first principles. Our implementation of this normalization procedure, called Sanity (SAmpling Noise corrected Inference of Transcription activitY), estimates log expression values and associated error bars directly from raw UMI counts without any tunable parameters. Comparison of Sanity with other recent normalization methods on a selection of scRNA-seq datasets shows that Sanity outperforms other methods on basic downstream processing tasks such as clustering cells into subtypes and identification of differentially expressed genes. More importantly, we show that all other normalization methods present severely distorted pictures of the data. By failing to account for biological and technical Poisson noise, many methods systematically predict the lowest expressed genes to be most variable in expression, whereas in reality these genes provide least evidence of true biological variability. In addition, by confounding noise removal with lower-dimensional representation of the data, many methods introduce strong spurious correlations of expression levels with the total UMI count of each cell as well as spurious co-expression of genes.
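
The point about Poisson sampling noise can be illustrated with a small simulation. The sketch below is not Sanity's Bayesian procedure; it only assumes that every cell has the same true expression level and that observed UMI counts are Poisson draws, so any observed variation is pure sampling noise. The function names, the two expression levels (0.5 and 50 expected UMIs per cell) and the number of cells are illustrative assumptions.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (adequate for modest means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def coefficient_of_variation(values):
    """Standard deviation divided by the mean of a list of counts."""
    mean = sum(values) / len(values)
    if mean == 0:
        return float("nan")
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


rng = random.Random(0)   # fixed seed so the run is repeatable
n_cells = 5000           # hypothetical number of cells

# Two hypothetical genes with constant true expression across all cells.
low_gene = [sample_poisson(0.5, rng) for _ in range(n_cells)]
high_gene = [sample_poisson(50.0, rng) for _ in range(n_cells)]

cv_low = coefficient_of_variation(low_gene)
cv_high = coefficient_of_variation(high_gene)
print(f"CV of low-expression gene:  {cv_low:.2f}")
print(f"CV of high-expression gene: {cv_high:.2f}")
```

Because the coefficient of variation of a Poisson count with mean μ is 1/√μ, the weakly expressed gene appears far more variable even though neither simulated gene has any true biological variability.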



2019 ◽  
Author(s):  
Weida Wang ◽  
Jinyuan Xu ◽  
Shuyuan Wang ◽  
Peng Xia ◽  
Li Zhang ◽  
...  

Abstract: Understanding subclonal architecture and the biological functions of subclones is one of the key challenges in characterizing and investigating the cause of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity by characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, we found that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we found that high C5 proportions lead to poor survival outcomes: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636], respectively. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease. Highlights: We applied a deconvolution algorithm to the subclonal basis gene expression matrix to link single cells and bulk tissue together.
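
The reported hazard ratio and confidence interval can be checked for internal consistency. The sketch below assumes the interval is a standard Wald interval, symmetric on the log scale and built with a 1.96 multiplier; the abstract does not say how it was computed. Under that assumption the geometric mean of the bounds should recover the hazard ratio, and the interval width gives the standard error of the log hazard ratio. The Wald p-value derived here is a different test from the log-rank p-value quoted in the abstract, so only rough agreement is expected.

```python
import math

hr, ci_low, ci_high = 2.557, 1.160, 5.636  # values quoted in the abstract

# For a Wald interval, log(HR) lies midway between the log bounds,
# so the geometric mean of the bounds should reproduce the HR.
hr_from_ci = math.sqrt(ci_low * ci_high)

# Standard error of log(HR), assuming a 95% interval built with z = 1.96.
se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Two-sided Wald p-value from the standard normal distribution.
z = math.log(hr) / se_log_hr
wald_p = math.erfc(z / math.sqrt(2))

print(f"HR from CI: {hr_from_ci:.3f}, SE(log HR): {se_log_hr:.3f}, Wald p: {wald_p:.4f}")
```

The geometric mean of the bounds agrees with the reported hazard ratio, and the Wald p-value is of the same order as the reported log-rank p-value, which is what one would expect if the figures were transcribed correctly.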



2016 ◽  
Author(s):  
Olivier Poirion ◽  
Xun Zhu ◽  
Travers Ching ◽  
Lana X. Garmire

Abstract: Despite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We developed a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship. The method SSrGE is available at https://github.com/lanagarmire/SSrGE.



2021 ◽  
Author(s):  
Amanda Raine ◽  
Anders Lundmark ◽  
Alva Annett ◽  
Ann-Christin Wiman ◽  
Marco Cavalli ◽  
...  

DNA methylation is a central epigenetic mark that has diverse roles in gene regulation, development, and maintenance of genome integrity. 5-methylcytosine (5mC) can be interrogated at base resolution in single cells by using bisulfite sequencing (scWGBS). Several different scWGBS strategies have been described in recent years to study DNA methylation in single cells. However, there remain limitations with respect to cost-efficiency and yield. Herein, we present a new development in the field of scWGBS library preparation: single cell Splinted Ligation Adapter Tagging (scSPLAT). scSPLAT employs a pooling strategy to facilitate sample preparation at a higher scale and throughput than previously possible. We demonstrate the accuracy and robustness of the method by generating data from 225 single K562 cells and from 309 single liver nuclei, and compare scSPLAT against other scWGBS methods.



2019 ◽  
Author(s):  
Nicholas Bernstein ◽  
Nicole Fong ◽  
Irene Lam ◽  
Margaret Roy ◽  
David G. Hendrickson ◽  
...  

Abstract: Single cell RNA-seq (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these “doublets” violate the fundamental premise of single cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells beyond any previous approach.



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract: A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present , a novel method that not only allows for analyzing the data from joint-modality platforms, but provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.



2013 ◽  
Vol 45 (8) ◽  
pp. 301-311 ◽  
Author(s):  
Richard H. Chapple ◽  
Polyana C. Tizioto ◽  
Kevin D. Wells ◽  
Scott A. Givan ◽  
JaeWoo Kim ◽  
...  

Gene regulation and transcriptome studies have been enabled by the development of RNA-Seq applications for high-throughput sequencing platforms. Next generation sequencing is remarkably efficient and avoids many issues inherent in hybridization-based microarray methodologies including the exon-specific dependence of probe design. Biologically relevant transcripts including messenger and regulatory RNAs may now be quantified and annotated regardless of whether they have previously been observed. We used RNA-Seq to investigate global patterns of gene expression in early developing rat liver. Liver samples from timed-pregnant Lewis rats were collected at six fetal and neonatal stages [embryonic day (E)14, E16, E18, E20, postnatal day (P)1, P7], transcripts were sequenced using an Illumina HiSeq 2000, and data analysis was performed with the Tuxedo software suite. Genes and isoforms differing in abundance were queried for enrichment within functionally related gene groups using the Functional Annotation Tool of the DAVID Bioinformatics Database. While hematopoietic gene expression is initiated by E14, hepatocyte maturation is a gradual process involving clusters of genes responsible for response to nutrients and enzymes responsible for glycolysis and fatty acid catabolism. Following birth, a large cluster of differentially abundant genes was enriched for mitochondrial gene expression and cholesterol synthesis indicating that by 1 wk of age, the liver is engaged in lipid sensing and bile production. Clustering results for differentially abundant genes and isoforms were similar with the greatest difference for the E14/E16 comparison. Finally, a bioinformatic approach was used to annotate 1,307 novel liver transcripts assembled from sequences that aligned to intergenic regions of the rat genome.



2022 ◽  
Author(s):  
Sofya Lipnitskaya ◽  
Yang Shen ◽  
Stefan Legewie ◽  
Holger Klein ◽  
Kolja Becker

Abstract Background: Recent studies in the area of transcriptomics performed on single-cell and population levels reveal noticeable variability in gene expression measurements provided by different RNA sequencing technologies. Due to the increased noise and complexity of single-cell RNA-Seq (scRNA-Seq) data compared with bulk experiments, there is a substantial number of variably expressed genes and so-called dropouts, which challenge the subsequent computational analysis and potentially lead to false positive discoveries. In order to investigate factors affecting technical variability between RNA sequencing experiments of different technologies, we performed a systematic assessment of single-cell and bulk RNA-Seq data that had undergone the same pre-processing and sample preparation procedures. Results: Our analysis indicates that variability between gene expression measurements as well as dropout events are not exclusively caused by biological variability, low expression levels, or random variation. Furthermore, we propose FAVSeq, a machine learning-assisted pipeline for detection of factors contributing to gene expression variability in matched RNA-Seq data provided by two technologies. Based on the analysis of the matched bulk and single-cell dataset, we found the 3'-UTR and transcript lengths to be the most relevant effectors of the observed variation between RNA-Seq experiments, while the same factors together with cellular compartments were shown to be associated with dropouts. Conclusions: Here, we investigated the sources of variation in RNA-Seq profiles of matched single-cell and bulk experiments. In addition, we proposed the FAVSeq pipeline for analyzing multimodal RNA sequencing data, which allowed us to identify factors affecting quantitative differences in gene expression measurements as well as the presence of dropouts. The derived knowledge can be used to improve the interpretation of RNA-Seq data and to identify genes that can be affected by assay-based deviations. Source code is available under the MIT license at https://github.com/slipnitskaya/FAVSeq.



2017 ◽  
Author(s):  
Eduardo Torre ◽  
Hannah Dueck ◽  
Sydney Shaffer ◽  
Janko Gospocic ◽  
Rohit Gupte ◽  
...  

Abstract: The development of single cell RNA sequencing technologies has emerged as a powerful means of profiling the transcriptional behavior of single cells, leveraging the breadth of sequencing measurements to make inferences about cell type. However, there is still little understanding of how well these methods perform at measuring single cell variability for small sets of genes and what “transcriptome coverage” (e.g. genes detected per cell) is needed for accurate measurements. Here, we use single molecule RNA FISH measurements of 26 genes in thousands of melanoma cells to provide an independent reference dataset to assess the performance of the DropSeq and Fluidigm single cell RNA sequencing platforms. We quantified the Gini coefficient, a measure of rare-cell expression variability, and find that the correspondence between RNA FISH and single cell RNA sequencing for Gini, unlike for mean, increases markedly with per-cell library complexity up to a threshold of ∼2000 genes detected. A similar complexity threshold also allows for robust assignment of multi-genic cell states such as cell cycle phase. Our results provide guidelines for selecting sequencing depth and complexity thresholds for single cell RNA sequencing. More generally, our results suggest that if the number of genes whose expression levels are required to answer any given biological question is small, then greater transcriptome complexity per cell is likely more important than obtaining very large numbers of cells.
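
To make the Gini coefficient concrete, the following is a minimal sketch using the standard rank-weighted formula on per-cell counts of a single gene. The abstract does not state how the authors computed it, so the function name `gini` and the two toy count vectors are assumptions for illustration only.

```python
def gini(counts):
    """Gini coefficient of non-negative per-cell counts.

    0 means expression is spread evenly across cells; values near 1 mean
    expression is concentrated in a few rare cells.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # For sorted values x_1 <= ... <= x_n:
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Toy data: the same total count, spread evenly or held by one rare cell.
uniform_counts = [5] * 100
rare_cell_counts = [0] * 99 + [500]

gini_uniform = gini(uniform_counts)
gini_rare = gini(rare_cell_counts)
print(f"even expression:      {gini_uniform:.2f}")
print(f"rare-cell expression: {gini_rare:.2f}")
```

Both toy genes have the same mean count per cell, yet their Gini coefficients differ sharply, which is why the abstract treats Gini and mean as separate quantities to validate.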


