CCSN: Single Cell RNA Sequencing Data Analysis by Conditional Cell-specific Network

Abstract: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. However, compared with bulk RNA sequencing (RNA-seq), single-cell RNA-seq (scRNA-seq) suffers from higher noise and lower coverage, which creates new computational difficulties. Based on statistical independence, the cell-specific network (CSN) method can quantify the overall associations between genes for each cell, yet it overestimates associations because of indirect effects. To overcome this problem, we propose the “conditional cell-specific network” (CCSN) method, which measures the direct associations between genes by eliminating the indirect associations. CCSN can be used for cell clustering and dimension reduction on a network basis of single cells. Intuitively, each CCSN can be viewed as a transformation from less “reliable” gene expression to more “reliable” gene-gene associations in a cell. Based on CCSN, we further design network flow entropy (NFE) to estimate the differentiation potency of a single cell. A number of scRNA-seq datasets were used to demonstrate the advantages of our approach: (1) one direct association network for one cell; (2) most existing scRNA-seq methods designed for gene expression matrices are also applicable to CCSN-transformed degree matrices; (3) CCSN-based NFE helps resolve the direction of differentiation trajectories by quantifying the potency of each cell. CCSN is publicly available at http://sysbio.sibcb.ac.cn/cb/chenlab/soft/CCSN.zip.
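
The idea of a network-derived degree matrix mentioned in point (2) can be illustrated with a minimal sketch. It assumes that each cell's network is already available as a symmetric 0/1 gene-by-gene adjacency matrix; the function name `degree_matrix` and the toy input are hypothetical, and the code does not reproduce how CCSN itself builds the networks.

```python
import numpy as np

def degree_matrix(cell_networks):
    """Stack per-cell gene degrees into a genes x cells matrix.

    cell_networks: array of shape (n_cells, n_genes, n_genes); each slice is
    assumed to be a symmetric 0/1 adjacency matrix for one cell.
    """
    nets = np.asarray(cell_networks).copy()
    # Zero the diagonal so a gene's degree counts only its partner genes.
    idx = np.arange(nets.shape[1])
    nets[:, idx, idx] = 0
    # Sum each gene's edges within a cell, then put genes on rows.
    return nets.sum(axis=2).T

# Toy input (illustrative only): 2 cells, 3 genes.
toy_networks = np.array([
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
])
toy_degrees = degree_matrix(toy_networks)
```

The result has the same genes-by-cells layout as an expression matrix, which is why tools written for expression matrices could, in principle, be run on it unchanged.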

SPsimSeq: semi-parametric simulation of bulk and single cell RNA sequencing data

10.1101/677740 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Empirical Distribution ◽

Supplementary Information ◽

Rna Seq ◽

Sequencing Data ◽

Actual Distribution ◽

Wide Range ◽

Single Cell Rna Sequencing

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.
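
As a rough illustration of simulating from an empirical distribution rather than a fixed parametric family, the sketch below draws new values for one gene by inverse-CDF sampling on the quantiles of the observed data. This is a deliberately simplified stand-in and not SPsimSeq's estimator, which the summary does not describe; the function name `simulate_from_empirical` and the example values are hypothetical.

```python
import numpy as np

def simulate_from_empirical(source_values, n_samples, seed=0):
    """Draw values whose distribution follows the empirical quantiles of the source."""
    rng = np.random.default_rng(seed)
    # Uniform draws are mapped through the empirical quantile function.
    u = rng.uniform(0.0, 1.0, size=n_samples)
    return np.quantile(np.asarray(source_values, dtype=float), u)

# Hypothetical expression values of one gene in a source dataset.
source_expression = np.array([0.0, 0.0, 1.2, 2.5, 3.1, 4.8, 5.0, 7.3])
simulated = simulate_from_empirical(source_expression, n_samples=20)
```

Because the quantile function interpolates between observed values, the simulated values always stay within the observed range, which is one limitation a density-based estimate would not share.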

Machine Learning-Assisted Identification of Factors Contributing to the Technical Variability Between Bulk and Single-Cell RNA-Seq Experiments

10.21203/rs.3.rs-1247889/v1 ◽

2022 ◽

Author(s):

Sofya Lipnitskaya ◽

Yang Shen ◽

Stefan Legewie ◽

Holger Klein ◽

Kolja Becker

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Single Cell ◽

Rna Sequencing ◽

Quantitative Difference ◽

Rna Seq ◽

Sequencing Data ◽

Factors Affecting ◽

Expression Variability ◽

Technical Variability

Abstract: Background: Recent transcriptomics studies performed at the single-cell and population levels reveal noticeable variability in the gene expression measurements provided by different RNA sequencing technologies. Because single-cell RNA-Seq (scRNA-Seq) data are noisier and more complex than bulk data, they contain a substantial number of variably expressed genes and so-called dropouts, which challenge the subsequent computational analysis and can lead to false positive discoveries. To investigate the factors affecting technical variability between RNA sequencing experiments of different technologies, we performed a systematic assessment of single-cell and bulk RNA-Seq data that had undergone the same pre-processing and sample preparation procedures. Results: Our analysis indicates that variability between gene expression measurements, as well as dropout events, is not exclusively caused by biological variability, low expression levels, or random variation. Furthermore, we propose FAVSeq, a machine learning-assisted pipeline for detecting factors that contribute to gene expression variability in matched RNA-Seq data provided by two technologies. Based on the analysis of the matched bulk and single-cell dataset, we found the 3'-UTR and transcript lengths to be the most relevant effectors of the observed variation between RNA-Seq experiments, while the same factors together with cellular compartments were shown to be associated with dropouts. Conclusions: Here, we investigated the sources of variation in RNA-Seq profiles of matched single-cell and bulk experiments. In addition, we proposed the FAVSeq pipeline for analyzing multimodal RNA sequencing data, which allowed us to identify factors affecting quantitative differences in gene expression measurements as well as the presence of dropouts. The derived knowledge can be used to improve the interpretation of RNA-Seq data and to identify genes that can be affected by assay-based deviations. Source code is available under the MIT license at https://github.com/slipnitskaya/FAVSeq.

LTMG: A novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data

10.1101/430009 ◽

2018 ◽

Cited By ~ 1

Author(s):

Changlin Wan ◽

Wennan Chang ◽

Yu Zhang ◽

Fenil Shah ◽

Xiaoyu Lu ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cells ◽

Cell Types ◽

R Package ◽

Data Sets ◽

Rna Seq ◽

Cell Functions ◽

Transcriptional Regulatory ◽

A Cell

Abstract: A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs, the metabolism of mRNA, and gene expression abundance in a cell. LTMG infers the expression multi-modalities across single cells, representing a gene’s diverse expression states; meanwhile, the dropouts and low expression values are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG has a significantly better goodness of fit on an extensive number of single-cell data sets compared with three other state-of-the-art models. In addition, our systems kinetic approach to handling the low and zero expression values and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG to extract varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, are further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes compared with five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
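
To make the notion of treating low values as "left truncated" concrete, here is a minimal sketch of a Gaussian mixture log-likelihood in which every observation at or below a cutoff contributes the probability mass below that cutoff instead of a density value. This is one common way to write such a likelihood and is an assumption for illustration; it is not taken from the LTMGSCA package, and the function names and the `cutoff` parameter are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_mixture_loglik(values, weights, means, sds, cutoff):
    """Log-likelihood of a Gaussian mixture with values <= cutoff left-censored."""
    total = 0.0
    for v in values:
        if v <= cutoff:
            # Unreliable low values: use the mixture mass below the cutoff.
            lik = sum(w * normal_cdf(cutoff, m, s)
                      for w, m, s in zip(weights, means, sds))
        else:
            # Reliable values: use the ordinary mixture density.
            lik = sum(w * normal_pdf(v, m, s)
                      for w, m, s in zip(weights, means, sds))
        total += math.log(lik)
    return total
```

In a full model, the weights, means, and standard deviations would be estimated (for example by an EM-style procedure), and each mixture component would be read as one expression state of the gene.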

SMARTer single cell total RNA sequencing

10.1101/430090 ◽

2018 ◽

Cited By ~ 1

Author(s):

Verboom Karen ◽

Everaert Celine ◽

Bolduc Nathalie ◽

Livak J. Kenneth ◽

Yigit Nurten ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Transcript Level ◽

Cellular Heterogeneity ◽

Circular Rnas ◽

Rna Seq ◽

Sequencing Data ◽

Total Rna ◽

Sequencing Experiment

Abstract: Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing only the 3’ end of the transcript, an excessive fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.

A Scalable Strand-Specific Protocol Enabling Full-Length Total RNA Sequencing From Single Cells

Frontiers in Genetics ◽

10.3389/fgene.2021.665888 ◽

2021 ◽

Vol 12 ◽

Author(s):

Simon Haile ◽

Richard D. Corbett ◽

Veronique G. LeBlanc ◽

Lisa Wei ◽

Stephen Pleasance ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput Sequencing ◽

Single Cells ◽

Cell Types ◽

Full Length ◽

Sequencing Data ◽

Total Rna ◽

Specific Protocol

RNA sequencing (RNAseq) has been widely used to generate bulk gene expression measurements collected from pools of cells. Only relatively recently have single-cell RNAseq (scRNAseq) methods provided opportunities for gene expression analyses at the single-cell level, allowing researchers to study heterogeneous mixtures of cells at unprecedented resolution. Tumors tend to be composed of heterogeneous cellular mixtures and are frequently the subjects of such analyses. Extensive method developments have led to several protocols for scRNAseq but, owing to the small amounts of RNA in single cells, technical constraints have required compromises. For example, the majority of scRNAseq methods are limited to sequencing only the 3′ or 5′ termini of transcripts. Other protocols that facilitate full-length transcript profiling tend to capture only polyadenylated mRNAs and are generally limited to processing only 96 cells at a time. Here, we address these limitations and present a novel protocol that allows for the high-throughput sequencing of full-length, total RNA at single-cell resolution. We demonstrate that our method produced strand-specific sequencing data for both polyadenylated and non-polyadenylated transcripts, enabled the profiling of transcript regions beyond only transcript termini, and yielded data rich enough to allow identification of cell types from heterogeneous biological samples.

Single-Cell RNA Sequencing of Batch Chlamydomonas Cultures Reveals Heterogeneity in their Diurnal Cycle Phase

The Plant Cell ◽

10.1093/plcell/koab025 ◽

2021 ◽

Author(s):

Feiyang Ma ◽

Patrice A Salomé ◽

Sabeeha S Merchant ◽

Matteo Pellegrini

Keyword(s):

Cell Wall ◽

Single Cell ◽

Rna Sequencing ◽

Environmental Changes ◽

Single Cells ◽

Nitrogen Deficiency ◽

Cycle Phase ◽

Rna Seq ◽

Single Cell Rna Sequencing ◽

A Cell

Abstract: The photosynthetic unicellular alga Chlamydomonas (Chlamydomonas reinhardtii) is a versatile reference for algal biology because of its ease of culture in the laboratory. Genomic and systems biology approaches have previously described transcriptome responses to environmental changes using bulk data, thus representing the average behavior from pools of cells. Here, we apply single-cell RNA sequencing (scRNA-seq) to probe the heterogeneity of Chlamydomonas cell populations under three environments and in two genotypes differing by the presence of a cell wall. First, we determined that RNA can be extracted from single algal cells with or without a cell wall, offering the possibility to sample natural algal communities. Second, scRNA-seq successfully separated single cells into non-overlapping cell clusters according to their growth conditions. Cells exposed to iron or nitrogen deficiency were easily distinguished despite a shared tendency to arrest photosynthesis and cell division to economize resources. Notably, these groups of cells recapitulated known patterns observed with bulk RNA-seq, but also revealed their inherent heterogeneity. A substantial source of variation between cells originated from their endogenous diurnal phase, although cultures were grown in constant light. We exploited this result to show that circadian iron responses may be conserved from algae to land plants. We document experimentally that bulk RNA-seq data represent an average of typically hidden heterogeneity in the population.

Cell Dissociation from Butterfly Pupal Wing Tissues for Single-Cell RNA Sequencing

Methods and Protocols ◽

10.3390/mps3040072 ◽

2020 ◽

Vol 3 (4) ◽

pp. 72

Author(s):

Anupama Prakash ◽

Antónia Monteiro

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Single Cells ◽

Expression Patterns ◽

Cell Types ◽

Rna Seq ◽

Bicyclus Anynana ◽

Single Cell Sequencing ◽

Pupal Wing

Butterflies are well known for their beautiful wings and have been great systems to understand the ecology, evolution, genetics, and development of patterning and coloration. These color patterns are mosaics on the wing created by the tiling of individual units called scales, which develop from single cells. Traditionally, bulk RNA sequencing (RNA-seq) has been used extensively to identify the loci involved in wing color development and pattern formation. RNA-seq provides an averaged gene expression landscape of the entire wing tissue or of small dissected wing regions under consideration. However, to understand the gene expression patterns of the units of color, which are the scales, and to identify different scale cell types within a wing that produce different colors and scale structures, it is necessary to study single cells. This has recently been facilitated by the advent of single-cell sequencing. Here, we provide a detailed protocol for the dissociation of cells from Bicyclus anynana pupal wings to obtain a viable single-cell suspension for downstream single-cell sequencing. We outline our experimental design and the use of fluorescence-activated cell sorting (FACS) to obtain putative scale-building and socket cells based on size. Finally, we discuss some of the current challenges of this technique in studying single-cell scale development and suggest future avenues to address these challenges.

Missing Data and Technical Variability in Single-Cell RNA-Sequencing Experiments

10.1101/025528 ◽

2015 ◽

Cited By ~ 32

Author(s):

Stephanie C Hicks ◽

F. William Townes ◽

Mingxiang Teng ◽

Rafael A Irizarry

Keyword(s):

Gene Expression ◽

Missing Data ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cells ◽

Systematic Errors ◽

Gene Expression Measurement ◽

Rna Seq ◽

Batch Effects

Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on data produced with it. However, RNA-Seq and scRNA-seq data are markedly different. In particular, unlike RNA-Seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a sufficient level to be detected by the sequencing technology). Another difference is that the proportion of genes reporting an expression level of zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies from cell to cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem.
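
The per-cell proportion of zero-valued genes discussed above is straightforward to compute. The sketch below assumes a genes-by-cells count matrix; the function name `zero_fraction_per_cell` and the toy counts are hypothetical and not taken from the study.

```python
import numpy as np

def zero_fraction_per_cell(counts):
    """Fraction of genes with a zero count in each cell (genes on rows, cells on columns)."""
    counts = np.asarray(counts)
    # Comparing to zero gives a boolean matrix; its column mean is the zero fraction.
    return (counts == 0).mean(axis=0)

# Toy count matrix (illustrative only): 4 genes x 4 cells.
toy_counts = np.array([
    [0, 3, 0, 1],
    [0, 0, 0, 2],
    [5, 1, 0, 0],
    [0, 2, 0, 4],
])
toy_zero_fraction = zero_fraction_per_cell(toy_counts)
```

Comparing this per-cell quantity across batches is one simple way to check whether the variation in zeros tracks a technical factor rather than biology.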

Single-Cell RNA Sequencing of Batch Chlamydomonas Cultures Reveals Heterogeneity in their Diurnal Cycle Phase

10.1101/2020.09.15.298844 ◽

2020 ◽

Cited By ~ 1

Author(s):

Feiyang Ma ◽

Patrice A. Salomé ◽

Sabeeha S. Merchant ◽

Matteo Pellegrini

Keyword(s):

Cell Wall ◽

Single Cell ◽

Rna Sequencing ◽

Diurnal Cycle ◽

Single Cells ◽

Nitrogen Deficiency ◽

Cycle Phase ◽

Rna Seq ◽

Single Cell Rna Sequencing ◽

A Cell

Abstract: The photosynthetic unicellular alga Chlamydomonas (Chlamydomonas reinhardtii) is a versatile reference for algal biology because of the facility with which it can be cultured in the laboratory. Genomic and systems biology approaches have previously been used to describe how the transcriptome responds to environmental changes, but this analysis has been limited to bulk data, representing the average behavior from pools of cells. Here, we apply single-cell RNA sequencing (scRNA-seq) to probe the heterogeneity of Chlamydomonas cell populations under three environments and in two genotypes differing in the presence of a cell wall. First, we determined that RNA can be extracted from single algal cells with or without a cell wall, offering the possibility to sample algae communities in the wild. Second, scRNA-seq successfully separated single cells into non-overlapping cell clusters according to their growth conditions. Cells exposed to iron or nitrogen deficiency were easily distinguished despite a shared tendency to arrest cell division to economize resources. Notably, these groups of cells recapitulated known patterns observed with bulk RNA-seq, but also revealed their inherent heterogeneity. A substantial source of variation between cells originated from their endogenous diurnal phase, although cultures were grown in constant light. We exploited this result to show that circadian iron responses may be conserved from algae to land plants. We propose that bulk RNA-seq data represent an average of varied cell states that hides underappreciated heterogeneity.

One-sentence summary: We show that single-cell RNA-seq (scRNA-seq) can be applied to Chlamydomonas cultures to reveal that the heterogeneity in bulk cultures is largely driven by diurnal cycle phases.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Matteo Pellegrini.

QUBIC2: A novel biclustering algorithm for large-scale bulk RNA-sequencing and single-cell RNA-sequencing data analysis

10.1101/409961 ◽

2018 ◽

Cited By ~ 5

Author(s):

Juan Xie ◽

Anjun Ma ◽

Yu Zhang ◽

Bingqiang Liu ◽

Changlin Wan ◽

...

Keyword(s):

Gene Expression ◽

Transcriptional Regulation ◽

Single Cell ◽

Rna Sequencing ◽

Spatial Data ◽

Large Scale ◽

Biological Information ◽

Superior Performance ◽

Rna Seq ◽

Sequencing Data

Abstract: The combination of biclustering and large-scale gene expression data holds promising potential for inferring condition-specific functional pathways and networks. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-sequencing (RNA-Seq) data, mainly due to the lack of (i) consideration of the high sparsity of RNA-Seq data, e.g., the massive zeros or lowly expressed genes in the data, especially for single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. Here we present a novel biclustering algorithm, QUBIC2, for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Key novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression, (ii) a mixture Gaussian distribution and an information-divergency objective function to capture shared transcriptional regulation signals among a set of genes, (iii) a Core-Dual strategy to identify biclusters and optimize relevant parameters, and (iv) a size-based P-value framework to evaluate the statistical significance of all the identified biclusters. Our validation on comprehensive bulk and single-cell RNA-seq data sets suggests that QUBIC2 had superior performance in functional module detection and cell type classification compared with five other widely used biclustering tools. In addition, applications to temporal and spatial data demonstrated that QUBIC2 can derive meaningful biological information from scRNA-Seq data. The source code for QUBIC2 can be freely accessed at https://github.com/maqin2001/qubic2.
