Using Single-Cell RNA-Seq Data to Trace Tissue Cells Responsive to Thyroid Hormones

Thyroid hormones mediate a remarkable range of functions in many tissues and organ systems through the thyroid hormone receptors, THRA and THRB. Tissues and organs are composed of heterogeneous cells of different cell types. These cell types differ in their ability to express the receptors, which leads to variable responses to thyroid hormone regulation. The tissue-specific Thra and Thrb gene expression patterns help us understand the action of thyroid hormones at the tissue level. However, the situation becomes complicated if we wish to look within tissues to trace the responsive cells, which is a vital step in understanding the molecular mechanisms of diseases related to thyroid hormone regulation. Single-cell RNA sequencing technology is a powerful tool for profiling gene expression programs in individual cells. The Tabula Muris Consortium generated a single-cell transcriptomic atlas across the life span of Mus musculus that includes data from 23 tissues and organs. It provides an unprecedented opportunity to understand thyroid hormone regulation at cell type resolution. We demonstrate approaches for applying the single-cell RNA-Seq data generated by the Tabula Muris Consortium to trace responsive cells in tissues. First, employing the single-cell RNA-Seq data, we calculated the ability of different cell types to express Thra and Thrb, which directs us to the cell types sensitive to thyroid hormone regulation in tissues and organs. Next, using a cell clustering algorithm, we explored the subtypes with low Thra or Thrb expression within the different cell types and identified the potentially responsive cell subtypes. Finally, in liver tissue treated with thyroid hormones, we successfully traced the responsive cell types using the single-cell RNA-Seq data. We acknowledge that the computational predictions reported here need to be further validated using wet-lab experiments. However, we believe our results provide useful information and will be beneficial for wet-lab researchers.
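
The first step described above (scoring how readily each cell type expresses Thra or Thrb) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which statistic was used, so the fraction of cells with at least `min_count` reads is an assumption, and the function name, data layout and toy values are all hypothetical.

```python
from collections import defaultdict

def fraction_expressing(cells, gene, min_count=1):
    """Return {cell_type: fraction of cells with count >= min_count for gene}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cell in cells:
        totals[cell["cell_type"]] += 1
        # A gene missing from the dict is treated as a zero count.
        if cell["counts"].get(gene, 0) >= min_count:
            positives[cell["cell_type"]] += 1
    return {ct: positives[ct] / totals[ct] for ct in totals}

# Toy data invented for illustration; these are not Tabula Muris values.
cells = [
    {"cell_type": "hepatocyte", "counts": {"Thrb": 3, "Thra": 0}},
    {"cell_type": "hepatocyte", "counts": {"Thrb": 1}},
    {"cell_type": "hepatocyte", "counts": {"Thra": 2}},
    {"cell_type": "Kupffer cell", "counts": {"Thra": 1}},
    {"cell_type": "Kupffer cell", "counts": {}},
]

thrb = fraction_expressing(cells, "Thrb")
thra = fraction_expressing(cells, "Thra")
```

Ranking cell types by these fractions is one simple way to shortlist candidates; a mean-expression score would be an equally plausible choice.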

A Unified Statistical Framework for Single Cell and Bulk Sequencing Data

10.1101/206532 ◽

2017 ◽

Cited By ~ 1

Author(s):

Lingxue Zhu ◽

Jing Lei ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Accurate Estimation ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Different Cell Types ◽

Cell Data

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data, as well as in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
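
To make the notion of a dropout concrete, the sketch below simulates "false" zeros in a vector of true counts. It is not the URSM model: the abstract does not give the functional form of the dropout mechanism, so the logistic link and the parameters `kappa` and `tau` are assumptions chosen only so that highly expressed genes drop out less often.

```python
import math
import random

def dropout_probability(true_count, kappa=1.0, tau=-0.5):
    """Assumed logistic link: higher expression gives a lower dropout chance."""
    return 1.0 / (1.0 + math.exp(-(kappa + tau * true_count)))

def apply_dropout(true_counts, rng, kappa=1.0, tau=-0.5):
    """Replace each count with 0 with its dropout probability, else keep it."""
    observed = []
    for count in true_counts:
        dropped = rng.random() < dropout_probability(count, kappa, tau)
        observed.append(0 if dropped else count)
    return observed

rng = random.Random(0)  # fixed seed so the simulation is repeatable
true_counts = [0, 1, 2, 5, 10, 20] * 50
observed = apply_dropout(true_counts, rng)

# "False" zeros: the gene was expressed but the observed count is zero.
false_zeros = sum(1 for t, o in zip(true_counts, observed) if t > 0 and o == 0)
```

An imputation method has to distinguish these false zeros from true zeros, which is why the abstract stresses modelling dropouts explicitly.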

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

Rna Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
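
The likelihood-based idea can be illustrated with a single expectation step of a two-component multinomial mixture. This is a simplified sketch and not the DIEM implementation: it assumes the debris and cell expression profiles are already known, whereas the abstract states they are estimated with EM, and the profile values, prior and function names are hypothetical.

```python
import math

def log_multinomial_kernel(counts, profile):
    """Log-likelihood of counts under a gene-proportion profile.

    The multinomial coefficient is dropped because it is identical for
    both components and cancels in the posterior.
    """
    return sum(n * math.log(profile[g]) for g, n in counts.items() if n > 0)

def debris_posterior(counts, debris_profile, cell_profile, prior_debris=0.5):
    """Posterior probability that a droplet was drawn from the debris profile."""
    log_d = math.log(prior_debris) + log_multinomial_kernel(counts, debris_profile)
    log_c = math.log(1 - prior_debris) + log_multinomial_kernel(counts, cell_profile)
    m = max(log_d, log_c)  # subtract the max for numerical stability
    w_d, w_c = math.exp(log_d - m), math.exp(log_c - m)
    return w_d / (w_d + w_c)

# Hypothetical profiles: debris dominated by a mitochondrial transcript,
# nuclei dominated by a nuclear-retained transcript.
debris_profile = {"MT-CO1": 0.60, "MALAT1": 0.20, "ADIPOQ": 0.20}
cell_profile = {"MT-CO1": 0.05, "MALAT1": 0.65, "ADIPOQ": 0.30}

droplet_1 = {"MT-CO1": 9, "MALAT1": 1, "ADIPOQ": 0}
droplet_2 = {"MT-CO1": 0, "MALAT1": 8, "ADIPOQ": 2}

p1 = debris_posterior(droplet_1, debris_profile, cell_profile)
p2 = debris_posterior(droplet_2, debris_profile, cell_profile)
```

In a full EM loop these posteriors would be used as weights to re-estimate both profiles and the prior, and the two steps would alternate until convergence.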

Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq

eLife ◽

10.7554/elife.43803 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 37

Author(s):

Dylan Kotliar ◽

Adrian Veres ◽

M Aurel Nagy ◽

Shervin Tabrizi ◽

Eran Hodis ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Matrix Factorization ◽

Cell Types ◽

Environmental Cues ◽

Rna Seq ◽

Cell Type ◽

Type Identity ◽

Brain Organoid ◽

Non Negative Matrix Factorization

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.

scAPAdb: a comprehensive database of alternative polyadenylation at single-cell resolution

Nucleic Acids Research ◽

10.1093/nar/gkab795 ◽

2021 ◽

Author(s):

Sheng Zhu ◽

Qiwei Lian ◽

Wenbin Ye ◽

Wei Qin ◽

Zhe Wu ◽

...

Keyword(s):

Single Cell ◽

Alternative Polyadenylation ◽

Cell Types ◽

Single Cell Level ◽

Cell Heterogeneity ◽

Rna Seq ◽

Cell Level ◽

Eukaryotic Gene ◽

User Friendly ◽

Different Cell Types

Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer of eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3′-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from >360 scRNA-seq experiments, covering six species including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usage and APA-mediated gene regulation at the single-cell level across diverse cell types, tissues and species.

Computational approaches towards reducing contamination in single-cell RNA-seq data

10.1101/2020.07.15.205062 ◽

2020 ◽

Author(s):

Siamak Yousefi ◽

Hao Chen ◽

Jesse F. Ingels ◽

Melinda S. McCarty ◽

Arthur G. Centeno ◽

...

Keyword(s):

Single Cell ◽

Single Cells ◽

Real Life ◽

Cell Types ◽

Cell Capture ◽

Rna Seq ◽

Sequence Analyses ◽

Cell Functions ◽

Biological Interpretation ◽

Different Cell Types

Single cell RNA sequencing has enabled quantification of single cells and identification of different cell types and subtypes as well as cell functions in different tissues. Single cell RNA sequence analyses assume acquired RNAs correspond to cells; however, RNAs from contamination within the input data are also captured by these assays. The sequencing of background contamination, as well as unwanted cells making their way to the final assay, can potentially confound the correct biological interpretation of single cell transcriptomic data. Here we demonstrate two approaches to deal with background contamination as well as profiling of unwanted cells in the assays. We use three real-life datasets of whole-cell capture and nucleotide single-cell captures generated by Fluidigm and 10x technologies and show that these methods reduce the effect of contamination, strengthen clustering of cells and improve biological interpretation.

MarkerCount: A stable, count-based cell type identifier for single cell RNA-Seq experiments

10.21203/rs.3.rs-418249/v1 ◽

2021 ◽

Author(s):

Hanbyeol Kim ◽

Joongho Lee ◽

Keunsoo Kang ◽

Seokhyun Yoon

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Batch Effect ◽

Expression Level ◽

Rna Seq ◽

Cell Type ◽

Stable Performance ◽

Downstream Analysis

Cell type identification is a key step in the downstream analysis of single cell RNA-seq experiments. Indispensable information for this is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that expression levels are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a per-cluster basis. MarkerCount works in both reference-based and marker-based modes, where the latter utilizes only existing lists of markers, while the former requires a pre-annotated dataset to train the model. The performance was evaluated and compared with existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker DBs. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
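
The count-based idea (scoring a cell by how many markers it expresses, ignoring how strongly) can be sketched as below. This is only an illustration of the initial per-cell step under simplifying assumptions, not the MarkerCount implementation: the cluster-level reassignment is omitted, and the function names, the `min_markers` rejection threshold and the toy cells are hypothetical.

```python
def count_markers(cell_counts, marker_sets):
    """Count, per cell type, how many of its markers have a non-zero count."""
    return {
        cell_type: sum(1 for gene in markers if cell_counts.get(gene, 0) > 0)
        for cell_type, markers in marker_sets.items()
    }

def assign_cell_type(cell_counts, marker_sets, min_markers=2):
    """Pick the cell type with the most markers detected, or reject the cell."""
    scores = count_markers(cell_counts, marker_sets)
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= min_markers else "unassigned"

marker_sets = {
    "T cell": ["CD3D", "CD3E", "CD2"],
    "B cell": ["CD79A", "MS4A1", "CD19"],
}

# CD3E is expressed 40x more strongly than CD3D, but each counts once.
cell_a = {"CD3D": 1, "CD3E": 40, "CD79A": 2}
cell_b = {"MS4A1": 1}

label_a = assign_cell_type(cell_a, marker_sets)
label_b = assign_cell_type(cell_b, marker_sets)
```

Because only presence or absence is used, rescaling all counts (as a batch effect might) leaves the scores unchanged as long as detected genes stay detected.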

Identifying Gene Expression Programs of Cell-type Identity and Cellular Activity with Single-Cell RNA-Seq

10.1101/310599 ◽

2018 ◽

Cited By ~ 7

Author(s):

Dylan Kotliar ◽

Adrian Veres ◽

M. Aurel Nagy ◽

Shervin Tabrizi ◽

Eran Hodis ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Matrix Factorization ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Relative Contribution ◽

Neuronal Synapses ◽

Type Identity ◽

Brain Organoid

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here we illustrate and enhance the use of matrix factorization as a solution to this problem. We show with simulations that a method that we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including the relative contribution of programs in each cell. Applied to published brain organoid and visual cortex scRNA-Seq datasets, cNMF refines the hierarchy of cell types and identifies both expected (e.g. cell cycle and hypoxia) and intriguing novel activity programs. We propose that one of the novel programs may reflect a neurosecretory phenotype and a second may underlie the formation of neuronal synapses. We make cNMF available to the community and illustrate how this approach can provide key insights into gene expression variation within and between cell types.

Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123 ◽

2020 ◽

Author(s):

Timothy J. Durham ◽

Riza M. Daza ◽

Louis Gevirtzman ◽

Darren A. Cusanovich ◽

William Stafford Noble ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Gene Expression Patterns ◽

Rna Seq ◽

Cell Type ◽

Tissue Specific ◽

C Elegans

Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

DSAVE: Detection of misclassified cells in single-cell RNA-Seq data

PLoS ONE ◽

10.1371/journal.pone.0243360 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243360

Author(s):

Johan Gustafsson ◽

Jonathan Robinson ◽

Juan S. Inda-Díaz ◽

Elias Björnson ◽

Rebecka Jörnsten ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Log Likelihood ◽

Single Cell Rna Sequencing ◽

Cell Transcriptome ◽

Average Gene ◽

Single Cell Transcriptome

Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
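
Down-sampling to a common depth, the device the abstract uses to equalize sampling noise, can be sketched as follows. This is a generic illustration and not the DSAVE code: the function name, the per-cell (rather than per-cluster) setting and the toy count vectors are assumptions.

```python
import random

def downsample_counts(counts, target_total, rng):
    """Sample target_total molecules without replacement from a count vector."""
    total = sum(counts.values())
    if target_total > total:
        raise ValueError("target_total exceeds the number of observed counts")
    # Expand the vector into one entry per molecule, then draw a subset.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    sampled = rng.sample(pool, target_total)
    result = {gene: 0 for gene in counts}
    for gene in sampled:
        result[gene] += 1
    return result

rng = random.Random(42)  # fixed seed for repeatability
cell_deep = {"GeneA": 50, "GeneB": 30, "GeneC": 20}
cell_shallow = {"GeneA": 6, "GeneB": 3, "GeneC": 1}

# Bring both cells to the depth of the shallower one before comparing them.
target = min(sum(cell_deep.values()), sum(cell_shallow.values()))
down_deep = downsample_counts(cell_deep, target, rng)
down_shallow = downsample_counts(cell_shallow, target, rng)
```

After this step, any remaining difference in variability between the two profiles cannot be attributed to unequal sequencing depth.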

Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data

10.1101/2020.08.26.267815 ◽

2020 ◽

Author(s):

Jiaxin Fan ◽

Xuran Wang ◽

Rui Xiao ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Allelic Expression ◽

Rna Seq ◽

Allelic Expression Imbalance ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Different Cell Types

Allelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcripts to be sequenced in single cells of a large number of individuals, which is still cost prohibitive. To overcome this limitation and utilize the vast amount of existing disease-relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effects, BSCET can also detect genes whose cell-type-specific AEI is associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provides a convenient tool to integrate information from scRNA-seq data to gain insight into AEI at cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.

Author Summary:

Detection of allelic expression imbalance (AEI), a phenomenon where the two alleles of a gene differ in their expression magnitude, is a key step towards the understanding of phenotypic variations among individuals. Existing methods detect AEI using bulk RNA sequencing (RNA-seq) data and ignore AEI variations among different cell types. Although single-cell RNA sequencing (scRNA-seq) has enabled the characterization of cell-to-cell heterogeneity in gene expression, the high costs have limited its application in AEI analysis. To overcome this limitation, we developed BSCET to characterize cell-type-specific AEI using the widely available bulk RNA-seq data by integrating cell-type composition information inferred from scRNA-seq samples. Since the degree of AEI may vary with disease phenotypes, we further extended BSCET to detect genes whose cell-type-specific AEIs are associated with clinical factors. Through extensive benchmark evaluations and analyses of two pancreatic islet bulk RNA-seq datasets, we demonstrated BSCET’s ability to refine bulk-level AEI to cell-type resolution, and to identify genes whose cell-type-specific AEIs are associated with the progression of type 2 diabetes. With the vast amount of easily accessible bulk RNA-seq data, we believe BSCET will be a valuable tool for elucidating cell type contributions in human diseases.
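
For orientation, the bulk-level quantity that is being refined here can be computed as below: the reference-allele fraction at a heterozygous site, with an exact binomial test against balanced expression. This is a textbook single-sample test, not the BSCET model (which additionally uses cell type proportions and covariates); the function names and read counts are hypothetical.

```python
from math import comb

def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele."""
    return ref_reads / (ref_reads + alt_reads)

def binomial_two_sided_p(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null of balanced expression.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

ratio = allelic_ratio(30, 10)
p_balanced = binomial_two_sided_p(5, 5)
p_extreme = binomial_two_sided_p(10, 0)
```

A bulk sample can show a ratio near 0.5 even when two cell types are imbalanced in opposite directions, which is the situation the abstract says a cell-type-specific analysis can resolve.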
