Using Cell Type–Specific Genes to Identify Cell-Type Transitions Between Different in vitro Culture Conditions

In vitro differentiation or expansion of stem and progenitor cells under chemical stimulation or genetic manipulation is used for understanding the molecular mechanisms of cell differentiation and self-renewal. However, concerns around the cell identity of in vitro–cultured cells exist. Bioinformatics methods, which rely heavily on signatures of cell types, have been developed to estimate cell types in bulk samples. The Tabula Muris Senis project provides an important basis for the comprehensive identification of signatures for different cell types. Here, we identified 46 cell type–specific (CTS) gene clusters for 83 mouse cell types. We conducted Gene Ontology term enrichment analysis on the gene clusters and revealed the specific functions of the relevant cell types. Next, we proposed a simple method, named CTSFinder, to identify different cell types between bulk RNA-Seq samples using the 46 CTS gene clusters. We applied CTSFinder on bulk RNA-Seq data from 17 organs and from developing mouse liver over different stages. We successfully identified the specific cell types between organs and captured the dynamics of different cell types during liver development. We applied CTSFinder with bulk RNA-Seq data from a growth factor–induced neural progenitor cell culture system and identified the dynamics of brain immune cells and nonimmune cells during the long-time cell culture. We also applied CTSFinder with bulk RNA-Seq data from reprogramming induced pluripotent stem cells and identified the stage when those cells were massively induced. Finally, we applied CTSFinder with bulk RNA-Seq data from in vivo and in vitro developing mouse retina and captured the dynamics of different cell types in the two development systems. The CTS gene clusters and CTSFinder method could thus serve as promising toolkits for assessing the cell identity of in vitro culture systems.

Download Full-text

Single-nucleus RNA-seq reveals dysregulation of striatal cell identity due to Huntington’s disease mutations

10.1101/2020.07.08.192880 ◽

2020 ◽

Author(s):

Sonia Malaiya ◽

Marcia Cortes-Gutierrez ◽

Brian R. Herb ◽

Sydney R. Coffey ◽

Samuel R.W. Legg ◽

...

Keyword(s):

Huntington's Disease ◽

Huntington’S Disease ◽

Neurodegenerative Disorder ◽

Cell Types ◽

Transcriptional Networks ◽

Rna Seq ◽

Cell Type ◽

Cell Identity ◽

Transcriptional Changes ◽

Cell Type Specific

ABSTRACTHuntington’s disease (HD) is a dominantly inherited neurodegenerative disorder caused by a trinucleotide expansion in exon 1 of the huntingtin (Htt) gene. Cell death in HD occurs primarily in striatal medium spiny neurons (MSNs), but the involvement of specific MSN subtypes and of other striatal cell types remains poorly understood. To gain insight into cell type-specific disease processes, we studied the nuclear transcriptomes of 4,524 cells from the striatum of a genetically precise knock-in mouse model of the HD mutation, HttQ175/+, and from wildtype controls. We used 14-15-month-old mice, a time point roughly equivalent to an early stage of symptomatic human disease. Cell type distributions indicated selective loss of D2 MSNs and increased microglia in aged HttQ175/+ mice. Thousands of differentially expressed genes were distributed across most striatal cell types, including transcriptional changes in glial populations that are not apparent from RNA-seq of bulk tissue. Reconstruction of cell typespecific transcriptional networks revealed a striking pattern of bidirectional dysregulation for many cell type-specific genes. Typically, these genes were repressed in their primary cell type, yet de-repressed in other striatal cell types. Integration with existing epigenomic and transcriptomic data suggest that partial loss-of-function of the Polycomb Repressive Complex 2 (PRC2) may underlie many of these transcriptional changes, leading to deficits in the maintenance of cell identity across virtually all cell types in the adult striatum.

Download Full-text

Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data

10.1101/2020.08.26.267815 ◽

2020 ◽

Author(s):

Jiaxin Fan ◽

Xuran Wang ◽

Rui Xiao ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Allelic Expression ◽

Rna Seq ◽

Allelic Expression Imbalance ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Different Cell Types

AbstractAllelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcript to be sequenced in single cells of a large number of individuals, which are still cost prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effect, BSCET can also detect genes whose cell-type-specific AEI are associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provided a convenient tool to integrate information from scRNA-seq data to gain insight on AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.Author SummaryDetection of allelic expression imbalance (AEI), a phenomenon where the two alleles of a gene differ in their expression magnitude, is a key step towards the understanding of phenotypic variations among individuals. Existing methods detect AEI use bulk RNA sequencing (RNA-seq) data and ignore AEI variations among different cell types. Although single-cell RNA sequencing (scRNA-seq) has enabled the characterization of cell-to-cell heterogeneity in gene expression, the high costs have limited its application in AEI analysis. To overcome this limitation, we developed BSCET to characterize cell-type-specific AEI using the widely available bulk RNA-seq data by integrating cell-type composition information inferred from scRNA-seq samples. Since the degree of AEI may vary with disease phenotypes, we further extended BSCET to detect genes whose cell-type-specific AEIs are associated with clinical factors. Through extensive benchmark evaluations and analyses of two pancreatic islet bulk RNA-seq datasets, we demonstrated BSCET’s ability to refine bulk-level AEI to cell-type resolution, and to identify genes whose cell-type-specific AEIs are associated with the progression of type 2 diabetes. With the vast amount of easily accessible bulk RNA-seq data, we believe BSCET will be a valuable tool for elucidating cell type contributions in human diseases.

Download Full-text

Detecting cell-type-specific allelic expression imbalance by integrative analysis of bulk and single-cell RNA sequencing data

PLoS Genetics ◽

10.1371/journal.pgen.1009080 ◽

2021 ◽

Vol 17 (3) ◽

pp. e1009080

Author(s):

Jiaxin Fan ◽

Xuran Wang ◽

Rui Xiao ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Cell Types ◽

Allelic Expression ◽

Rna Seq ◽

Allelic Expression Imbalance ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Different Cell Types

Allelic expression imbalance (AEI), quantified by the relative expression of two alleles of a gene in a diploid organism, can help explain phenotypic variations among individuals. Traditional methods detect AEI using bulk RNA sequencing (RNA-seq) data, a data type that averages out cell-to-cell heterogeneity in gene expression across cell types. Since the patterns of AEI may vary across different cell types, it is desirable to study AEI in a cell-type-specific manner. Although this can be achieved by single-cell RNA sequencing (scRNA-seq), it requires full-length transcript to be sequenced in single cells of a large number of individuals, which are still cost prohibitive to generate. To overcome this limitation and utilize the vast amount of existing disease relevant bulk tissue RNA-seq data, we developed BSCET, which enables the characterization of cell-type-specific AEI in bulk RNA-seq data by integrating cell type composition information inferred from a small set of scRNA-seq samples, possibly obtained from an external dataset. By modeling covariate effect, BSCET can also detect genes whose cell-type-specific AEI are associated with clinical factors. Through extensive benchmark evaluations, we show that BSCET correctly detected genes with cell-type-specific AEI and differential AEI between healthy and diseased samples using bulk RNA-seq data. BSCET also uncovered cell-type-specific AEIs that were missed in bulk data analysis when the directions of AEI are opposite in different cell types. We further applied BSCET to two pancreatic islet bulk RNA-seq datasets, and detected genes showing cell-type-specific AEI that are related to the progression of type 2 diabetes. Since bulk RNA-seq data are easily accessible, BSCET provided a convenient tool to integrate information from scRNA-seq data to gain insight on AEI with cell type resolution. Results from such analysis will advance our understanding of cell type contributions in human diseases.

Download Full-text

Iterative point set registration for aligning scRNA-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1007939 ◽

2020 ◽

Vol 16 (10) ◽

pp. e1007939

Author(s):

Amir Alavi ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Image Data ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Point Set Registration ◽

Optimization Function ◽

Cell Type Specific ◽

Point Set ◽

Different Cell Types

Several studies profile similar single cell RNA-Seq (scRNA-Seq) data using different technologies and platforms. A number of alignment methods have been developed to enable the integration and comparison of scRNA-Seq data from such studies. While each performs well on some of the datasets, to date no method was able to both perform the alignment using the original expression space and generalize to new data. To enable such analysis we developed Single Cell Iterative Point set Registration (SCIPR) which extends methods that were successfully applied to align image data to scRNA-Seq. We discuss the required changes needed, the resulting optimization function, and algorithms for learning a transformation function for aligning data. We tested SCIPR on several scRNA-Seq datasets. As we show it successfully aligns data from several different cell types, improving upon prior methods proposed for this task. In addition, we show the parameters learned by SCIPR can be used to align data not used in the training and to identify key cell type-specific genes.

Download Full-text

A United Statistical Framework for Single Cell and Bulk Sequencing Data

10.1101/206532 ◽

2017 ◽

Cited By ~ 1

Author(s):

Lingxue Zhu ◽

Jing Lei ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Accurate Estimation ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Different Cell Types ◽

Cell Data

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profile. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data, as well as in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.

Download Full-text

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

AbstractSingle-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.

Download Full-text

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

SummaryCell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthemore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieves superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.HighlightWe present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell OntologyCellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read ArchiveCellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training setCellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell OntologyGraphical Abstract

Download Full-text

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

10.1101/2020.05.13.093997 ◽

2020 ◽

Author(s):

Yupeng Wang ◽

Rosario B. Jaime-Lara ◽

Abhrarup Roy ◽

Ying Sun ◽

Xinyue Liu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Dna Sequences ◽

Cell Types ◽

Learning Models ◽

Cell Type ◽

Coding Sequences ◽

Sequence Features ◽

Cell Type Specific ◽

Different Cell Types

AbstractWe propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.

Download Full-text

CellMap: Characterizing the types and composition of iPSC-derived cells from RNA-seq data

10.1101/2021.05.24.445360 ◽

2021 ◽

Author(s):

Zhengyu Ouyang ◽

Nathanael Bourgeois ◽

Eugenia Lyashenko ◽

Paige Cundiff ◽

Patrick F Cullen ◽

...

Keyword(s):

Single Cell ◽

Induced Pluripotent Stem Cell ◽

Cell Types ◽

Model Systems ◽

Rna Seq ◽

Cell Type ◽

Fine Grained ◽

Single Nucleus ◽

Induced Pluripotent

Induced pluripotent stem cell (iPSC) derived cell types are increasingly employed as in vitro model systems for drug discovery. For these studies to be meaningful, it is important to understand the reproducibility of the iPSC-derived cultures and their similarity to equivalent endogenous cell types. Single-cell and single-nucleus RNA sequencing (RNA-seq) are useful to gain such understanding, but they are expensive and time consuming, while bulk RNA-seq data can be generated quicker and at lower cost. In silico cell type decomposition is an efficient, inexpensive, and convenient alternative that can leverage bulk RNA-seq to derive more fine-grained information about these cultures. We developed CellMap, a computational tool that derives cell type profiles from publicly available single-cell and single-nucleus datasets to infer cell types in bulk RNA-seq data from iPSC-derived cell lines.

Download Full-text

Revisiting the role of IRF3 in inflammation and immunity by conditional and specifically targeted gene ablation in mice

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1803936115 ◽

2018 ◽

Vol 115 (20) ◽

pp. 5253-5258 ◽

Cited By ~ 19

Author(s):

Hideyuki Yanai ◽

Shiho Chiba ◽

Sho Hangai ◽

Kohei Kometani ◽

Asuka Inoue ◽

...

Keyword(s):

Immune Cell ◽

Cell Types ◽

Type I ◽

Type I Ifn ◽

Cell Type ◽

Gene Ablation ◽

Cell Type Specific

IFN regulatory factor 3 (IRF3) is a transcription regulator of cellular responses in many cell types that is known to be essential for innate immunity. To confirm IRF3’s broad role in immunity and to more fully discern its role in various cellular subsets, we engineered Irf3-floxed mice to allow for the cell type-specific ablation of Irf3. Analysis of these mice confirmed the general requirement of IRF3 for the evocation of type I IFN responses in vitro and in vivo. Furthermore, immune cell ontogeny and frequencies of immune cell types were unaffected when Irf3 was selectively inactivated in either T cells or B cells in the mice. Interestingly, in a model of lipopolysaccharide-induced septic shock, selective Irf3 deficiency in myeloid cells led to reduced levels of type I IFN in the sera and increased survival of these mice, indicating the myeloid-specific, pathogenic role of the Toll-like receptor 4–IRF3 type I IFN axis in this model of sepsis. Thus, Irf3-floxed mice can serve as useful tool for further exploring the cell type-specific functions of this transcription factor.

Download Full-text