CellMap: Characterizing the types and composition of iPSC-derived cells from RNA-seq data

Induced pluripotent stem cell (iPSC)-derived cell types are increasingly employed as in vitro model systems for drug discovery. For these studies to be meaningful, it is important to understand the reproducibility of the iPSC-derived cultures and their similarity to equivalent endogenous cell types. Single-cell and single-nucleus RNA sequencing (RNA-seq) are useful for gaining such understanding, but they are expensive and time-consuming, whereas bulk RNA-seq data can be generated more quickly and at lower cost. In silico cell type decomposition is an efficient, inexpensive, and convenient alternative that can leverage bulk RNA-seq to derive more fine-grained information about these cultures. We developed CellMap, a computational tool that derives cell type profiles from publicly available single-cell and single-nucleus datasets to infer cell types in bulk RNA-seq data from iPSC-derived cell lines.
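The general idea of in silico decomposition can be illustrated with a small sketch. The code below is not CellMap's implementation; it assumes that cell type profiles are simple per-type averages of labelled single-cell counts and that proportions are estimated by non-negative least squares, solved here with projected gradient descent. The function names, the toy counts, and the cell type labels are hypothetical.

```python
from collections import defaultdict


def cell_type_profiles(cells, labels):
    """Average the expression vectors of all cells sharing a label.

    cells: list of per-cell gene-count lists; labels: one label per cell.
    (Assumption: a mean profile is a reasonable stand-in for a cell type.)
    """
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(cells, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def estimate_proportions(bulk, profiles, steps=2000):
    """Non-negative least squares via projected gradient descent.

    Minimizes ||P w - bulk||^2 subject to w >= 0, then rescales w to sum to 1.
    """
    names = sorted(profiles)
    n_genes = len(bulk)
    w = [1.0 / len(names)] * len(names)
    # Step size 1 / trace(P^T P); the trace bounds the largest eigenvalue.
    lr = 1.0 / sum(v * v for n in names for v in profiles[n])
    for _ in range(steps):
        pred = [sum(w[k] * profiles[n][g] for k, n in enumerate(names))
                for g in range(n_genes)]
        resid = [p - b for p, b in zip(pred, bulk)]
        grad = [sum(resid[g] * profiles[n][g] for g in range(n_genes))
                for n in names]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]
    total = sum(w)
    return {n: (wk / total if total > 0 else 0.0) for n, wk in zip(names, w)}


# Toy data: three genes, three labelled single cells, one bulk sample.
profiles = cell_type_profiles(
    [[12, 0, 2], [8, 0, 2], [0, 8, 2]],
    ["neuron", "neuron", "astrocyte"],
)
bulk_sample = [7.0, 2.4, 2.0]  # constructed as 0.7 * neuron + 0.3 * astrocyte
proportions = estimate_proportions(bulk_sample, profiles)
```

In practice a dedicated solver and marker-gene selection would replace this loop; the sketch only shows how reference profiles and a bulk vector combine into proportion estimates.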

Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM

10.1101/786285 ◽

2019 ◽

Cited By ~ 4

Author(s):

Marcus Alvarez ◽

Elior Rahmani ◽

Brandon Jew ◽

Kristina M. Garske ◽

Zong Miao ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Supervised Machine Learning ◽

Data Sets ◽

Rna Seq ◽

Novel Approach ◽

Single Nucleus ◽

Downstream Analysis

Abstract: Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. In contrast to single-cell RNA-seq (scRNA-seq), we observe that snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which can lead to identification of spurious cell types in downstream clustering analyses if overlooked. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods failed to account for contaminated droplets and led to spurious cell types. When compared to filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. To conclude, our novel method DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available for use at https://github.com/marcalva/diem.
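A minimal sketch of the kind of EM-based mixture modelling the abstract describes is given below. It is not the DIEM code: it assumes only two multinomial components (debris and nucleus) rather than debris plus several cell types, and it assumes that a set of droplets is pinned to the debris component in advance, which is one plausible reading of "semi-supervised". The function name, the pseudocount, and the toy counts are hypothetical.

```python
import math


def em_debris(droplets, fixed_debris, n_iter=50, pseudo=1e-2):
    """EM for a two-component multinomial mixture (0 = debris, 1 = nucleus).

    droplets: list of per-droplet gene-count lists.
    fixed_debris: set of droplet indices pinned to the debris component.
    Returns the posterior probability of debris for every droplet.
    """
    n, g = len(droplets), len(droplets[0])
    # Start with pinned droplets as debris and all others as nuclei.
    post = [1.0 if i in fixed_debris else 0.0 for i in range(n)]
    for _ in range(n_iter):
        # M-step: re-estimate gene proportions and mixing weights.
        resp = [post, [1.0 - p for p in post]]
        log_theta, log_pi = [], []
        for k in range(2):
            gene_tot = [pseudo + sum(resp[k][i] * droplets[i][j] for i in range(n))
                        for j in range(g)]
            total = sum(gene_tot)
            log_theta.append([math.log(t / total) for t in gene_tot])
            log_pi.append(math.log((sum(resp[k]) + pseudo) / (n + 2 * pseudo)))
        # E-step: posterior of debris given the current parameters.
        new_post = []
        for i, counts in enumerate(droplets):
            if i in fixed_debris:
                new_post.append(1.0)
                continue
            ll = [log_pi[k] + sum(c * log_theta[k][j] for j, c in enumerate(counts))
                  for k in range(2)]
            top = max(ll)
            w = [math.exp(v - top) for v in ll]  # subtract max for stability
            new_post.append(w[0] / (w[0] + w[1]))
        post = new_post
    return post


# Toy data: gene 0 dominates background RNA, genes 1 and 2 are nuclear.
toy_droplets = [
    [9, 1, 0], [8, 1, 1],                     # low-count droplets, pinned as debris
    [45, 3, 2],                               # high-count but debris-like
    [2, 30, 28], [1, 25, 34], [3, 40, 27],    # nucleus-like
]
debris_prob = em_debris(toy_droplets, fixed_debris={0, 1})
```

Droplets whose debris posterior exceeds a chosen cutoff would then be removed before clustering; the cutoff itself is a user decision and is not specified by the abstract.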

Abstract 17784: The Genesips Project: an NHLBI-Sponsored induced Pluripotent Stem Cell (iPSC) Resource for the Study of Cardiovascular Diseases

Circulation ◽

10.1161/circ.130.suppl_2.17784 ◽

2014 ◽

Vol 130 (suppl_2) ◽

Author(s):

Ivan Carcamo-Orive ◽

Paige Cundiff ◽

Hope Lancero ◽

Mohammad Shahbazi ◽

Fahim Abbasi ◽

...

Keyword(s):

Insulin Resistance ◽

Stem Cell ◽

Pluripotent Stem Cell ◽

Sendai Virus ◽

Induced Pluripotent Stem Cell ◽

Cell Types ◽

Model Systems ◽

Molecular Pathways ◽

Induced Pluripotent

The study of complex cardiovascular disease (CVD) has been hampered by the lack of appropriate human cellular model systems. In response, the NHLBI sponsored the NextGen Consortium, which encompasses 9 independent efforts spanning the portfolio of NHLBI-related phenotypes. The goals of the consortium include: 1. To develop and improve methods for large-scale production and characterization of induced pluripotent stem cell (iPSC) models for CVD; 2. To create a resource of iPSC lines from a large number of phenotypically and genotypically characterized individuals. Our GENESiPS project is focused on insulin resistance (IR), a condition that affects 25-33% of the US population with serious health consequences including risk of type II diabetes and CVD. Although much is known about the physiological changes occurring during IR, little is known about the molecular pathways that drive the appearance of IR. Certain mature cell types, such as adipocytes, endothelial cells and skeletal muscle cells, have been associated with the origin, maintenance and progression of IR. iPSCs offer an unprecedented opportunity to model human disease in vitro. We have created iPSC lines from insulin-resistant and insulin-sensitive patient groups with prior GWAS genotyping. Differentiation of these iPSCs to relevant cell types is providing the opportunity to correlate insulin sensitivity and high-density genetic variation data with specific cell-based profiling. We will validate our in vitro model and study the molecular pathways that define IR and its relationship to endothelial dysfunction. Relevant to the larger scientific community is the establishment of iPSC lines from over 150 individuals (3 to 6 clones per patient) that reflect the range of insulin resistance in the general population. The iPSC lines were created from erythroblasts using the non-integrative Sendai virus system and passaged to allow clearance of Sendai virus and growth in feeder-free conditions. The lines have been extensively characterized for markers of pluripotency (Tra1-60), sample identity and genomic integrity. Through the NextGen consortium, these lines, as well as phenotypic and genome-wide genotyping data, will be available to qualified investigators.

Single cell eQTL analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells

10.1101/2020.06.21.163766 ◽

2020 ◽

Cited By ~ 1

Author(s):

Drew Neavin ◽

Quan Nguyen ◽

Maciej S. Daniszewski ◽

Helena H. Liang ◽

Han Sheng Chiu ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variation ◽

Single Cell ◽

Pluripotent Stem Cells ◽

Cell Types ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Induced Pluripotent

Abstract: The discovery that somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) - cells that can be differentiated into any cell type of the three germ layers - has provided a foundation for in vitro human disease modelling [1,2], drug development [1,2], and population genetics studies [3,4]. In the majority of instances, the expression levels of genes play a critical role in contributing to disease risk or in the ability to identify therapeutic targets. However, while the genetic background of cell lines has been shown to strongly influence gene expression, the effect has not been evaluated at the level of individual cells. Differences in the effect of genetic variation on the gene expression of different cell types would provide significant resolution for in vitro research using reprogrammed cells. By bringing together single cell RNA sequencing [15–21] and population genetics, we now have a framework in which to evaluate the cell type-specific effects of genetic variation on gene expression. Here, we performed single cell RNA-sequencing on 64,018 fibroblasts from 79 donors and we mapped expression quantitative trait loci (eQTL) at the level of individual cell types. We demonstrate that the large majority of eQTL detected in fibroblasts are specific to an individual sub-type of cells. To address whether the allelic effects on gene expression are dynamic across cell reprogramming, we generated scRNA-seq data in 19,967 iPSCs from 31 reprogrammed donor lines. We again identify highly cell type-specific eQTL in iPSCs, and show that the eQTL in fibroblasts almost entirely disappear during reprogramming. This work provides an atlas of how genetic variation influences gene expression across cell subtypes, and provides evidence for patterns of genetic architecture that lead to cell type-specific eQTL effects.

Accurate estimation of cell composition in bulk expression through robust integration of single-cell information

10.1101/669911 ◽

2019 ◽

Cited By ~ 1

Author(s):

Brandon Jew ◽

Marcus Alvarez ◽

Elior Rahmani ◽

Zong Miao ◽

Arthur Ko ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Accurate Estimation ◽

Marker Genes ◽

Rna Seq ◽

Cell Type ◽

Dorsolateral Prefrontal ◽

Additional Mode ◽

Single Nucleus

Abstract: We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and single-nucleus RNA-seq (snRNA-seq) data, Bisque was able to replicate previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. Bisque requires a single-cell reference dataset that reflects physiological cell type composition and can further leverage datasets that include both bulk and single-cell measurements over the same samples for improved accuracy. We further propose an additional mode of operation that merely requires a set of known marker genes. Bisque is available as an R package at: https://github.com/cozygene/bisque.
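The notion of a gene-specific bulk expression transformation can be sketched as follows. This is not the Bisque R package API; it assumes, purely for illustration, that each gene's bulk values are mapped by a linear function chosen so that their mean and standard deviation match reference values derived from single-cell data. The function names, the gene symbol, and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def fit_gene_transform(bulk_values, reference_values):
    """Return (slope, intercept) for one gene.

    After applying slope * x + intercept, the bulk values have the same
    mean and population standard deviation as the reference values.
    """
    sd_bulk = pstdev(bulk_values)
    # A constant bulk gene cannot be rescaled; only shift it in that case.
    slope = pstdev(reference_values) / sd_bulk if sd_bulk > 0 else 1.0
    intercept = mean(reference_values) - slope * mean(bulk_values)
    return slope, intercept


def transform_bulk(bulk, reference):
    """Apply a separate linear transform to every gene.

    bulk, reference: dicts mapping gene name -> list of per-sample values.
    """
    transformed = {}
    for gene, values in bulk.items():
        slope, intercept = fit_gene_transform(values, reference[gene])
        transformed[gene] = [slope * v + intercept for v in values]
    return transformed


# Toy example: bulk values are on a 10x larger scale than the reference.
bulk_expr = {"ADIPOQ": [10.0, 20.0, 30.0]}
reference_expr = {"ADIPOQ": [1.0, 2.0, 3.0]}
adjusted = transform_bulk(bulk_expr, reference_expr)
```

Once bulk values sit on the same scale as the single-cell reference, proportions can be estimated against the reference profile with a constrained regression.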

Establishment of a Human Induced Pluripotent Stem Cell-Derived Neuromuscular Co-Culture Under Optogenetic Control

10.1101/2020.04.10.036400 ◽

2020 ◽

Cited By ~ 1

Author(s):

Elliot W. Swartz ◽

Greg Shintani ◽

Jijun Wan ◽

Joseph S. Maffei ◽

Sarah H. Wang ◽

...

Keyword(s):

Motor Neurons ◽

Induced Pluripotent Stem Cell ◽

Cell Types ◽

Calcium Flux ◽

Electrode Arrays ◽

Spinal Motor Neurons ◽

Culture Model ◽

Skeletal Myotubes ◽

Induced Pluripotent

Summary: The failure of the neuromuscular junction (NMJ) is a key component of degenerative neuromuscular disease, yet how NMJs degenerate in disease is unclear. Human induced pluripotent stem cells (hiPSCs) offer the ability to model disease via differentiation toward affected cell types; however, the re-creation of an in vitro neuromuscular system has proven challenging. Here we present a scalable, all-hiPSC-derived co-culture system composed of independently derived spinal motor neurons (MNs) and skeletal myotubes (sKM). In a model of C9orf72-associated disease, co-cultures form functional NMJs that can be manipulated through optical stimulation, eliciting muscle contraction and measurable calcium flux in innervated sKM. Furthermore, co-cultures grown on multi-electrode arrays (MEAs) permit the pharmacological interrogation of neuromuscular physiology. Utilization of this co-culture model as a tunable, patient-derived system may offer significant insights into NMJ formation, maturation, repair, or pathogenic mechanisms that underlie NMJ dysfunction in disease.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
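Cell-type-specific confidence thresholds with a reject option can be sketched in a few lines. The abstract does not say how JIND chooses its thresholds, so the rule below is an assumption: for each class, the threshold is set so that a chosen fraction of correctly classified validation cells would still be accepted. The function names, the class labels "A" and "B", and the "Unassigned" label are hypothetical.

```python
import math


def learn_thresholds(prob_rows, true_labels, classes, keep_fraction=0.9):
    """Pick one confidence threshold per class from validation predictions.

    prob_rows: per-cell lists of class probabilities (same order as classes).
    Only correctly classified cells contribute to a class's threshold.
    """
    confident = {c: [] for c in classes}
    for row, truth in zip(prob_rows, true_labels):
        k = max(range(len(row)), key=row.__getitem__)
        if classes[k] == truth:
            confident[truth].append(row[k])
    thresholds = {}
    for c, values in confident.items():
        if not values:
            thresholds[c] = 1.0  # no evidence for this class: accept only certainty
            continue
        values.sort()
        keep = max(1, math.ceil(len(values) * keep_fraction))
        thresholds[c] = values[len(values) - keep]
    return thresholds


def predict_with_rejection(prob_rows, classes, thresholds, reject_label="Unassigned"):
    """Assign the top class, or reject_label when confidence is below threshold."""
    calls = []
    for row in prob_rows:
        k = max(range(len(row)), key=row.__getitem__)
        label = classes[k]
        calls.append(label if row[k] >= thresholds[label] else reject_label)
    return calls


# Toy validation set with two classes and four cells.
classes = ["A", "B"]
validation = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]
truth = ["A", "A", "B", "B"]
strict = learn_thresholds(validation, truth, classes, keep_fraction=0.5)
calls = predict_with_rejection([[0.95, 0.05], [0.7, 0.3], [0.1, 0.9]], classes, strict)
```

A lower `keep_fraction` yields stricter thresholds and more rejected cells, which trades coverage for accuracy on the cells that are retained.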

Single Cell, Single Nucleus and Spatial RNA Sequencing of the Human Liver Identifies Hepatic Stellate Cell and Cholangiocyte Heterogeneity

10.1101/2021.03.27.436882 ◽

2021 ◽

Author(s):

Tallulah S Andrews ◽

Jawairia Atif ◽

Jeff C Liu ◽

Catia T Perciani ◽

Xue-Zhong Ma ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Human Liver ◽

Stellate Cell ◽

Parenchymal Cell ◽

Cell Types ◽

Cell Populations ◽

Healthy Human ◽

Single Nucleus

The critical functions of the human liver are coordinated through the interactions of hepatic parenchymal and non-parenchymal cells. Recent advances in single-cell transcriptional approaches have enabled an examination of the human liver with unprecedented resolution. However, dissociation-related cell perturbation can limit the ability to fully capture the human liver's parenchymal cell fraction, which in turn limits the ability to comprehensively profile this organ. Here, we report the transcriptional landscape of 73,295 cells from the human liver using matched single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq). The addition of snRNA-seq enabled the characterization of interzonal hepatocytes at single-cell resolution, revealed the presence of rare subtypes of hepatic stellate cells previously only seen in disease, and detected cholangiocyte progenitors that had only been observed during in vitro differentiation experiments. However, T and B lymphocytes and NK cells were only distinguishable using scRNA-seq, highlighting the importance of applying both technologies to obtain a complete map of tissue-resident cell types. We validated the distinct spatial distribution of the hepatocyte, cholangiocyte and stellate cell populations using an independent spatial transcriptomics dataset and immunohistochemistry. Our study provides a systematic comparison of the transcriptomes captured by scRNA-seq and snRNA-seq and delivers a high-resolution map of the parenchymal cell populations in the healthy human liver.

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.

Highlights:

We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology.

CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive.

CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set.

CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.

Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data

BMC Bioinformatics ◽

10.1186/s12859-020-03797-8 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Chunxiang Wang ◽

Xin Gao ◽

Juntao Liu

Keyword(s):

Single Cell ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Preprocessing ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Preprocessing Method ◽

Cell Clustering ◽

Cell Gene Expression

Abstract. Background: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. Results: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. Conclusion: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.

SAT-298 Integrative Single-Cell Transcriptomic and Epigenomic Landscape of Mouse Anterior Pituitary Cell Types

Journal of the Endocrine Society ◽

10.1210/jendso/bvaa046.593 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

Author(s):

Frederique Murielle Ruf-Zamojski ◽

Michel A Zamojski ◽

German Nudelman ◽

Yongchao Ge ◽

Natalia Mendelev ◽

...

Keyword(s):

Single Cell ◽

Cell Line ◽

Anterior Pituitary ◽

Cell Types ◽

Chromatin Accessibility ◽

Pituitary Cell ◽

Integrated Analysis ◽

Pituitary Cells ◽

Rna Seq ◽

Cell Type

Abstract: The pituitary gland is a critical regulator of the neuroendocrine system. To further our understanding of the classification, cellular heterogeneity, and regulatory landscape of pituitary cell types, we performed and computationally integrated single cell (SC)/single nucleus (SN) resolution experiments capturing RNA expression, chromatin accessibility, and DNA methylation state from mouse dissociated whole pituitaries. Both SC and SN transcriptome analysis and promoter accessibility identified the five classical hormone-producing cell types (somatotropes, gonadotropes (GT), lactotropes, thyrotropes, and corticotropes). GT cells distinctively expressed transcripts for Cga, Fshb, Lhb, Nr5a1, and Gnrhr in SC RNA-seq and SN RNA-seq. This was matched in SN ATAC-seq with GTs specifically showing open chromatin at the promoter regions for the same genes. Similarly, the other classically defined anterior pituitary cells displayed transcript expression and chromatin accessibility patterns characteristic of their own cell type. This integrated analysis identified additional cell types, such as a stem cell cluster expressing transcripts for Sox2, Sox9, Mia, and Rbpms, and a broadly accessible chromatin state. In addition, we performed bulk ATAC-seq in the LβT2b gonadotrope-like cell line. While the FSHB promoter region was closed in the cell line, we identified a region upstream of Fshb that became accessible by the synergistic actions of GnRH and activin A, and that corresponded to a conserved region identified by a polycystic ovary syndrome (PCOS) single nucleotide polymorphism (SNP). Although this locus appears closed in deep sequencing bulk ATAC-seq of dissociated mouse pituitary cells, SN ATAC-seq of the same preparation showed that this site was specifically open in mouse GT, but closed in 14 other pituitary cell type clusters. This discrepancy highlighted the detection limit of a bulk ATAC-seq experiment in a subpopulation, as GT represented ~5% of this dissociated anterior pituitary sample. These results identified this locus as a candidate for explaining the dual dependence of Fshb expression on GnRH and activin/TGFβ signaling, and potential new evidence for upstream regulation of Fshb. The pituitary epigenetic landscape provides a resource for improved cell type identification and for the investigation of the regulatory mechanisms driving cell-to-cell heterogeneity. Additional authors not listed due to abstract submission restrictions: N. Seenarine, M. Amper, N. Jain (ISMMS).
