Telescope: Characterization of the retrotranscriptome by accurate estimation of transposable element expression

Abstract: Characterization of Human Endogenous Retrovirus (HERV) expression within the transcriptomic landscape using RNA-seq is complicated by uncertainty in fragment assignment because of sequence similarity. We present Telescope, a computational software tool that provides accurate estimation of transposable element expression (retrotranscriptome) resolved to specific genomic locations. Telescope directly addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model. We demonstrate the utility of our approach through single locus analysis of HERV expression in 13 ENCODE cell types. When examined at this resolution, we find that the magnitude and breadth of the retrotranscriptome can be vastly different among cell types. Furthermore, our approach is robust to differences in sequencing technology, and demonstrates that the retrotranscriptome has potential to be used for cell type identification. Telescope performs highly accurate quantification of the retrotranscriptomic landscape in RNA-seq experiments, revealing a differential complexity in the transposable element biology of complex systems not previously observed. Telescope is available at github.com/mlbendall/telescope.

Author Summary: Almost half of the human genome is composed of transposable elements (TEs), but their contribution to the transcriptome, their cell-type specific expression patterns, and their role in disease remain poorly understood. Recent studies have found many elements to be actively expressed and involved in key cellular processes. For example, human endogenous retroviruses (HERVs) are reported to be involved in human embryonic stem cell differentiation. Discovering which exact HERVs are differentially expressed in RNA-seq data would be a major advance in understanding such processes. However, because HERVs have a high level of sequence similarity, it is hard to identify which exact HERV is differentially expressed. To solve this problem, we developed a computer program that addresses uncertainty in fragment assignment by reassigning ambiguously mapped fragments to the most probable source transcript as determined within a Bayesian statistical model. We call this program “Telescope”. We then used Telescope to identify HERV expression in 13 well-studied cell types from the ENCODE consortium and found that different cell types could be characterized by enrichment for different HERV families and by locus-specific expression. We also showed that Telescope performed better than other methods currently used to determine TE expression. The use of this computational tool to examine new and existing RNA-seq data sets may lead to new understanding of the roles of TEs in health and disease.
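
The reassignment idea described above can be illustrated with a small expectation-maximization (EM) loop. This is a simplified sketch and not Telescope's actual model: it assumes every candidate alignment of a fragment is equally good, so only the estimated locus proportions decide where an ambiguous fragment goes. The function name `em_reassign` and the locus names `HERV_A` and `HERV_B` are hypothetical.

```python
from collections import Counter


def em_reassign(fragments, n_iter=50):
    """Estimate locus proportions by EM, then assign each fragment to one locus.

    fragments: list of lists; each inner list holds the candidate loci
    (hypothetical names) that one fragment aligned to.
    """
    loci = sorted({locus for frag in fragments for locus in frag})
    # Start from equal proportions for every locus.
    pi = {locus: 1.0 / len(loci) for locus in loci}

    for _ in range(n_iter):
        expected = dict.fromkeys(loci, 0.0)
        # E-step: split each fragment across its candidates in proportion to pi.
        for frag in fragments:
            total = sum(pi[locus] for locus in frag)
            for locus in frag:
                expected[locus] += pi[locus] / total
        # M-step: proportions are the normalized expected counts.
        pi = {locus: expected[locus] / len(fragments) for locus in loci}

    # Final reassignment: each fragment goes to its most probable candidate.
    final_counts = Counter(max(frag, key=pi.get) for frag in fragments)
    return pi, final_counts


# Toy input: three fragments unique to HERV_A, one unique to HERV_B,
# and two that align equally well to both loci.
toy_fragments = [["HERV_A"]] * 3 + [["HERV_B"]] + [["HERV_A", "HERV_B"]] * 2
proportions, counts = em_reassign(toy_fragments)
```

In this toy input the uniquely mapped fragments favor `HERV_A`, so the two ambiguous fragments are pulled toward it. A fuller model would also weight each candidate by its alignment quality.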

Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference

10.1101/354944 ◽

2018 ◽

Cited By ~ 1

Author(s):

Xuran Wang ◽

Jihwan Park ◽

Katalin Susztak ◽

Nancy R. Zhang ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Cell Types ◽

Cellular Heterogeneity ◽

Tissue Cell ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Cell Expression

Abstract: We present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rat, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.
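
To make the deconvolution task concrete, the sketch below estimates cell type proportions from one bulk sample given a single-cell-derived signature matrix. It uses plain, unweighted non-negative least squares solved by multiplicative updates, which is a generic technique and not MuSiC's own estimator. The function name, the toy signature values, and the mixing proportions are all assumptions made for illustration.

```python
def estimate_proportions(signature, bulk, n_iter=200):
    """Non-negative least squares via multiplicative updates.

    signature: rows are genes, columns are cell types (mean expression).
    bulk: one bulk expression value per gene.
    Returns cell type proportions normalized to sum to 1.
    """
    n_genes, n_types = len(signature), len(signature[0])
    p = [1.0 / n_types] * n_types
    # S^T b does not change between iterations.
    stb = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    for _ in range(n_iter):
        fitted = [sum(signature[g][k] * p[k] for k in range(n_types))
                  for g in range(n_genes)]
        # S^T (S p), the model's counterpart of S^T b.
        stsp = [sum(signature[g][k] * fitted[g] for g in range(n_genes))
                for k in range(n_types)]
        # The ratio update keeps every entry of p non-negative.
        p = [p[k] * stb[k] / stsp[k] if stsp[k] > 0 else 0.0
             for k in range(n_types)]
    total = sum(p)
    return [value / total for value in p]


# Toy signature: 4 genes x 2 cell types (hypothetical values).
toy_signature = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 12.0]]
true_mix = [0.3, 0.7]
# Noise-free bulk sample built by mixing the two cell type profiles.
toy_bulk = [sum(s * w for s, w in zip(row, true_mix)) for row in toy_signature]
estimated = estimate_proportions(toy_signature, toy_bulk)
```

Because the toy bulk sample is an exact mixture, the estimate recovers the mixing proportions. Real data would add noise and between-subject variation, which is where method-specific weighting matters.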

Moana: A robust and scalable cell type classification framework for single-cell RNA-Seq data

10.1101/456129 ◽

2018 ◽

Cited By ~ 24

Author(s):

Florian Wagner ◽

Itai Yanai

Keyword(s):

Single Cell ◽

Cell Types ◽

Specific Cell ◽

Rna Seq ◽

Cell Type ◽

Systematic Analysis ◽

Learning Framework ◽

Classification Framework ◽

Heterogeneous Tissues

Abstract: Single-cell RNA-Seq (scRNA-Seq) enables the systematic molecular characterization of heterogeneous tissues at an unprecedented resolution and scale. However, it is currently unclear how to establish formal cell type definitions, which impedes the systematic analysis of scRNA-Seq data across experiments and studies. To address this challenge, we have developed Moana, a hierarchical machine learning framework that enables the construction of robust cell type classifiers from heterogeneous scRNA-Seq datasets. To demonstrate Moana’s capabilities, we construct cell type classifiers for human immune cells that accurately distinguish between closely related cell types in the presence of experimental perturbations and systematic differences between scRNA-Seq protocols. We show that Moana is generally applicable and scales to datasets with more than ten thousand cells, thus enabling the construction of tissue-specific cell type atlases that can be directly applied to analyze new scRNA-Seq datasets. A Python implementation of Moana can be found at https://github.com/yanailab/moana.

Cnidarian cell type diversity revealed by whole-organism single-cell RNA-seq analysis

10.1101/201103 ◽

2017 ◽

Author(s):

Arnau Sebé-Pedrós ◽

Elad Chomsky ◽

Baptiste Saudememont ◽

Marie-Pierre Mailhe ◽

Flora Pleisser ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Tissue Level ◽

Rna Seq ◽

Cell Type ◽

Specific Expression ◽

Animal Evolution ◽

Neuronal Markers ◽

Cell Type Specific Expression ◽

Cell Type Specific

A hallmark of animal evolution is the emergence and diversification of cell type-specific transcriptional states. However, systematic and unbiased characterization of differentiated gene regulatory programs has so far been limited to specific tissues in a few model species. Here, we perform whole-organism single cell transcriptomics to map cell types in the cnidarian Nematostella vectensis, a non-bilaterian animal that displays complex tissue-level bodyplan organization. We uncover high diversity of transcriptional states in Nematostella, demonstrating cell type-specific expression for 35% of the genes and 51% of the transcription factors (TFs) detected. We identify eight broad cell clusters corresponding to cell classes such as neurons, muscles, cnidocytes, or digestive cells. These clusters comprise multiple cell modules expressing diverse and specific markers, uncovering in particular a rich repertoire of cells associated with neuronal markers. TF expression and sequence analysis defines the combinatorial code that underlies this cell-specific expression. It also reveals the existence of a complex regulatory lexicon of TF binding motifs encoded at both enhancers and promoters of Nematostella tissue-specific genes. Whole organism single cell RNA-seq is thereby established as a tool for comprehensive study of genome regulation and cell type evolution.

A Unified Statistical Framework for Single Cell and Bulk Sequencing Data

10.1101/206532 ◽

2017 ◽

Cited By ~ 1

Author(s):

Lingxue Zhu ◽

Jing Lei ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Accurate Estimation ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Different Cell Types ◽

Cell Data

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profiles. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data and in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
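
The notion of a “false” zero can be made concrete with a small calculation. The sketch below assumes a zero-inflated Poisson model, which is an illustrative stand-in and not URSM's actual hierarchical model: a zero count is either a dropout (with probability `dropout_rate`) or a genuine Poisson zero. The function name and both parameters are hypothetical.

```python
import math


def dropout_posterior(dropout_rate, mean_expression):
    """Probability that an observed zero is a dropout rather than a true zero.

    Assumes a zero-inflated Poisson: with probability dropout_rate the count
    is forced to zero, otherwise it is Poisson with the given mean.
    """
    # Probability of a genuine zero from the Poisson component.
    true_zero = (1.0 - dropout_rate) * math.exp(-mean_expression)
    return dropout_rate / (dropout_rate + true_zero)


# A zero in a highly expressed gene is more likely to be a dropout
# than a zero in a lowly expressed gene.
high_gene = dropout_posterior(0.3, 5.0)
low_gene = dropout_posterior(0.3, 0.5)
```

Under this assumption, zeros in genes with a high expected expression are the ones most likely to need imputation, since a genuine zero is improbable there.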

Accurate estimation of cell composition in bulk expression through robust integration of single-cell information

10.1101/669911 ◽

2019 ◽

Cited By ~ 1

Author(s):

Brandon Jew ◽

Marcus Alvarez ◽

Elior Rahmani ◽

Zong Miao ◽

Arthur Ko ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Accurate Estimation ◽

Marker Genes ◽

Rna Seq ◽

Cell Type ◽

Dorsolateral Prefrontal ◽

Additional Mode ◽

Single Nucleus

Abstract: We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and single-nucleus RNA-seq (snRNA-seq) data, Bisque was able to replicate previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. Bisque requires a single-cell reference dataset that reflects physiological cell type composition and can further leverage datasets that include both bulk and single cell measurements over the same samples for improved accuracy. We further propose an additional mode of operation that merely requires a set of known marker genes. Bisque is available as an R package at: https://github.com/cozygene/bisque.

Connectivity characterization of the mouse basolateral amygdalar complex

Nature Communications ◽

10.1038/s41467-021-22915-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Houri Hintiryan ◽

Ian Bowman ◽

David L. Johnson ◽

Laura Korobkova ◽

Muye Zhu ◽

...

Keyword(s):

Granular Cell ◽

Cell Types ◽

Projection Neurons ◽

Cell Type ◽

Connectivity Map ◽

Analysis Techniques ◽

Domain Specific ◽

Cell Type Specific ◽

Unique Domain

Abstract: The basolateral amygdalar complex (BLA) is implicated in behaviors ranging from fear acquisition to addiction. Optogenetic methods have enabled the association of circuit-specific functions to uniquely connected BLA cell types. Thus, a systematic and detailed connectivity profile of BLA projection neurons to inform granular, cell type-specific interrogations is warranted. Here, we apply machine-learning based computational and informatics analysis techniques to the results of circuit-tracing experiments to create a foundational, comprehensive BLA connectivity map. The analyses identify three distinct domains within the anterior BLA (BLAa) that house target-specific projection neurons with distinguishable morphological features. We identify brain-wide targets of projection neurons in the three BLAa domains, as well as in the posterior BLA, ventral BLA, posterior basomedial, and lateral amygdalar nuclei. Inputs to each nucleus are also identified via retrograde tracing. The data suggest that connectionally unique, domain-specific BLAa neurons are associated with distinct behavior networks.

Systematic comparison of high-throughput single-cell RNA-seq methods for immune cell profiling

BMC Genomics ◽

10.1186/s12864-020-07358-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tracy M. Yamawaki ◽

Daniel R. Lu ◽

Daniel C. Ellwanger ◽

Dev Bhatt ◽

Paolo Manzanillo ◽

...

Keyword(s):

Single Cell ◽

High Throughput ◽

Immune Cell ◽

Cell Types ◽

Data Interpretation ◽

Detection Sensitivity ◽

Rna Seq ◽

Cell Recovery

Abstract

Background: Elucidation of immune populations with single-cell RNA-seq has greatly benefited the field of immunology by deepening the characterization of immune heterogeneity and leading to the discovery of new subtypes. However, single-cell methods inherently suffer from limitations in the recovery of complete transcriptomes due to the prevalence of cellular and transcriptional dropout events. This issue is often compounded by limited sample availability and limited prior knowledge of heterogeneity, which can confound data interpretation.

Results: Here, we systematically benchmarked seven high-throughput single-cell RNA-seq methods. We prepared 21 libraries under identical conditions of a defined mixture of two human and two murine lymphocyte cell lines, simulating heterogeneity across immune-cell types and cell sizes. We evaluated methods by their cell recovery rate, library efficiency, sensitivity, and ability to recover expression signatures for each cell type. We observed higher mRNA detection sensitivity with the 10x Genomics 5′ v1 and 3′ v3 methods. We demonstrate that these methods have fewer dropout events, which facilitates the identification of differentially-expressed genes and improves the concordance of single-cell profiles to immune bulk RNA-seq signatures.

Conclusion: Overall, our characterization of immune cell mixtures provides useful metrics, which can guide selection of a high-throughput single-cell RNA-seq method for profiling more complex immune-cell heterogeneity usually found in vivo.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, running more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
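
The rejection step mentioned above, in which cells that cannot be reliably classified are set aside, can be sketched in a few lines. This is a minimal illustration under assumed inputs and not JIND's implementation: the class probabilities would come from a trained classifier, and how the per-type thresholds are learned is not shown. The function name, the cell type names, and the threshold values are hypothetical.

```python
def annotate_cell(class_probs, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject the cell.

    class_probs: dict mapping cell type to predicted probability.
    thresholds: dict mapping cell type to its own confidence threshold.
    """
    best_type = max(class_probs, key=class_probs.get)
    # Accept only if the winning probability clears that type's threshold.
    if class_probs[best_type] >= thresholds[best_type]:
        return best_type
    return reject_label


# Hypothetical thresholds: the classifier must be surer about B cells.
toy_thresholds = {"T cell": 0.6, "B cell": 0.8}
confident = annotate_cell({"T cell": 0.7, "B cell": 0.3}, toy_thresholds)
uncertain = annotate_cell({"T cell": 0.25, "B cell": 0.75}, toy_thresholds)
```

Using a separate threshold per cell type lets a classifier be stricter for types that are easily confused with others, at the cost of leaving more cells unassigned.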

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.

Highlights:

We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology.

CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive.

CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set.

CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.

Cell-type Specific Expression Quantitative Trait Loci Associated with Alzheimer Disease in Blood and Brain Tissue

10.1101/2020.11.23.20237008 ◽

2020 ◽

Author(s):

Devanshi Patel ◽

Xiaoling Zhang ◽

John J. Farrell ◽

Jaeyoon Chung ◽

Thor D. Stein ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Expression Patterns ◽

Regulation Of Gene Expression ◽

Cell Types ◽

Eqtl Analysis ◽

Cell Type ◽

Specific Expression ◽

Cell Type Specific Expression ◽

Cell Type Specific

Abstract: Because regulation of gene expression is heritable and context-dependent, we investigated AD-related gene expression patterns in cell-types in blood and brain. Cis-expression quantitative trait locus (eQTL) mapping was performed genome-wide in blood from 5,257 Framingham Heart Study (FHS) participants and in brain donated by 475 Religious Orders Study/Memory & Aging Project (ROSMAP) participants. The association of gene expression with genotypes for all cis SNPs within 1 Mb of genes was evaluated using linear regression models for unrelated subjects and linear mixed models for related subjects. Cell type-specific eQTL (ct-eQTL) models included an interaction term for expression of “proxy” genes that discriminate particular cell types. Ct-eQTL analysis identified 11,649 and 2,533 additional significant gene-SNP eQTL pairs in brain and blood, respectively, that were not detected in generic eQTL analysis. Of note, 386 unique target eGenes of significant eQTLs shared between blood and brain were enriched in apoptosis and Wnt signaling pathways. Five of these shared genes are established AD loci. The potential importance and relevance to AD of significant results in myeloid cell-types is supported by the observation that a large portion of genome-wide significant (GWS) ct-eQTLs map within 1 Mb of established AD loci and 58% (23/40) of the most significant eGenes in these eQTLs have previously been implicated in AD. This study identified cell-type specific expression patterns for established and potentially novel AD genes, found additional evidence for the role of myeloid cells in AD risk, and discovered potential novel blood and brain AD biomarkers that highlight the importance of cell-type specific analysis.
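
The interaction-term model described above can be sketched as an ordinary least squares fit of expression on genotype, proxy gene expression, and their product. This is a minimal illustration for unrelated subjects only, with no covariates and no mixed-model component; the exact form of the study's models is not given in the abstract, so the genotype-by-proxy product is an assumption. All function names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ct_eqtl(genotype, proxy, expression):
    """Fit expression = b0 + b1*g + b2*proxy + b3*(g*proxy) by least squares.

    Returns [b0, b1, b2, b3]; b3 is the interaction (cell type) effect.
    """
    rows = [[1.0, g, p, g * p] for g, p in zip(genotype, proxy)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data: allele counts, proxy gene expression, and a noise-free outcome
# generated with an interaction effect of 0.8 (all values hypothetical).
toy_genotype = [0, 1, 2, 0, 1, 2, 0, 1]
toy_proxy = [0.1, 0.4, 0.9, 1.2, 0.3, 0.7, 1.5, 1.1]
toy_expression = [1.0 + 0.5 * g + 2.0 * p + 0.8 * g * p
                  for g, p in zip(toy_genotype, toy_proxy)]
coefficients = fit_ct_eqtl(toy_genotype, toy_proxy, toy_expression)
```

A nonzero interaction coefficient indicates that the genotype effect on expression changes with the abundance of the cell type marked by the proxy gene. In practice one would test that coefficient for significance and use a mixed model for related subjects.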
