The intersectional genetics landscape for humans

Andre Macedo; Alisson M Gontijo

doi:10.1093/gigascience/giaa083

The intersectional genetics landscape for humans

GigaScience ◽

10.1093/gigascience/giaa083 ◽

2020 ◽

Vol 9 (8) ◽

Author(s):

Andre Macedo ◽

Alisson M Gontijo

Keyword(s):

Single Cell ◽

Logic Gate ◽

Cell Types ◽

Regulatory Elements ◽

Primary Cell ◽

Interindividual Variation ◽

Cell Type ◽

Sequencing Data ◽

Diagnostic Potential ◽

Cap Analysis

ABSTRACT Background: The human body is made up of hundreds, perhaps thousands, of cell types and states, most of which are currently inaccessible genetically. Intersectional genetic approaches can increase the number of genetically accessible cells, but the scope and safety of these approaches have not been systematically assessed. A typical intersectional method acts like an “AND” logic gate by converting the input of 2 or more active, yet unspecific, regulatory elements (REs) into a single cell type-specific synthetic output. Results: Here, we systematically assessed the intersectional genetics landscape of the human genome using a subset of cells from a large RE usage atlas (Functional ANnoTation Of the Mammalian genome 5 consortium, FANTOM5) obtained by cap analysis of gene expression sequencing (CAGE-seq). We developed the heuristics and algorithms to retrieve and quality-rank “AND” gate intersections. Of the 154 primary cell types surveyed, >90% can be distinguished from each other with as few as 3 to 4 active REs, with quantifiable safety and robustness. We call these minimal intersections of active REs with cell-type diagnostic potential “versatile entry codes” (VEnCodes). Each of the 158 cancer cell types surveyed could also be distinguished from the healthy primary cell types with small VEnCodes, most of which were robust to intra- and interindividual variation. Methods for the cross-validation of CAGE-seq-derived VEnCodes and for the extraction of VEnCodes from pooled single-cell sequencing data are also presented. Conclusions: Our work provides a systematic view of the intersectional genetics landscape in humans and demonstrates the potential of these approaches for future gene delivery technologies.
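The “AND” gate idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' heuristic search: it assumes RE activity has already been binarized, uses an exhaustive search over combinations, and runs on a made-up toy atlas. The names `find_vencodes` and `toy_atlas` are hypothetical.

```python
from itertools import combinations


def find_vencodes(active_res, target, k):
    """Return every k-sized set of REs that is jointly active only in `target`.

    active_res: dict mapping cell type -> set of active RE names (binarized).
    This brute-force search is an illustrative assumption, not the published method.
    """
    others = [res for cell, res in active_res.items() if cell != target]
    hits = []
    for combo in combinations(sorted(active_res[target]), k):
        combo_set = set(combo)
        # AND gate: the combination fires in a cell only if all k REs are active there.
        if not any(combo_set <= res for res in others):
            hits.append(combo)
    return hits


# Hypothetical toy atlas: no single RE is specific to hepatocytes.
toy_atlas = {
    "hepatocyte": {"RE1", "RE2", "RE3"},
    "neuron": {"RE1", "RE2", "RE4"},
    "fibroblast": {"RE2", "RE3", "RE4"},
}

print(find_vencodes(toy_atlas, "hepatocyte", 1))  # no single-RE solution
print(find_vencodes(toy_atlas, "hepatocyte", 2))  # one two-RE intersection
```

In this toy case each RE on its own is shared with another cell type, yet the pair RE1 and RE3 is active together only in the target, which is the property the abstract attributes to VEnCodes.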


The intersectional genetics landscape for human

10.1101/552984 ◽

2019 ◽

Author(s):

Andre Macedo ◽

Alisson M. Gontijo

Keyword(s):

Gene Expression ◽

Therapeutic Potential ◽

Regulatory Element ◽

Logic Gate ◽

Cell Types ◽

Boolean Logic ◽

Interindividual Variation ◽

Model Organisms ◽

Cell Type ◽

Diagnostic Potential

The human body is made up of hundreds, perhaps thousands, of cell types and states, most of which are currently inaccessible genetically. Genetic accessibility carries significant diagnostic and therapeutic potential by allowing the selective delivery of genetic messages or cures to cells. Research in model organisms has shown that single regulatory element (RE) activities are seldom cell type-specific, limiting their use in genetic systems designed to restrict gene expression after delivery to cells. Intersectional genetic approaches can increase the number of genetically accessible cells. A typical intersectional method acts like an AND logic gate by converting the input of two or more active REs into a single synthetic output, which becomes unique for that cell. Here, we systematically assessed the intersectional genetics landscape of humans using a curated subset of cells from a large RE usage atlas obtained by Cap Analysis of Gene Expression Sequencing (CAGE-Seq) of thousands of primary and cancer cells (the FANTOM5 consortium atlas). We developed the heuristics and algorithms to retrieve and quality-rank AND gate intersections intra- and interindividually. We find that >90% of the 154 primary cell types surveyed can be distinguished from each other with as few as 3 to 4 active REs, with quantifiable safety and robustness. We call these minimal intersections of active REs with cell-type diagnostic potential “Versatile Entry Codes” (VEnCodes). We show that VEnCodes could be found for 100% of the 158 cancer cell types surveyed, and that most of these are highly robust to intra- and interindividual variation. Our tools for generating and quality-ranking VEnCodes can be adapted to other RE usage databases and to other intersectional methods using alternative Boolean logic operations. Our work demonstrates the potential of intersectional approaches for future gene delivery technologies in humans.
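The abstract mentions that other Boolean logic operations could be used in place of AND. As a hedged illustration only, the sketch below searches for “A AND NOT B” gates on a made-up binarized atlas. The function name `find_and_not_codes` and the toy data are hypothetical and are not taken from the authors' tools.

```python
def find_and_not_codes(active_res, target):
    """Return (a, b) pairs where 'a active AND b inactive' holds only in `target`.

    active_res: dict mapping cell type -> set of active RE names (binarized).
    Exhaustive search; an illustrative assumption, not the published method.
    """
    all_res = set().union(*active_res.values())
    target_res = active_res[target]
    others = [res for cell, res in active_res.items() if cell != target]
    hits = []
    for a in sorted(target_res):
        for b in sorted(all_res - target_res):
            # The gate fires in a cell when a is active and b is not.
            fires_elsewhere = any(a in res and b not in res for res in others)
            if not fires_elsewhere:
                hits.append((a, b))
    return hits


# Hypothetical toy atlas with four cell types and five REs.
toy_atlas = {
    "hepatocyte": {"RE1", "RE2", "RE3"},
    "neuron": {"RE1", "RE2", "RE4"},
    "fibroblast": {"RE2", "RE3", "RE4"},
    "myocyte": {"RE2", "RE5"},
}

print(find_and_not_codes(toy_atlas, "hepatocyte"))
```

Here the inactive RE acts as a veto: a broadly active element such as RE1 becomes usable once cells that also carry RE4 are excluded.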


Identifying common and novel cell types in single-cell RNA-sequencing data using FR-Match

10.1101/2021.10.17.464718 ◽

2021 ◽

Author(s):

Yun Zhang ◽

Brian Aevermann ◽

Rohan Gala ◽

Richard H. Scheuermann

Keyword(s):

Single Cell ◽

Cell Types ◽

Sample Type ◽

Cell Type ◽

Sequencing Data ◽

Excellent Performance ◽

Single Cell Rna Sequencing ◽

Accurate Performance ◽

Cross Platform ◽

Tissue Region

Reference cell type atlases powered by single cell transcriptomic profiling technologies have become available to study cellular diversity at a granular level. We present FR-Match for matching query datasets to reference atlases with robust and accurate performance for identifying novel cell types and non-optimally clustered cell types in the query data. This approach shows excellent performance for cross-platform, cross-sample type, cross-tissue region, and cross-data modality cell type matching.


The single-cell epigenetic regulatory landscape in mammalian perinatal testis development

10.1101/2021.03.17.435776 ◽

2021 ◽

Author(s):

Jinyue Liao ◽

Hoi Ching Suen ◽

Shitao Rao ◽

Alfred Chun Shui Luk ◽

Ruoyu Zhang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Fate ◽

Germ Cells ◽

Somatic Cells ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Cell Populations ◽

Cell Type

Abstract Spermatogenesis depends on an orchestrated series of developmental events in germ cells and full maturation of the somatic microenvironment. To date, most efforts to study cellular heterogeneity in the testis have focused on single-cell gene expression rather than the chromatin landscape shaping gene expression. To advance our understanding of the regulatory programs underlying testicular cell types, we analyzed single-cell chromatin accessibility profiles in more than 25,000 cells from the developing mouse testis. We showed that scATAC-Seq allowed us to deconvolve distinct cell populations and identify cis-regulatory elements (CREs) underlying cell type specification. We identified sets of transcription factors associated with cell type-specific accessibility, revealing novel regulators of cell fate specification and maintenance. Pseudotime reconstruction revealed detailed regulatory dynamics coordinating the sequential developmental progressions of germ cells and somatic cells. These high-resolution data also revealed putative stem cells within the Sertoli and Leydig cell populations. Further, we defined candidate target cell types and genes of several GWAS signals, including those associated with testosterone levels and coronary artery disease. Collectively, our data provide a blueprint of the ‘regulon’ of the mouse male germline and supporting somatic cells.


rPanglaoDB: an R package to download and merge labeled single-cell RNA-seq data from the PanglaoDB database

10.1101/2021.05.28.446161 ◽

2021 ◽

Author(s):

Daniel Osorio ◽

Marieke Lydia Kuijjer ◽

James J. Cai

Keyword(s):

Single Cell ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Single Experiment ◽

Tissue Samples ◽

Molecular Phenotypes ◽

Public Datasets

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample preprocessing steps to filter and collect the desired cells experimentally before sequencing. Integrating data from multiple public single-cell experiments offers a solution to this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.


A harmonized atlas of mouse spinal cord cell types and their spatial organization

Nature Communications ◽

10.1038/s41467-021-25125-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Daniel E. Russ ◽

Ryan B. Patterson Cross ◽

Li Li ◽

Stephanie C. Koch ◽

Kaya J. E. Matson ◽

...

Keyword(s):

Spinal Cord ◽

Single Cell ◽

Spatial Organization ◽

Neuronal Cell ◽

Cell Types ◽

Molecular Organization ◽

Cell Type ◽

Sequencing Data ◽

Mouse Spinal Cord ◽

Common Reference

Abstract Single-cell RNA sequencing data can unveil the molecular diversity of cell types. Cell type atlases of the mouse spinal cord have been published in recent years but have not been integrated. Here, we generate an atlas of spinal cell types based on single-cell transcriptomic data, unifying the available datasets into a common reference framework. We report a hierarchical structure of postnatal cell type relationships, with location providing the highest level of organization, then neurotransmitter status, family, and finally, dozens of refined populations. We validate a combinatorial marker code for each neuronal cell type and map their spatial distributions in the adult spinal cord. We also show complex lineage relationships among postnatal cell types. Additionally, we develop an open-source cell type classifier, SeqSeek, to facilitate the standardization of cell type identification. This work provides an integrated view of spinal cell types, their gene expression signatures, and their molecular organization.


A cis-regulatory atlas in maize at single-cell resolution

10.1101/2020.09.27.315499 ◽

2020 ◽

Author(s):

Alexandre P. Marand ◽

Zongliang Chen ◽

Andrea Gallavotti ◽

Robert J. Schmitz

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Cell Type ◽

Crop Species ◽

Cell Functions ◽

Cell Type Specific ◽

Type Specification ◽

Accessible Chromatin

ABSTRACT Cis-regulatory elements (CREs) encode the genomic blueprints for coordinating spatiotemporal gene expression programs underlying highly specialized cell functions. To identify CREs underlying cell-type specification and developmental transitions, we implemented single-cell sequencing of Assay for Transposase Accessible Chromatin in an atlas of Zea mays organs. We describe 92 distinct states of chromatin accessibility across more than 165,913 putative CREs, 56,575 cells, and 52 known cell-types in maize using a novel implementation of regularized quasibinomial logistic regression. Cell states were largely determined by combinatorial accessibility of transcription factors (TFs) and their binding sites. A neural network revealed that cell identity could be accurately predicted (>0.94) solely based on TF binding site accessibility. Co-accessible chromatin recapitulated higher-order chromatin interactions, with distinct sets of TFs coordinating cell type-specific regulatory dynamics. Pseudotime reconstruction and alignment with Arabidopsis thaliana trajectories identified conserved TFs, associated motifs, and cis-regulatory regions specifying sequential developmental progressions. Cell type-specific accessible chromatin regions were enriched with phenotype-associated genetic variants and signatures of selection, revealing the major cell-types and putative CREs targeted by modern maize breeding. Collectively, our analysis affords a comprehensive framework for understanding cellular heterogeneity, evolution, and cis-regulatory grammar of cell-type specification in a major crop species.


Single cell resolution regulatory landscape of the mouse kidney highlights cellular differentiation programs and renal disease targets

10.1101/2020.05.24.113910 ◽

2020 ◽

Cited By ~ 1

Author(s):

Zhen Miao ◽

Michael S. Balzer ◽

Ziyuan Ma ◽

Hongbo Liu ◽

Junnan Wu ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Kidney Development ◽

Developmental Stages ◽

Major Gene ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Cell Type ◽

Single Nucleotide Variants

Abstract Determining the epigenetic program that generates unique cell types in the kidney is critical for understanding cell-type heterogeneity during tissue homeostasis and injury response. Here, we profiled open chromatin and gene expression in developing and adult mouse kidneys at single cell resolution. We show critical reliance of gene expression on distal regulatory elements (enhancers). We define key cell type-specific transcription factors and major gene-regulatory circuits for kidney cells. Dynamic chromatin and expression changes during nephron progenitor differentiation demonstrated that podocyte commitment occurs early and is associated with sustained Foxl1 expression. Renal tubule cells followed a more complex differentiation, where Hnf4a was associated with proximal and Tfap2b with distal fate. Mapping single nucleotide variants associated with human kidney disease identified critical cell types, developmental stages, genes, and regulatory mechanisms. We provide a global single cell resolution view of chromatin accessibility of kidney development. The dataset is available via interactive public websites.


Unsupervised cell functional annotation for single-cell RNA-Seq

10.1101/2021.11.20.469410 ◽

2021 ◽

Author(s):

Dongshunyi Li ◽

Jun Ding ◽

Ziv Bar-Joseph

Keyword(s):

Single Cell ◽

Dimensional Space ◽

Cell Types ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Sequencing Data ◽

Gene Sets ◽

Supervised Methods ◽

Low Dimensional

One of the first steps in the analysis of single-cell RNA-Sequencing (scRNA-Seq) data is the assignment of cell types. While a number of supervised methods have been developed for this, in most cases such assignment is performed by first clustering cells in low-dimensional space and then assigning cell types to different clusters. To overcome noise and to improve cell type assignments, we developed UNIFAN, a neural network method that simultaneously clusters and annotates cells using known gene sets. UNIFAN combines a low-dimensional representation of all genes with cell-specific gene set activity scores to determine the clustering. We applied UNIFAN to human and mouse scRNA-Seq datasets from several different organs. As we show, by using knowledge of gene sets, UNIFAN greatly outperforms prior methods developed for clustering scRNA-Seq data. The gene sets assigned by UNIFAN to different clusters provide strong evidence for the cell type represented by each cluster, making annotations easier.


scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04028-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Bobby Ranjan ◽

Florian Schmidt ◽

Wenjie Sun ◽

Jinyu Park ◽

Mohammad Amin Honardoost ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Differentially Expressed Genes ◽

Cell Types ◽

Unsupervised Clustering ◽

Differentially Expressed ◽

Consensus Clustering ◽

Cell Type ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
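Step (1) of the approach above, integrating an unsupervised and a supervised labelling, can be sketched roughly as follows. This is not the scConsensus implementation (which is in R) and it omits the refinement by differentially expressed genes. The function `consensus_labels`, the `min_cells` threshold, and the fallback to the supervised label are assumptions made for illustration.

```python
from collections import Counter


def consensus_labels(unsupervised, supervised, min_cells=2):
    """Combine two per-cell labellings into one consensus labelling.

    Each (unsupervised, supervised) pair that covers at least `min_cells`
    cells becomes its own consensus cluster; cells in smaller overlaps
    keep only their supervised label (an assumed fallback rule).
    """
    pairs = list(zip(unsupervised, supervised))
    overlap_sizes = Counter(pairs)  # contingency table in sparse form
    consensus = []
    for unsup, sup in pairs:
        if overlap_sizes[(unsup, sup)] >= min_cells:
            consensus.append(f"{sup}|{unsup}")
        else:
            consensus.append(str(sup))
    return consensus


# Hypothetical toy labels for six cells.
unsup = [0, 0, 0, 1, 1, 1]
sup = ["T", "T", "B", "B", "B", "B"]
print(consensus_labels(unsup, sup))
```

The cross-tabulation keeps distinctions that either method alone would miss, while the size threshold avoids creating clusters from a handful of disagreeing cells.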


Identification of cell states using super-enhancer RNA

BMC Genomics ◽

10.1186/s12864-021-08092-1 ◽

2021 ◽

Vol 22 (S3) ◽

Author(s):

Yueh-Hua Tu ◽

Hsueh-Fen Juan ◽

Hsuan-Cheng Huang

Keyword(s):

Messenger Rna ◽

Time Course ◽

Developmental Process ◽

Cell Types ◽

Regulatory Elements ◽

Cell Type ◽

Sequencing Data ◽

Enhancer Rna ◽

Super Enhancer ◽

Different Cell Types

Abstract Background: A new class of regulatory elements called super-enhancers, composed of multiple neighboring enhancers, has recently been reported to be the key transcriptional drivers of cellular, developmental, and disease states. Results: Here, we defined super-enhancer RNAs as highly expressed enhancer RNAs that are transcribed from a cluster of localized genomic regions. Using the cap analysis of gene expression sequencing data from FANTOM5, we systematically explored the enhancer and messenger RNA landscapes in hundreds of different cell types in response to various environments. Applying non-negative matrix factorization (NMF) to super-enhancer RNA profiles, we found that different cell types were well classified. In addition, through the NMF of individual time-course profiles from a single cell type, super-enhancer RNAs were clustered into several states with progressive patterns. We further investigated the enriched biological functions of the proximal genes involved in each pattern, and found that they were associated with the corresponding developmental process. Conclusions: The proposed super-enhancer RNAs can act as a good alternative, without the complicated measurement of histone modifications, for identifying important regulatory elements of cell type specification and identifying dynamic cell states.
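For readers unfamiliar with NMF, a minimal sketch follows. It uses the standard multiplicative update rules for the Frobenius objective on a made-up expression matrix. The abstract does not state which NMF variant or software the authors used, so the function `nmf`, its parameters, and the toy matrix are assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (features x samples) as W @ H.

    Uses multiplicative updates; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy matrix: rows are enhancer RNAs, columns are samples.
V = np.array(
    [
        [5.0, 4.0, 0.0, 0.0],
        [4.0, 5.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 4.0],
        [0.0, 0.0, 4.0, 3.0],
    ]
)

W, H = nmf(V, rank=2)
# Assign each sample to the component with the largest weight in H.
sample_groups = H.argmax(axis=0)
print(sample_groups)
```

Each column of `H` gives a sample's weights over the components, so taking the largest weight per column is one simple way to group samples, in the spirit of the cell type classification described above.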
