A global overview of single-cell type selectivity and pleiotropy in complex diseases and traits

Abstract After centuries of genetic studies, one of the most fundamental questions, namely in which cell types DNA mutations regulate a phenotype, remains unanswered for most complex phenotypes. The current availability of hundreds of genome-wide association studies (GWASs) and single-cell RNA sequencing (scRNA-seq) of millions of cells provides a unique opportunity to address this question. In the present study, we first constructed an association landscape between over 20,000 single-cell clusters and 997 complex phenotypes using a cross-annotation framework that combines scRNA-seq expression profiles with GWAS summary statistics. We then performed an extensive overview of cell-type specificity and pleiotropy in human phenotypes and found that most phenotypes (>90%) were moderately selectively associated with a limited number of cell types, while a small fraction of cell types (<10%) had strong pleiotropy across multiple phenotypes (~100). Moreover, we identified three cell type-phenotype mutual pleiotropy blocks in the landscape. The application of the single cell type-phenotype cross-annotation framework (named SPA) also explained the T cell-biased lymphopenia and suggested important supporting genes in severe COVID-19 from a human genetics angle. All the cell type-phenotype association results can be queried and visualized at http://pmglab.top/spa.
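
The two summary quantities discussed in this abstract, how many cell types each phenotype is associated with (selectivity) and how many phenotypes each cell type is associated with (pleiotropy), can be illustrated with a minimal sketch. It assumes that significant cell type-phenotype pairs have already been called by some upstream test; the function name, the input format, and the toy labels are hypothetical and are not taken from SPA.

```python
from collections import defaultdict


def summarize_associations(pairs):
    """Count associated cell types per phenotype and phenotypes per cell type.

    `pairs` is an iterable of (cell_type, phenotype) tuples that were already
    judged significant by an upstream association test (not shown here).
    """
    cell_types_per_phenotype = defaultdict(set)
    phenotypes_per_cell_type = defaultdict(set)
    for cell_type, phenotype in pairs:
        cell_types_per_phenotype[phenotype].add(cell_type)
        phenotypes_per_cell_type[cell_type].add(phenotype)
    # Fewer associated cell types means a more selective phenotype.
    selectivity = {p: len(c) for p, c in cell_types_per_phenotype.items()}
    # More associated phenotypes means a more pleiotropic cell type.
    pleiotropy = {c: len(p) for c, p in phenotypes_per_cell_type.items()}
    return selectivity, pleiotropy


# Hypothetical toy input; the labels are placeholders, not real results.
toy_pairs = [
    ("cluster_A", "phenotype_1"),
    ("cluster_A", "phenotype_2"),
    ("cluster_A", "phenotype_3"),
    ("cluster_B", "phenotype_3"),
]
selectivity, pleiotropy = summarize_associations(toy_pairs)
```

In this toy input, cluster_A is linked to three phenotypes and phenotype_3 to two clusters; using sets makes repeated pairs count only once.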

Inferring relevant tissues and cell types for complex traits in genome-wide association studies

10.1101/2021.06.09.447805 ◽

2021 ◽

Author(s):

Rujin Wang ◽

Danyu Lin ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Complex Traits ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Cell Type ◽

Disease Etiology ◽

Genome Wide ◽

Cell Type Specific

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.

Genomic Architecture of Cells in Tissues (GeACT): Study of Human Mid-gestation Fetus

10.1101/2020.04.12.038000 ◽

2020 ◽

Author(s):

Feng Tian ◽

Fan Zhou ◽

Xiang Li ◽

Wenping Ma ◽

Honggui Wu ◽

...

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Human Cell ◽

Expression Profiles ◽

Single Cells ◽

Cell Types ◽

List Type ◽

Cell Type ◽

Genomic Architecture ◽

Gene Modules

Summary By circumventing cellular heterogeneity, single-cell omics have now been widely utilized for cell typing in human tissues, culminating in the undertaking of the Human Cell Atlas, which aims to characterize all human cell types. However, more important is the probing of gene regulatory networks, the underlying chromatin architecture and the critical transcription factors for each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses the above needs with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods of high detectability and precision. We exemplified GeACT by first studying representative organs in the human mid-gestation fetus. In particular, correlated gene modules (CGMs) are observed and found to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states, and found the key transcription factors for representative CGMs.

Highlights

Genomic Architecture of Cells in Tissues (GeACT) data for human mid-gestation fetus

Determining correlated gene modules (CGMs) in different cell types by MALBAC-DT

Measuring chromatin open regions in single cells with high detectability by METATAC

Integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM

Localization of migraine susceptibility genes in human brain by single-cell RNA sequencing

Cephalalgia ◽

10.1177/0333102418762476 ◽

2018 ◽

Vol 38 (13) ◽

pp. 1976-1983 ◽

Cited By ~ 5

Author(s):

William Renthal

Keyword(s):

Human Brain ◽

Single Cell ◽

RNA Sequencing ◽

Expression Profiles ◽

Cell Types ◽

Susceptibility Genes ◽

Brain Cell ◽

Cell Type ◽

Single Cell RNA Sequencing ◽

Brain Cell Types

Background: Migraine is a debilitating disorder characterized by severe headaches and associated neurological symptoms. A key challenge to understanding migraine has been the cellular complexity of the human brain and the multiple cell types implicated in its pathophysiology. The present study leverages recent advances in single-cell transcriptomics to localize the specific human brain cell types in which putative migraine susceptibility genes are expressed. Methods: The cell-type-specific expression of both familial and common migraine-associated genes was determined bioinformatically using data from 2,039 individual human brain cells across two published single-cell RNA sequencing datasets. Enrichment of migraine-associated genes was determined for each brain cell type. Results: Analysis of single-cell RNA sequencing data from five major subtypes of cells in the human cortex (neurons, oligodendrocytes, astrocytes, microglia, and endothelial cells) indicates that over 40% of known migraine-associated genes are enriched in the expression profiles of a specific brain cell type. Further analysis of neuronal migraine-associated genes demonstrated that approximately 70% were significantly enriched in inhibitory neurons and 30% in excitatory neurons. Conclusions: This study takes the next step in understanding the human brain cell types in which putative migraine susceptibility genes are expressed. Both familial and common migraine may arise from dysfunction of discrete cell types within the neurovascular unit, and localization of the affected cell type(s) in an individual patient may provide insight into their susceptibility to migraine.
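
The abstract does not state which statistic was used to determine enrichment for each brain cell type. As an illustration only, one common way to ask whether a disease gene set overlaps a cell type's marker genes more than expected by chance is a one-sided hypergeometric test. The function name, its parameters, and the toy numbers below are assumptions and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, marker_genes, disease_genes, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    Models drawing `disease_genes` genes without replacement from a background
    of `total_genes`, of which `marker_genes` are markers of one cell type.
    X is the number of drawn genes that are markers.
    """
    max_overlap = min(marker_genes, disease_genes)
    denom = comb(total_genes, disease_genes)
    tail = sum(
        comb(marker_genes, k) * comb(total_genes - marker_genes, disease_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / denom


# Toy numbers (hypothetical): 10 background genes, 5 markers, 2 disease genes,
# both of which are markers. The p-value is C(5,2) / C(10,2) = 10 / 45.
p_toy = hypergeom_enrichment_p(10, 5, 2, 2)
```

When one such test is run per cell type, the resulting p-values would normally be corrected for multiple testing before any cell type is called enriched.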

Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics ◽

10.1093/hmg/ddaa098 ◽

2020 ◽

Vol 29 (11) ◽

pp. 1922-1932

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J Hoffmann ◽

Georg B Ehret ◽

Dan Arking ◽

...

Keyword(s):

Blood Pressure ◽

Association Studies ◽

Specific Effect ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Single-cell analysis of colonic epithelium reveals unexpected shifts in cellular composition and molecular phenotype in treatment-naïve adult Crohn’s disease

10.1101/2021.01.13.426602 ◽

2021 ◽

Author(s):

Matt Kanke ◽

Meaghan M. Kennedy ◽

Sean Connelly ◽

Matthew Schaner ◽

Michael T. Shanahan ◽

...

Keyword(s):

Crohn's Disease ◽

Single Cell ◽

Association Studies ◽

Paneth Cell ◽

Genome Wide Association Studies ◽

Intestinal Epithelial ◽

Cell Type ◽

Aberrant Expression ◽

Treatment Naïve

Abstract The intestinal epithelial barrier is composed of a monolayer of specialized intestinal epithelial cells (IECs) that are critical in maintaining gut mucosal homeostasis. Dysfunction within various IEC fractions can increase intestinal permeability, resulting in a chronic and debilitating condition known as Crohn’s disease (CD). Defining the molecular changes in each IEC type in CD will contribute to an improved understanding of the pathogenic processes and the identification of potential therapeutic targets. Here we performed, for the first time at single-cell resolution, a direct comparison of the colonic epithelial cellular and molecular landscape between treatment-naïve adult CD and non-IBD control patients. Our analysis revealed that in CD patients there is a significant skew in the colonic epithelial cellular distribution away from canonical LGR5+ stem cells, located at the crypt-bottom, and toward one specific subtype of mature colonocytes, located at the crypt-top. Further analysis revealed unique changes to gene expression programs in every major cell type, including a previously undescribed suppression in CD of most enteroendocrine driver genes as well as L-cell markers including GCG. We also dissect a previously poorly understood SPIB+ cell cluster, revealing at least four sub-clusters that exhibit unique features. One of these SPIB+ sub-clusters expresses crypt-top colonocyte markers and is significantly up-regulated in CD, whereas another sub-cluster strongly expresses and stains positive for lysozyme (albeit no other canonical Paneth cell marker), which surprisingly is greatly reduced in expression in CD. Finally, through integration with data from genome-wide association studies, we show that genes implicated in CD risk exhibit heretofore unknown cell-type-specific patterns of aberrant expression in CD, providing unprecedented insight into the potential biological functions of these genes.

Single-cell mapper (scMappR): using scRNA-seq to infer the cell-type specificities of differentially expressed genes

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab011 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Dustin J Sokolowski ◽

Mariela Faykoo-Martinez ◽

Lauren Erdman ◽

Huayun Hou ◽

Cadia Chan ◽

...

Keyword(s):

Single Cell ◽

Differentially Expressed Genes ◽

Cell Types ◽

Differentially Expressed ◽

RNA Seq ◽

Kidney Regeneration ◽

Cell Type ◽

Cell Type Specificity ◽

Cost Constraints ◽

Mouse Tissues

Abstract RNA sequencing (RNA-seq) is widely used to identify differentially expressed genes (DEGs) and reveal biological mechanisms underlying complex biological processes. RNA-seq is often performed on heterogeneous samples and the resulting DEGs do not necessarily indicate the cell types where the differential expression occurred. While single-cell RNA-seq (scRNA-seq) methods solve this problem, technical and cost constraints currently limit their widespread use. Here we present single cell Mapper (scMappR), a method that assigns cell-type specificity scores to DEGs obtained from bulk RNA-seq by leveraging cell-type expression data generated by scRNA-seq and existing deconvolution methods. After evaluating scMappR with simulated RNA-seq data and benchmarking scMappR using RNA-seq data obtained from sorted blood cells, we asked if scMappR could reveal known cell-type-specific changes that occur during kidney regeneration. scMappR appropriately assigned DEGs to cell types involved in kidney regeneration, including a relatively small population of immune cells. While scMappR can work with user-supplied scRNA-seq data, we curated scRNA-seq expression matrices for ∼100 human and mouse tissues to facilitate its stand-alone use with bulk RNA-seq data from these species. Overall, scMappR is a user-friendly R package that complements traditional differential gene expression analysis of bulk RNA-seq data.

Estimating and Correcting for Off-Target Cellular Contamination in Brain Cell Type Specific RNA-Seq Data

Frontiers in Molecular Neuroscience ◽

10.3389/fnmol.2021.637143 ◽

2021 ◽

Vol 14 ◽

Author(s):

Jordan Sicherman ◽

Dwight F. Newton ◽

Paul Pavlidis ◽

Etienne Sibille ◽

Shreejoy J. Tripathy

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Target Cell ◽

Differential Expression Analysis ◽

Cell Types ◽

Brain Cell ◽

Cell Type ◽

Cell Purification ◽

Single Cell Type

Transcriptionally profiling minor cellular populations remains an ongoing challenge in molecular genomics. Single-cell RNA sequencing has provided valuable insights into a number of hypotheses, but practical and analytical challenges have limited its widespread adoption. A similar approach, which we term single-cell type RNA sequencing (sctRNA-seq), involves the enrichment and sequencing of a pool of cells, yielding cell type-level resolution transcriptomes. While this approach offers benefits in terms of mRNA sampling from targeted cell types, it is potentially affected by off-target contamination from surrounding cell types. Here, we leveraged single-cell sequencing datasets to apply a computational approach for estimating and controlling the amount of off-target cell type contamination in sctRNA-seq datasets. In datasets obtained using a number of technologies for cell purification, we found that most sctRNA-seq datasets tended to show some amount of off-target mRNA contamination from surrounding cells. However, using covariates for cellular contamination in downstream differential expression analyses increased the quality of our models for differential expression analysis in case/control comparisons and typically resulted in the discovery of more differentially expressed genes. In general, our method provides a flexible approach for detecting and controlling off-target cell type contamination in sctRNA-seq datasets.

Leveraging single-cell ATAC-seq to identify disease-critical fetal and adult brain cell types

10.1101/2021.05.20.445067 ◽

2021 ◽

Author(s):

Samuel S Kim ◽

Karthik Jagadeesh ◽

Kushal K Dey ◽

Amber Z Shen ◽

Soumya Raychaudhuri ◽

...

Keyword(s):

Single Cell ◽

Ganglion Cells ◽

Association Studies ◽

Photoreceptor Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Brain Cell ◽

Adult Brain ◽

Genome Wide Association Studies ◽

Brain Cell Types

Prioritizing disease-critical cell types by integrating genome-wide association studies (GWAS) with functional data is a fundamental goal. Single-cell chromatin accessibility (scATAC-seq) and gene expression (scRNA-seq) have characterized cell types at high resolution, and early work on integrating GWAS with scRNA-seq has shown promise, but work on integrating GWAS with scATAC-seq has been limited. Here, we identify disease-critical fetal and adult brain cell types by integrating GWAS summary statistics from 28 brain-related diseases and traits (average N=298K) with 3.2 million scATAC-seq and scRNA-seq profiles from 83 cell types. We identified disease-critical fetal (resp. adult) brain cell types for 22 (resp. 23) of 28 traits using scATAC-seq data, and for 8 (resp. 17) of 28 traits using scRNA-seq data. Notable findings using scATAC-seq data included highly significant enrichments of fetal photoreceptor cells for major depressive disorder, fetal ganglion cells for BMI, fetal astrocytes for ADHD, and adult VGLUT2 excitatory neurons for schizophrenia. Our findings improve our understanding of brain-related diseases and traits, and inform future analyses of other diseases/traits.

Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics

10.1101/2021.03.19.436212 ◽

2021 ◽

Author(s):

Karthik A. Jagadeesh ◽

Kushal K Dey ◽

Daniel T. Montoro ◽

Steven Gazal ◽

Jesse M Engreitz ◽

...

Keyword(s):

Single Cell ◽

Disease Progression ◽

Genetic Variants ◽

Complex Traits ◽

Disease Risk ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association Studies ◽

Cascade Process ◽

Cellular Processes

Cellular dysfunction is a hallmark of disease. Genome-wide association studies (GWAS) have provided a powerful means to identify loci and genes contributing to disease risk, but in many cases the related cell types/states through which genes confer disease risk remain unknown. Deciphering such relationships is important both for our understanding of disease, and for developing therapeutic interventions. Here, we introduce a framework for integrating single-cell RNA-seq (scRNA-seq), epigenomic maps and GWAS summary statistics to infer the underlying cell types and processes by which genetic variants influence disease. We analyzed 1.6 million scRNA-seq profiles from 209 individuals spanning 11 tissue types and 6 disease conditions, and constructed gene programs capturing cell types, disease progression in cell types, and cellular processes both within and across cell types. We evaluated these gene programs for disease enrichment by transforming them to SNP annotations with tissue-specific epigenomic maps and computing enrichment scores across 60 diseases and complex traits (average N=297K). The inferred disease enrichments recapitulated known biology and highlighted novel relationships for different conditions, including GABAergic neurons in major depressive disorder (MDD), disease progression programs in M cells in ulcerative colitis, and a disease-specific complement cascade process in multiple sclerosis. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease.

Sensei: How many samples to tell evolution in single-cell studies?

10.1101/2020.05.31.126565 ◽

2020 ◽

Author(s):

Shaoheng Liang ◽

Jason Willis ◽

Jinzhuang Dou ◽

Vakul Mohanty ◽

Yuefan Huang ◽

...

Keyword(s):

Single Cell ◽

Web Application ◽

Expression Profiles ◽

Cell Types ◽

Cellular Heterogeneity ◽

Controlled Study ◽

Cancer Evolution ◽

Cell Type ◽

Specific Expression ◽

Mathematical Accuracy

Abstract Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining the sample sizes needed for ascertaining changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells that are required to ascertain such changes between two groups of samples in single-cell studies. Sensei extends the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html.
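
To make the sample-size question concrete, the sketch below combines the standard variance of a beta-binomial proportion with a normal-approximation power calculation for comparing two groups. It is not Sensei's actual derivation, which the abstract does not spell out. The function names, the (mean, concentration) parameterization of the Beta distribution, the shared concentration across groups, and the equal group sizes are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_variance(mean, concentration, cells_per_sample):
    """Variance of an observed cell-type proportion under a beta-binomial model.

    The true proportion in each sample follows Beta(a, b) with
    mean = a / (a + b) and concentration = a + b; the observed count is then
    binomial with `cells_per_sample` trials.
    """
    overdispersion = (concentration + cells_per_sample) / (concentration + 1)
    return mean * (1 - mean) / cells_per_sample * overdispersion


def approx_power(mean1, mean2, concentration, cells_per_sample,
                 n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of proportions.

    Uses a normal approximation and ignores the negligible opposite tail.
    """
    var1 = proportion_variance(mean1, concentration, cells_per_sample)
    var2 = proportion_variance(mean2, concentration, cells_per_sample)
    standard_error = sqrt((var1 + var2) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(mean1 - mean2) / standard_error - z_crit)


def min_samples_per_group(mean1, mean2, concentration, cells_per_sample,
                          target_power=0.8, alpha=0.05, max_n=10_000):
    """Smallest per-group sample count reaching `target_power`, or None."""
    for n in range(2, max_n + 1):
        power = approx_power(mean1, mean2, concentration,
                             cells_per_sample, n, alpha)
        if power >= target_power:
            return n
    return None
```

A call such as `min_samples_per_group(0.10, 0.20, 50, 1000)` asks how many samples per group would be needed to detect a shift from 10% to 20% abundance under these assumed parameters. Because of the overdispersion term, adding cells per sample reduces the variance only down to a floor set by between-sample variability, so the number of samples matters as well as the number of cells.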
