ECLIPSER: identifying causal cell types and genes for complex traits through single cell enrichment of e/sQTL-mapped genes in GWAS loci

Summary: ECLIPSER was developed to identify pathogenic cell types and cell type-specific genes that may affect complex disease susceptibility and trait variation by integrating single cell data with known GWAS loci. ECLIPSER maps genes to GWAS loci for a given complex trait based on expression and splicing quantitative trait loci (e/sQTLs) and other functional data, and tests whether the mapped genes are enriched for cell type-specific expression in particular cell types using single-cell/nucleus RNA-seq data from one or more tissues of interest. A Bayesian Fisher's exact test is used to compute fold-enrichment significance. We demonstrate the application of ECLIPSER on various skin diseases and traits using snRNA-seq of healthy human skin samples. Availability and Implementation: The Python source code and documentation for ECLIPSER and a Jupyter notebook for generating output tables and figures are available at https://github.com/segrelabgenomics/ECLIPSER. The source code for GWASvar2gene, which maps genes to GWAS loci based on e/sQTLs, is available at https://github.com/segrelabgenomics/GWASvar2gene. The analysis presented here used data from GTEx (https://gtexportal.org/home/datasets) and Open Targets Genetics (https://genetics-docs.opentargets.org/data-access/graphql-api), but the method can also be applied to other GWAS variant lists and QTL studies. Data used to reproduce the results of the paper are available in Supplementary data.
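
The enrichment step described in the summary can be illustrated with a minimal sketch. It uses a plain one-sided Fisher's exact test (the upper tail of the hypergeometric distribution), not the Bayesian variant that ECLIPSER applies, and it assumes that trait loci are compared against a set of background loci. The function name `fisher_enrichment` and its four count arguments are hypothetical and are not taken from the ECLIPSER code base.

```python
from math import comb


def fisher_enrichment(k, n, K, N):
    """One-sided Fisher's exact test for enrichment (illustrative only).

    k: trait loci containing at least one cell type-specific gene
    n: all trait loci
    K: background loci containing at least one cell type-specific gene
    N: all background loci
    Returns (fold_enrichment, p_value).
    """
    hits = k + K    # loci with a cell type-specific gene, both sets combined
    total = n + N   # all loci, both sets combined
    # Hypergeometric upper tail: probability of seeing k or more hits
    # among the n trait loci if hits were spread at random over all loci.
    p_value = sum(
        comb(hits, x) * comb(total - hits, n - x)
        for x in range(k, min(n, hits) + 1)
    ) / comb(total, n)
    # Ratio of the hit fraction in trait loci to that in background loci.
    fold = float("inf") if K == 0 else (k / n) / (K / N)
    return fold, p_value


# Made-up counts: 12 of 40 trait loci vs. 30 of 400 background loci.
fold, p_value = fisher_enrichment(12, 40, 30, 400)
print(f"fold-enrichment = {fold:.2f}, p = {p_value:.3g}")
```

The counts in the usage lines are invented for illustration. With them, the hit fraction is about four times higher in trait loci than in background loci.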

Inferring relevant tissues and cell types for complex traits in genome-wide association studies

10.1101/2021.06.09.447805 ◽

2021 ◽

Author(s):

Rujin Wang ◽

Danyu Lin ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Complex Traits ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Cell Type ◽

Disease Etiology ◽

Genome Wide ◽

Cell Type Specific

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.

Large-scale genetic association and single cell accessible chromatin mapping defines cell type-specific mechanisms of type 1 diabetes risk

10.1101/2021.01.13.426472 ◽

2021 ◽

Author(s):

Joshua Chiou ◽

Ryan J Geusz ◽

Mei-Lin Okino ◽

Jee Yun Han ◽

Michael Miller ◽

...

Keyword(s):

Single Cell ◽

Large Scale ◽

Complex Disease ◽

Cell Types ◽

Causal Variant ◽

Cell Type ◽

Ductal Cells ◽

Risk Variants ◽

Cell Type Specific

Abstract: Translating genome-wide association studies (GWAS) of complex disease into mechanistic insight requires a comprehensive understanding of risk variant effects on disease-relevant cell types. To uncover cell type-specific mechanisms of type 1 diabetes (T1D) risk, we combined genetic association mapping and single cell epigenomics. We performed the largest to-date GWAS of T1D in 489,679 samples imputed into 59.2M variants, which identified 74 novel association signals including several large-effect rare variants. Fine-mapping of 141 total signals substantially improved resolution of causal variant credible sets, which primarily mapped to non-coding sequence. To annotate cell type-specific regulatory mechanisms of T1D risk variants, we mapped 448,142 candidate cis-regulatory elements (cCREs) in pancreas and peripheral blood mononuclear cell types using snATAC-seq of 131,554 nuclei. T1D risk variants were enriched in cCREs active in CD4+ T cells as well as several additional cell types including pancreatic exocrine acinar and ductal cells. High-probability T1D risk variants at multiple signals mapped to exocrine-specific cCREs including novel loci near CEL, GP2 and CFTR. At the CFTR locus, the likely causal variant rs7795896 mapped to a ductal-specific distal cCRE that regulated CFTR, and the risk allele reduced transcription factor binding, enhancer activity and CFTR expression in ductal cells. These findings support a role for the exocrine pancreas in T1D pathogenesis and highlight the power of combining large-scale GWAS and single cell epigenomics to provide insight into the cellular origins of complex disease.

Annotations capturing cell type-specific TF binding explain a large fraction of disease heritability

Human Molecular Genetics ◽

10.1093/hmg/ddz226 ◽

2019 ◽

Vol 29 (7) ◽

pp. 1057-1067 ◽

Cited By ~ 3

Author(s):

Bryce van de Geijn ◽

Hilary Finucane ◽

Steven Gazal ◽

Farhad Hormozdiari ◽

Tiffany Amariuta ◽

...

Keyword(s):

Binding Sites ◽

Complex Traits ◽

Complex Disease ◽

Specific Binding ◽

Large Fraction ◽

Cell Types ◽

Genome Wide Association ◽

Cell Type ◽

Genome Wide ◽

Cell Type Specific

Abstract: Regulatory variation plays a major role in complex disease, and cell type-specific binding of transcription factors (TFs) is critical to gene regulation. However, assessing the contribution of genetic variation in TF-binding sites to disease heritability is challenging, as binding is often cell type-specific and annotations from directly measured TF binding are not currently available for most cell type-TF pairs. We investigate approaches to annotate TF binding, including directly measured chromatin data and sequence-based predictions. We find that TF-binding annotations constructed by intersecting sequence-based TF-binding predictions with cell type-specific chromatin data explain a large fraction of heritability across a broad set of diseases and corresponding cell types; this strategy of constructing annotations addresses both the limitation that identical sequences may be bound or unbound depending on surrounding chromatin context and the limitation that sequence-based predictions are generally not cell type-specific. We partitioned the heritability of 49 diseases and complex traits using stratified linkage disequilibrium (LD) score regression with the baseline-LD model (which is not cell type-specific) plus the new annotations. We determined that 100 bp windows around MotifMap sequence-based TF-binding predictions intersected with a union of six cell type-specific chromatin marks (imputed using ChromImpute) performed best, with a 58% increase in heritability enrichment compared to the chromatin marks alone (11.6× vs. 7.3×, P = 9 × 10^−14 for difference) and a 20% increase in cell type-specific signal conditional on annotations from the baseline-LD model (P = 8 × 10^−11 for difference). Our results show that TF-binding annotations explain substantial disease heritability and can help refine genome-wide association signals.

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five to six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
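
The rejection idea in the abstract, one confidence threshold per cell type, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how JIND learns its thresholds, so here they are simply passed in, and the function name `classify_with_rejection`, the `"Unassigned"` label, and the example probabilities are all hypothetical.

```python
def classify_with_rejection(probs, thresholds, reject_label="Unassigned"):
    """Assign each cell its most probable type, or reject it.

    probs: list of dicts mapping cell type -> predicted probability
    thresholds: dict mapping cell type -> minimum confidence required
                (assumed to be given; JIND learns these from data)
    """
    labels = []
    for cell_probs in probs:
        # Most probable cell type for this cell.
        best = max(cell_probs, key=cell_probs.get)
        # Keep the call only if it clears that type's own threshold.
        if cell_probs[best] >= thresholds[best]:
            labels.append(best)
        else:
            labels.append(reject_label)
    return labels


example_probs = [
    {"T cell": 0.90, "B cell": 0.10},
    {"T cell": 0.55, "B cell": 0.45},
    {"T cell": 0.30, "B cell": 0.70},
]
example_thresholds = {"T cell": 0.80, "B cell": 0.60}
print(classify_with_rejection(example_probs, example_thresholds))
```

In this made-up example the second cell is rejected because its best call (T cell, 0.55) falls below the T cell threshold, while the first and third cells keep their labels.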

CellO: Comprehensive and hierarchical cell type classification of human cells with the Cell Ontology

10.1101/634097 ◽

2019 ◽

Cited By ~ 1

Author(s):

Matthew N. Bernstein ◽

Zhongjie Ma ◽

Michael Gleicher ◽

Colin N. Dewey

Keyword(s):

Single Cell ◽

Web Application ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Training Set ◽

Sequence Read Archive ◽

Cell Ontology ◽

Cell Type Specific ◽

Type Classification

Summary: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which, to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and to achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.

Highlights: We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology. CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive. CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set. CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.

ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data

10.1101/426593 ◽

2018 ◽

Cited By ~ 2

Author(s):

Wennan Chang ◽

Changlin Wan ◽

Xiaoyu Lu ◽

Szu-wei Tu ◽

Yifan Sun ◽

...

Keyword(s):

Single Cell ◽

Cell Types ◽

Training Data ◽

Marker Genes ◽

Cell Detection ◽

Omics Data ◽

Deconvolution Method ◽

Cell Type ◽

Data Set ◽

Cell Type Specific

Abstract: We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issue of identifiability and robustness in the current tissue data deconvolution problem. ICTD provides substantially new capabilities for omics data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell type and cell type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type-specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Single-cell RNA sequencing reveals cell type- and artery type-specific vascular remodelling in male spontaneously hypertensive rats

Cardiovascular Research ◽

10.1093/cvr/cvaa164 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jun Cheng ◽

Wenduo Gu ◽

Ting Lan ◽

Jiacheng Deng ◽

Zhichao Ni ◽

...

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Spontaneously Hypertensive Rats ◽

Cell Types ◽

Vascular Remodelling ◽

Cell Type ◽

Hypertensive Rats ◽

Spontaneously Hypertensive ◽

Single Cell Rna Sequencing ◽

Cell Type Specific

Abstract. Aims: Hypertension is a major risk factor for cardiovascular diseases. However, vascular remodelling, a hallmark of hypertension, has not been systematically characterized yet. We described systematic vascular remodelling, especially the artery type- and cell type-specific changes, in hypertension using spontaneously hypertensive rats (SHRs). Methods and results: Single-cell RNA sequencing was used to depict the cell atlas of mesenteric artery (MA) and aortic artery (AA) from SHRs. More than 20 000 cells were included in the analysis. The number of immune cells more than doubled in the AA in SHRs compared to Wistar Kyoto controls, whereas an expansion of MA mesenchymal stromal cells (MSCs) was observed in SHRs. Comparison of corresponding artery types and cell types identified in integrated datasets unravelled dysregulated genes specific for artery types and cell types. Intersection of dysregulated genes with curated gene sets including cytokines, growth factors, extracellular matrix (ECM), receptors, etc. revealed vascular remodelling events involving cell–cell interaction and ECM re-organization. Particularly, AA remodelling encompasses upregulated cytokine genes in smooth muscle cells, endothelial cells, and especially MSCs, whereas in MA, change of genes involving the contractile machinery and downregulation of ECM-related genes were more prominent. Macrophages and T cells within the aorta demonstrated significant dysregulation of cellular interaction with vascular cells. Conclusion: Our findings provide the first cell landscape of resistant and conductive arteries in hypertensive animal models. Moreover, they offer a systematic characterization of the dysregulated gene profiles in an unbiased, artery type-specific and cell type-specific manner during hypertensive vascular remodelling.

A single cell brain atlas in human Alzheimer’s disease

10.1101/628347 ◽

2019 ◽

Cited By ~ 4

Author(s):

Alexandra Grubman ◽

Gabriel Chew ◽

John F. Ouyang ◽

Guizhi Sun ◽

Xin Yi Choo ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Cell Fate ◽

Expression Patterns ◽

Cell Types ◽

Gene Expression Patterns ◽

Cell Type ◽

Web Resource ◽

Cell Type Specific

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE, which was repressed in oligodendrocyte progenitor cell (OPC) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility. Finally, we created an interactive reference web resource which will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, and molecular and functional diversity at the single cell level.

Highlights: We generated the first human single cell transcriptome in AD patient brains. Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes). Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories. Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters. We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters. Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.

Data resource web interface: http://adsn.ddnetbio.com

Bulk Tissue Cell Type Deconvolution with Multi-Subject Single-Cell Expression Reference

10.1101/354944 ◽

2018 ◽

Cited By ~ 1

Author(s):

Xuran Wang ◽

Jihwan Park ◽

Katalin Susztak ◽

Nancy R. Zhang ◽

Mingyao Li

Keyword(s):

Single Cell ◽

Cell Types ◽

Cellular Heterogeneity ◽

Tissue Cell ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific ◽

Cell Expression

Abstract: We present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in humans, mice, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.

SMNN: Batch Effect Correction for Single-cell RNA-seq data via Supervised Mutual Nearest Neighbor Detection

10.1101/672261 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yuchen Yang ◽

Gang Li ◽

Huijun Qian ◽

Kirk C. Wilhelmsen ◽

Yin Shen ◽

...

Keyword(s):

Single Cell ◽

Nearest Neighbor ◽

State Of The Art ◽

Nearest Neighbors ◽

Cell Types ◽

Batch Effect ◽

Batch Effects ◽

Cell Type ◽

Label Information ◽

Cell Type Specific

Abstract: Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches compared with MNN, Seurat v3, and LIGER. Furthermore, SMNN retains more cell type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841%.

Key Points: Batch effect correction has been recognized to be critical when integrating scRNA-seq data from multiple batches due to systematic differences in time points, generating laboratory and/or handling technician(s), experimental protocol, and/or sequencing platform. Existing batch effect correction methods that leverage information from mutual nearest neighbors across batches (for example, implemented in SC3 or Seurat) ignore cell type information and suffer from potentially mismatching single cells from different cell types across batches, which would lead to undesired correction results, especially under the scenario where variation from batch effects is non-negligible compared with biological effects. To address this critical issue, here we present SMNN, a supervised machine learning method that first takes cluster/cell-type label information from users or inferred from scRNA-seq clustering, and then searches mutual nearest neighbors within each cell type instead of global searching. Our SMNN method shows clear advantages over three state-of-the-art batch effect correction methods and can better mix cells of the same cell type across batches and more effectively recover cell-type specific features, in both simulations and real datasets.
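
The core idea stated above, searching mutual nearest neighbors within each cell type rather than globally, can be sketched in a few lines. This is a toy illustration and not the SMNN implementation: it covers only the pair-finding step (no correction vectors), uses brute-force Euclidean distances, and the names `nearest` and `supervised_mnn_pairs` are hypothetical.

```python
from math import dist


def nearest(query, ref, k):
    """For each query point, the set of indices of its k nearest ref points."""
    return [
        set(sorted(range(len(ref)), key=lambda j: dist(q, ref[j]))[:k])
        for q in query
    ]


def supervised_mnn_pairs(batch1, labels1, batch2, labels2, k=1):
    """Mutual nearest neighbor pairs, searched separately per cell type.

    Returns (i, j) index pairs with batch1[i] and batch2[j] sharing a label
    and each lying among the other's k nearest same-label neighbors.
    """
    pairs = []
    for cell_type in sorted(set(labels1) & set(labels2)):
        idx1 = [i for i, lab in enumerate(labels1) if lab == cell_type]
        idx2 = [j for j, lab in enumerate(labels2) if lab == cell_type]
        pts1 = [batch1[i] for i in idx1]
        pts2 = [batch2[j] for j in idx2]
        nn_1_to_2 = nearest(pts1, pts2, k)  # neighbors of batch 1 cells in batch 2
        nn_2_to_1 = nearest(pts2, pts1, k)  # neighbors of batch 2 cells in batch 1
        for a, neighbors in enumerate(nn_1_to_2):
            for b in sorted(neighbors):
                if a in nn_2_to_1[b]:       # keep only mutual neighbors
                    pairs.append((idx1[a], idx2[b]))
    return pairs


# Toy 2-D "expression" coordinates with a batch shift (made-up values).
batch1 = [(0.0, 0.0), (10.0, 10.0)]
labels1 = ["A", "B"]
batch2 = [(1.0, 1.0), (0.5, 0.5)]
labels2 = ["A", "B"]
print(supervised_mnn_pairs(batch1, labels1, batch2, labels2))
```

In this toy data the type-B cell of the second batch lies closest to the type-A cell of the first batch, so a global search would pair cells of different types. Restricting the search to one cell type at a time pairs A with A and B with B.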
