Integrative Analysis of Genome-Wide Association Studies and Single-Cell Sequencing Studies

2021 ◽  
Author(s):  
Rujin Wang ◽  
Danyu Lin ◽  
Yuchao Jiang

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely act in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
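The generalized least squares step mentioned in this abstract can be illustrated with a minimal sketch. It is not EPIC's implementation: the function name `gls_fit`, the toy AR(1)-style gene-gene correlation matrix `V`, and the simulated expression values are all assumptions made for illustration. The sketch regresses gene-level statistics `y` on an intercept and one cell type's expression `X`, weighting by the correlation between genes.

```python
import numpy as np

def gls_fit(y, X, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y.

    y : (n_genes,) gene-level association statistics (hypothetical input)
    X : (n_genes, p) design matrix, e.g. intercept + cell-type expression
    V : (n_genes, n_genes) positive-definite gene-gene correlation matrix
    """
    # Solve linear systems with V instead of forming V^-1 explicitly
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    beta = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    # Standard errors, valid when V is the true error covariance
    se = np.sqrt(np.diag(np.linalg.inv(XtVinvX)))
    return beta, se

# Toy data (entirely simulated, for illustration only)
rng = np.random.default_rng(0)
n_genes = 50
expr = rng.normal(size=n_genes)
X = np.column_stack([np.ones(n_genes), expr])
idx = np.arange(n_genes)
V = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # correlation decays with gene distance
y = X @ np.array([2.0, 3.0])                    # noiseless, so the fit is exact
beta, se = gls_fit(y, X, V)
print("coefficients:", beta)
```

With `V` set to the identity matrix, the same function reduces to ordinary least squares, which is one way to see what accounting for between-gene correlation changes.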


Author(s):  
Charles Kooperberg ◽  
James Y. Dai ◽  
Li Hsu

Genome-wide association studies and next-generation sequencing studies offer us an unprecedented opportunity to study the genetic etiology of diseases and other traits. Over the last few years, many replicated associations between SNPs and traits have been published. It is of particular interest to identify how genes may interact with environmental factors and other genes. In this chapter, we show that a two-stage approach can greatly enhance power for detecting gene-environment and gene-gene interactions in large genetic studies compared to tests without screening. In the first stage, SNPs are screened for their potential to be involved in interactions; in the second stage, interactions are tested only among the SNPs that pass the screening.
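A two-stage screen-then-test procedure of this kind might look like the sketch below. The abstract does not specify the screening statistic, so several choices here are assumptions: a quantitative trait, a marginal SNP-trait association test as the stage-1 screen, a linear-model interaction test in stage 2, and a Bonferroni correction over only the SNPs that pass. The names `ols_z`, `two_sided_p`, and `two_stage_gxe` are hypothetical.

```python
import math
import numpy as np

def ols_z(y, X, col):
    """Return the z-statistic for column `col` of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[col] / math.sqrt(cov[col, col])

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def two_stage_gxe(y, G, E, alpha_screen=0.05, alpha=0.05):
    """Screen SNPs marginally, then test G x E only among those that pass."""
    n, m = G.shape
    ones = np.ones(n)
    # Stage 1: marginal association screen (assumed screening statistic)
    passed = []
    for j in range(m):
        X = np.column_stack([ones, G[:, j]])
        if two_sided_p(ols_z(y, X, 1)) < alpha_screen:
            passed.append(j)
    # Stage 2: interaction test, Bonferroni-corrected for len(passed) tests
    hits = []
    for j in passed:
        X = np.column_stack([ones, G[:, j], E, G[:, j] * E])
        if two_sided_p(ols_z(y, X, 3)) < alpha / len(passed):
            hits.append(j)
    return passed, hits

# Simulated toy data: only SNP 0 interacts with the exposure
rng = np.random.default_rng(1)
n, m = 2000, 20
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
E = rng.binomial(1, 0.5, size=n).astype(float)
y = 1.0 * G[:, 0] * E + rng.normal(size=n)
passed, hits = two_stage_gxe(y, G, E)
print("passed screening:", passed)
print("interaction hits:", hits)
```

The power gain comes from the stage-2 threshold: it is divided by the number of SNPs that pass the screen rather than by the total number of SNPs.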


2017 ◽  
Vol 2017 ◽  
pp. 1-4 ◽  
Author(s):  
Xiao Liang ◽  
Awen He ◽  
Wenyu Wang ◽  
Li Liu ◽  
Yanan Du ◽  
...  

Aim. To identify novel candidate genes and gene sets for diabetes. Methods. We performed an integrative analysis of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTLs) data for diabetes. Summary data were derived from a large-scale GWAS of diabetes involving a total of 58,070 individuals. The eQTL dataset included 923,021 cis-eQTLs for 14,329 genes and 4,732 trans-eQTLs for 2,612 genes. Integrative analysis of GWAS and eQTL data was conducted by summary data-based Mendelian randomization (SMR). To identify the gene sets associated with diabetes, the SMR single-gene analysis results were further subjected to gene set enrichment analysis (GSEA). A total of 13,311 annotated gene sets were analyzed in this study. Results. SMR analysis identified 6 genes significantly associated with fasting glucose, such as C11ORF10 (p value = 6.04 × 10⁻⁸), MRPL33 (p value = 1.24 × 10⁻⁷), and FADS1 (p value = 2.39 × 10⁻⁷). Gene set analysis identified HUANG_FOXA2_TARGETS_UP (false discovery rate = 0.047) as associated with fasting glucose. Conclusion. Our study provides novel clues for clarifying the genetic mechanism of diabetes. This study also illustrates the good performance of the SMR approach and extends it to gene set association analysis for complex diseases.
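For readers unfamiliar with SMR, the sketch below shows the commonly cited form of its single-gene test statistic, which combines the GWAS z-score and the eQTL z-score of the top cis-eQTL SNP and is compared against a chi-square distribution with one degree of freedom. The abstract does not give the formula, so this is background rather than the authors' code; the function name `smr_statistic` and the input values are hypothetical.

```python
import math

def smr_statistic(z_gwas, z_eqtl):
    """Approximate SMR test statistic and p-value for one gene.

    T = (z_gwas^2 * z_eqtl^2) / (z_gwas^2 + z_eqtl^2), referred to chi-square(1).
    """
    z2_g, z2_e = z_gwas ** 2, z_eqtl ** 2
    t = z2_g * z2_e / (z2_g + z2_e)
    # Chi-square(1) survival function: P(X > t) = erfc(sqrt(t / 2))
    p = math.erfc(math.sqrt(t / 2))
    return t, p

# Hypothetical z-scores for a single SNP-gene pair
t, p = smr_statistic(z_gwas=3.0, z_eqtl=4.0)
print(f"T_SMR = {t:.2f}, p = {p:.4f}")
```

Because the statistic is bounded above by the smaller of the two squared z-scores, a gene needs both a strong eQTL and a strong GWAS signal at the same SNP to reach significance.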


2019 ◽  
Author(s):  
Lulu Shang ◽  
Jennifer A. Smith ◽  
Xiang Zhou

Genome-wide association studies (GWASs) have identified many SNPs associated with various common diseases. Understanding the biological functions of these identified SNP associations requires identifying disease/trait relevant tissues or cell types. Here, we develop a network method, CoCoNet, to facilitate the identification of trait-relevant tissues or cell types. Different from existing approaches, CoCoNet incorporates tissue-specific gene co-expression networks constructed from either bulk or single cell RNA sequencing (RNAseq) studies with GWAS data for trait-tissue inference. In particular, CoCoNet relies on a covariance regression network model to express gene-level effect sizes for the given GWAS trait as a function of the tissue-specific co-expression adjacency matrix. With a composite likelihood-based inference algorithm, CoCoNet is scalable to tens of thousands of genes. We validate the performance of CoCoNet through extensive simulations. We apply CoCoNet for an in-depth analysis of four neurological disorders and four autoimmune diseases, where we integrate the corresponding GWASs with bulk RNAseq data from 38 tissues and single cell RNAseq data from 10 cell types. In the real data applications, we show how CoCoNet can help identify specific glial cell types relevant for neurological disorders and identify disease-targeted colon tissues as relevant for autoimmune diseases. Our results also provide empirical evidence supporting one hypothesis of the omnigenic model: that trait-relevant gene co-expression networks underlie disease etiology.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Fiona A. Hagenbeek ◽  
René Pool ◽  
Jenny van Dongen ◽  
Harmen H. M. Draisma ◽  
...  

Metabolomics examines the small molecules involved in cellular metabolism. Approximately 50% of total phenotypic differences in metabolite levels is due to genetic variance, but heritability estimates differ across metabolite classes. We perform a review of all genome-wide association and (exome-) sequencing studies published between November 2008 and October 2018, and identify >800 class-specific metabolite loci associated with metabolite levels. In a twin-family cohort (N = 5117), these metabolite loci are leveraged to simultaneously estimate total heritability (h²_total), and the proportion of heritability captured by known metabolite loci (h²_Metabolite-hits) for 309 lipids and 52 organic acids. Our study reveals significant differences in h²_Metabolite-hits among different classes of lipids and organic acids. Furthermore, phosphatidylcholines with a high degree of unsaturation have higher h²_Metabolite-hits estimates than phosphatidylcholines with low degrees of unsaturation. This study highlights the importance of common genetic variants for metabolite levels, and elucidates the genetic architecture of metabolite classes.

