DESE: estimating driver tissues by selective expression of genes associated with complex diseases or traits

Abstract The driver tissues or cell types in which susceptibility genes initiate diseases remain elusive. We develop a unified framework to detect the causal tissues of complex diseases or traits according to selective expression of disease-associated genes in genome-wide association studies (GWASs). This framework consists of three components which run iteratively to produce a converged prioritization list of driver tissues. The framework also outputs a list of prioritized genes as a byproduct. We apply the framework to six representative complex diseases or traits with GWAS summary statistics, which leads to the estimation of the lung as an associated tissue of rheumatoid arthritis.

Partitioning heritability by functional category using GWAS summary statistics

10.1101/014241 ◽

2015 ◽

Cited By ~ 9

Author(s):

Hilary Kiyo Finucane ◽

Brendan Bulik-Sullivan ◽

Alexander Gusev ◽

Gosia Trynka ◽

Yakir Reshef ◽

...

Keyword(s):

Association Studies ◽

Smoking Behavior ◽

Complex Diseases ◽

New Method ◽

Age At Menarche ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Cell Type ◽

Genome Wide ◽

Cell Type Specific

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.

A unified framework for variance component estimation with summary statistics in genome-wide association studies

The Annals of Applied Statistics ◽

10.1214/17-aoas1052 ◽

2017 ◽

Vol 11 (4) ◽

pp. 2027-2051 ◽

Cited By ~ 42

Author(s):

Xiang Zhou

Keyword(s):

Variance Component ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Variance Component Estimation ◽

Summary Statistics ◽

Unified Framework ◽

Genome Wide ◽

Component Estimation

Genetic dependency of Alzheimer’s disease-associated genes across cells and tissue types

Scientific Reports ◽

10.1038/s41598-021-91713-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Suraj K. Jaladanki ◽

Abdulkadir Elmas ◽

Gabriel Santos Malave ◽

Kuan-lin Huang

Keyword(s):

Alzheimer's Disease ◽

Cell Lines ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association Studies ◽

Disease Etiology ◽

Genome Wide ◽

Disease Associated Genes ◽

Network Studies

Abstract Effective treatments targeting disease etiology are urgently needed for Alzheimer’s disease (AD). Although candidate AD genes have been identified and altering their levels may serve as therapeutic strategies, the consequences of such alterations remain largely unknown. Herein, we analyzed CRISPR knockout/RNAi knockdown screen data for over 700 cell lines and evaluated cellular dependencies of 104 AD-associated genes previously identified by genome-wide association studies (GWAS) and gene expression network studies. Multiple genes showed widespread cell dependencies across tissue lineages, suggesting their inhibition may yield off-target effects. Meanwhile, several genes, including SPI1, MEF2C, GAB2, ABCC11, and ATCG1, were identified as genes of interest since their genetic knockouts specifically affected high-expressing cells whose tissue lineages are relevant to cell types found in AD. Overall, analyses of genetic screen data identified AD-associated genes whose knockout or knockdown selectively affected cell lines of relevant tissue lineages, prioritizing targets for potential AD treatments.

Genetic Dependency of Alzheimer’s Disease-Associated Genes across Cells and Tissue Types

10.21203/rs.3.rs-153131/v1 ◽

2021 ◽

Author(s):

Suraj Kevin Jaladanki ◽

Abdulkadir Elmas ◽

Gabriel Santos-Malave ◽

Kuan-lin Huang

Keyword(s):

Alzheimer's Disease ◽

Cell Lines ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association Studies ◽

Disease Etiology ◽

Genome Wide ◽

Disease Associated Genes ◽

Network Studies

Abstract Effective treatments targeting disease etiology are urgently needed for Alzheimer’s disease (AD). Although candidate AD genes have been identified and altering their levels may serve as therapeutic strategies, the consequences of such alterations remain largely unknown. Herein, we analyzed CRISPR knockout/RNAi knockdown screen data for over 700 cell lines and evaluated cellular dependencies of 104 AD-associated genes previously identified by genome-wide association studies (GWAS) and gene expression network studies. Multiple genes showed widespread cell dependencies across tissue lineages, suggesting their inhibition may yield off-target effects. Meanwhile, several genes, including SPI1, MEF2C, GAB2, ABCC11, and ATCG1, were identified as genes of interest since their genetic knockouts specifically affected high-expressing cells whose tissue lineages are relevant to cell types found in AD. Overall, analyses of genetic screen data identified AD-associated genes whose knockout or knockdown selectively affected cell lines of relevant tissue lineages, prioritizing targets for potential AD treatments.

Estimating driver-tissues by robust selective expression of genes associated with complex diseases or traits

10.1101/491878 ◽

2018 ◽

Author(s):

Lin Jiang ◽

Chao Xue ◽

Shangzhen Chen ◽

Sheng Dai ◽

Peikai Chen ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Large Scale ◽

Genome Wide Association Study ◽

Complex Diseases ◽

Transcript Level ◽

Cell Types ◽

Z Score ◽

Expression Of Genes ◽

Selective Expression ◽

Disease Associated Genes

Abstract The driver tissues or cell types of many human diseases, in which susceptibility genes cause the diseases, remain elusive. We developed a framework to detect the causal tissues of complex diseases or traits according to selective expression of disease-associated genes in genome-wide association studies (GWAS). The core method of the framework is a new robust z-score to estimate genes’ expression selectivity. Through extensive simulations and comparative analyses in a large-scale schizophrenia GWAS, we demonstrate that the robust z-score is more sensitive than existing methods at detecting multiple selectively expressed tissues, which in turn leads to the estimation of more biologically sensible driver tissues. The effectiveness of this framework is further validated in five representative complex diseases using GWAS summary statistics and transcript-level expression from the GTEx project. Finally, we also demonstrate that the prioritized tissues and the robust selective expression can enhance the characterization of genes directly associated with a disease. Interesting results include the estimation of lung as a driver tissue of rheumatoid arthritis, consistent with clinical observations of morbidity between rheumatoid arthritis and lung diseases.
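
The abstract above does not spell out how its robust z-score is defined. As a rough illustration of the general idea only, the following sketch scores one gene's expression in each tissue against the median across tissues, scaled by the median absolute deviation (MAD). This is a generic median/MAD z-score, not necessarily the authors' statistic; the function name, the tissue names, and the expression values are all hypothetical.

```python
from statistics import median


def robust_z_scores(expression):
    """Median/MAD-based z-scores for one gene across tissues.

    `expression` maps tissue name -> expression level (hypothetical input).
    Generic robust z-score, NOT the exact statistic of the paper above.
    """
    values = list(expression.values())
    center = median(values)
    # Median absolute deviation; 1.4826 makes it comparable to a standard
    # deviation when the data are approximately normal.
    mad = median(abs(v - center) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        # No spread across tissues: no tissue is selectively expressed.
        return {tissue: 0.0 for tissue in expression}
    return {tissue: (v - center) / scale for tissue, v in expression.items()}


# Hypothetical expression values for a single gene (illustration only).
gene_expression = {"lung": 42.0, "liver": 5.0, "brain": 4.0, "heart": 6.0, "blood": 5.5}
scores = robust_z_scores(gene_expression)
top_tissue = max(scores, key=scores.get)
print(top_tissue, round(scores[top_tissue], 2))
```

Because the median and MAD are barely affected by a single extreme value, the outlying tissue stands out more sharply than it would with a mean and standard deviation, which the outlier itself would inflate.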

CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies

Nucleic Acids Research ◽

10.1093/nar/gkz1026 ◽

2019 ◽

Cited By ~ 2

Author(s):

Jianhua Wang ◽

Dandan Huang ◽

Yao Zhou ◽

Hongcheng Yao ◽

Huanhuan Liu ◽

...

Keyword(s):

Fine Mapping ◽

Genetic Variants ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Credible Sets ◽

Causal Variants

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.

Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics

10.1101/032474 ◽

2015 ◽

Author(s):

Dominic Holland ◽

Yunpeng Wang ◽

Wesley K Thompson ◽

Andrew Schork ◽

Chi-Hua Chen ◽

...

Keyword(s):

Association Studies ◽

Significant Snps ◽

Effect Sizes ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Sample Sizes ◽

Genetic Components ◽

Complex Phenotypes ◽

Genome Wide ◽

Z Scores

Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities that does not require raw genotype data, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype -- the proportion of SNPs (after uniform pruning, so that large LD blocks are not over-represented) likely to be in strong LD with causal/mechanistically associated SNPs -- and predicting the proportion of chip heritability explainable by genome wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N=82,315) and additionally, for purposes of illustration, putamen volume (N=12,596), with approximately 9.3 million SNP z-scores in both cases.
We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We estimate the degree to which effect sizes are over-estimated when based on linear regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and that of putamen volume to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
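
To make the "mixture of Gaussians" idea concrete, here is a minimal sketch of a two-component mixture for z-scores: a null component and a wider non-null component. It is a simplification for illustration and not the four-parameter model of the abstract, whose parameterization is not given here. The mixing weight 0.037 reuses the schizophrenia polygenicity estimate quoted above purely as an example value; the standard deviations and function names are assumptions.

```python
import math


def normal_pdf(z, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_non_null(z, pi1, sd_null, sd_effect):
    """Posterior probability that a z-score comes from the non-null component.

    Simplified two-component mixture (an assumption, not the paper's model):
      null:     z ~ N(0, sd_null^2)
      non-null: z ~ N(0, sd_null^2 + sd_effect^2)
    pi1 is the prior proportion of non-null SNPs.
    """
    sd_alt = math.sqrt(sd_null ** 2 + sd_effect ** 2)
    null_part = (1.0 - pi1) * normal_pdf(z, sd_null)
    alt_part = pi1 * normal_pdf(z, sd_alt)
    return alt_part / (null_part + alt_part)


# Illustrative parameters: pi1 borrowed from the abstract's polygenicity
# estimate; sd_null and sd_effect are hypothetical.
p_small = posterior_non_null(0.0, pi1=0.037, sd_null=1.0, sd_effect=3.0)
p_large = posterior_non_null(5.0, pi1=0.037, sd_null=1.0, sd_effect=3.0)
print(p_small < p_large)
```

Under these assumed parameters a z-score near zero is attributed almost entirely to the null component, while a large z-score is attributed almost entirely to the non-null component, which is the intuition behind using such a mixture to estimate replication probabilities.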

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
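
For readers unfamiliar with the equal-contribution model that the abstract criticizes, the sketch below shows the basic LD score regression relationship: under that model the expected chi-square statistic of a SNP is approximately N * h2 / M times its LD score, plus an intercept, so heritability can be read off the regression slope. This is a bare ordinary-least-squares illustration on noise-free synthetic data; real LDSC uses weighted regression and block-jackknife standard errors, and neither LDSC's nor SumHer's actual code is reproduced here. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ldsc_h2(ld_scores, chi2, n_samples, n_snps):
    """Heritability estimate from the slope of chi-square on LD score.

    Assumes E[chi2_j] = (n_samples * h2 / n_snps) * ld_score_j + intercept,
    i.e. every SNP contributes equal heritability in expectation.
    """
    slope, intercept = fit_line(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


# Noise-free synthetic data generated from the model itself (hypothetical).
N, M, TRUE_H2, TRUE_INTERCEPT = 50_000, 1_000_000, 0.4, 1.05
ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
chi2 = [N * TRUE_H2 / M * l + TRUE_INTERCEPT for l in ld_scores]

h2_hat, intercept_hat = ldsc_h2(ld_scores, chi2, N, M)
print(round(h2_hat, 3), round(intercept_hat, 3))
```

An intercept above 1 is conventionally read as confounding; the abstract's point is that when the equal-contribution assumption is wrong, both the slope and the intercept can be misestimated.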
