Partitioning gene-mediated disease heritability without eQTLs

2021 ◽  
Author(s):  
Daniel Weiner ◽  
Steven Gazal ◽  
Elise B Robinson ◽  
Luke O'Connor

Unknown SNP-to-gene regulatory architecture complicates efforts to link noncoding GWAS associations with genes implicated by sequencing or functional studies. eQTLs are used to link SNPs to genes, but expression in bulk tissue explains a small fraction of disease heritability. A simple but successful approach has been to link SNPs with nearby genes, but the fraction of heritability mediated by these genes is unclear, and gene-proximal (vs. gene-mediated) heritability enrichments are attenuated accordingly. We propose the Abstract Mediation Model (AMM) to estimate (1) the fraction of heritability mediated by the closest or kth-closest gene to each SNP and (2) the mediated heritability enrichment of a gene set (e.g. genes with rare-variant associations). AMM jointly estimates these quantities by matching the decay in SNP enrichment with distance from genes in the gene set. Across 47 complex traits and diseases, we estimate that the closest gene to each SNP mediates 27% (SE: 6%) of heritability, and that a substantial fraction is mediated by genes outside the ten closest. Mendelian disease genes are strongly enriched for common-variant heritability; for example, just 21 dyslipidemia genes mediate 25% of LDL heritability (211x enrichment, P = 0.01). Among brain-related traits, genes involved in neurodevelopmental disorders are only about 4x enriched, but gene expression patterns are highly informative, with detectable differences in per-gene heritability even among weakly brain-expressed genes.
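AMM fits its quantities jointly to GWAS data. The toy forward model below does not do that fitting. It only shows how assumed mediation fractions p_k (the share of heritability mediated by the kth-closest gene) and a gene-set enrichment translate into the expected heritability of one SNP. It also shows why that expectation decays as the gene-set member moves from the closest to a more distant proximity rank. The following are simplifying assumptions, not the paper's estimator:

- the function names;
- the baseline rate;
- the treatment of heritability mediated beyond the listed genes;
- the example value 0.10.

The enrichment ratio (share of mediated heritability divided by share of genes) is the conventional definition, assumed here.

```python
from typing import Sequence


def expected_snp_h2(
    nearest_genes_in_set: Sequence[bool],
    p_k: Sequence[float],
    set_enrichment: float,
    baseline: float = 1.0,
) -> float:
    """Hypothetical forward model: expected heritability of a single SNP.

    nearest_genes_in_set[k] is True if the (k+1)th-closest gene is in the gene set.
    p_k[k] is the assumed fraction of heritability mediated by the (k+1)th-closest gene.
    The remainder (1 - sum(p_k)) is assumed to be mediated by more distant genes
    at the baseline rate.
    """
    if len(nearest_genes_in_set) != len(p_k):
        raise ValueError("need one membership flag per proximity rank")
    remainder = 1.0 - sum(p_k)
    if remainder < 0:
        raise ValueError("p_k must sum to at most 1")
    h2 = remainder * baseline
    for in_set, weight in zip(nearest_genes_in_set, p_k):
        # A gene-set gene contributes set_enrichment times the baseline rate.
        h2 += weight * baseline * (set_enrichment if in_set else 1.0)
    return h2


def mediated_enrichment(frac_h2_mediated: float, n_set_genes: int, n_genes: int) -> float:
    """Assumed definition: share of mediated heritability / share of genes in the set."""
    return frac_h2_mediated / (n_set_genes / n_genes)
```

With p = [0.27, 0.10] (0.27 from the abstract, 0.10 made up) and a 5x set enrichment, a SNP whose closest gene is in the set has expected heritability 2.08. If only its second-closest gene is in the set, the value drops to 1.40, and with no set gene nearby it is 1.0. Matching this kind of decay against observed SNP enrichments is what lets a model separate p_k from the enrichment.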

2021 ◽  
Author(s):  
Katherine M Siewert-Rocks ◽  
Samuel S Kim ◽  
Douglas Yao ◽  
Huwenbo Shi ◽  
Alkes L. Price

Identifying gene sets that are associated to disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWAS) can be used to detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a new method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by the predicted expression of causal disease genes in the gene set. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; GCSC determines that a gene set is enriched for disease heritability if genes with high co-regulation to the gene set have higher TWAS chi-square statistics than genes with low co-regulation to the gene set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well-calibrated, and well-powered to identify gene sets that are enriched for disease heritability explained by predicted expression. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits (average N=344K), analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched gene sets, recapitulating known biology. For Alzheimer's disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify gene sets associated to disease.
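The regression described above resembles LD score regression, with genes in place of SNPs. The sketch below works under two assumptions:

- A gene's co-regulation score is the sum of squared correlations in predicted expression, either to all genes or to gene-set genes.
- The fit is plain unweighted least squares.

It omits the regression weights and standard-error estimation a real GCSC analysis would need. The function names are hypothetical.

```python
import numpy as np


def coregulation_scores(pred_expr_corr: np.ndarray, in_set: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical co-regulation scores, computed analogously to LD scores.

    pred_expr_corr: (n_genes, n_genes) correlations of genetically predicted expression.
    in_set: boolean mask marking genes in the gene set.
    Returns (score to all genes, score to gene-set genes), each a row sum of r^2.
    """
    r2 = pred_expr_corr ** 2
    return r2.sum(axis=1), r2[:, in_set].sum(axis=1)


def gcsc_like_fit(chi2: np.ndarray, score_all: np.ndarray, score_set: np.ndarray) -> np.ndarray:
    """Unweighted OLS of TWAS chi-square statistics on both co-regulation scores.

    Returns [intercept, coefficient on all-gene score, coefficient on gene-set score].
    A positive gene-set coefficient means chi-square rises with co-regulation to the
    set beyond what co-regulation to all genes predicts.
    """
    X = np.column_stack([np.ones_like(score_all), score_all, score_set])
    coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return coef
```

Including the all-gene score as a covariate is what makes the test conditional ("beyond what is expected based on co-regulation to all genes"). Without it, any gene set whose members are highly co-regulated in general would look enriched.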


2017 ◽  
Author(s):  
Rachel L. Kember ◽  
Liping Hou ◽  
Xiao Ji ◽  
Lars H. Andersen ◽  
Arpita Ghorai ◽  
...  

Bipolar disorder (BD) is a mental disorder characterized by alternating periods of depression and mania. Individuals with BD have higher levels of early mortality than the general population, and a substantial proportion of this may be due to increased risk for comorbid diseases. Recent evidence suggests that pleiotropy, either in the form of a single risk allele or the combination of multiple loci genome-wide, may underlie medical comorbidity between traits and diseases. To identify the molecular events that underlie BD and related medical comorbidities, we generated imputed whole genome sequence (WGS) data using a population-specific reference panel for an extended multigenerational Old Order Amish pedigree (400 family members) segregating BD and related disorders. First, we investigated all putative disease-causing variants at known Mendelian disease loci present in this pedigree. Second, we performed genomic profiling using polygenic risk scores to establish each individual's risk for several complex diseases. To explore the contribution of Mendelian disease genes to BD, we performed gene-based and variant-based association tests for BD and found that Mendelian disease genes are enriched in the top results from both tests (OR=20.3, p=1×10−3; OR=2.2, p=1×10−2). We next identified a set of Mendelian variants that co-occur in individuals with BD more frequently than in their unaffected family members, including the R3527Q mutation in APOB associated with hypercholesterolemia. Using polygenic risk scores, we demonstrated that BD individuals from this pedigree were enriched for the same common risk alleles for BD as in the general population (β=0.416, p=6×10−4). Furthermore, in the extended Amish family we found evidence for a common genetic etiology between BD and clinical autoimmune thyroid disease (p=1×10−4), diabetes (p=1×10−3), and lipid traits such as triglyceride levels (p=3×10−4).
We identify genomic regions that contribute to the differences between BD individuals and unaffected family members by calculating local genetic risk for independent LD blocks. Our findings provide evidence for the extensive genetic pleiotropy that can drive epidemiological findings of comorbidities between diseases and other complex traits. Identifying such patterns may enable the subtyping of complex diseases and facilitate our understanding of the genetic mechanisms underlying phenotypic heterogeneity.
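A polygenic risk score is conventionally a sum of risk-allele dosages weighted by GWAS effect sizes. Restricting that sum to the SNPs in one LD block gives a local score that can be compared between affected and unaffected relatives. The study's actual pipeline (SNP selection, clumping, thresholds, any family-based modelling) is not described here. The function names and toy data below are hypothetical.

```python
import numpy as np


def polygenic_scores(dosages, betas, block_ids):
    """Genome-wide and per-LD-block polygenic scores (illustrative sketch).

    dosages: (n_individuals, n_snps) effect-allele counts (0/1/2, or imputed dosages).
    betas: per-SNP effect sizes taken from an external GWAS.
    block_ids: LD-block label for each SNP.
    Returns (prs, block_labels, local), where local[i, b] is individual i's score
    restricted to SNPs in block_labels[b].
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    block_ids = np.asarray(block_ids)
    contrib = dosages * betas          # per-individual, per-SNP contribution
    prs = contrib.sum(axis=1)          # genome-wide score
    block_labels = np.unique(block_ids)
    local = np.column_stack(
        [contrib[:, block_ids == b].sum(axis=1) for b in block_labels]
    )
    return prs, block_labels, local


def block_risk_difference(local, affected):
    """Mean local score in affected minus unaffected individuals, per LD block."""
    affected = np.asarray(affected, dtype=bool)
    return local[affected].mean(axis=0) - local[~affected].mean(axis=0)
```

Blocks with the largest positive difference are candidates for regions that distinguish affected from unaffected family members. In a pedigree, any significance test would need to account for relatedness.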


2014 ◽  
Vol 29 (4) ◽  
Author(s):  
Yao Yang ◽  
Inga Peter ◽  
Stuart A. Scott

Spanning over 2000 years, the Jewish population has a long history of migration, population bottlenecks, expansions, and geographical isolation, which has resulted in a unique genetic architecture among the Jewish people. As such, many Mendelian disease genes and founder mutations for autosomal recessive diseases have been discovered in several Jewish groups, prompting recent genomic studies of common disease susceptibility and other complex traits in the Jewish population. Although few studies on the genetic determinants of drug response variability have been reported in the Jewish population, a number of unique pharmacogenetic variants have been discovered that are more common in Jewish populations than in other major racial groups. Notable examples identified in the Ashkenazi Jewish (AJ) population include the vitamin K epoxide reductase complex subunit 1 (


2021 ◽  
Author(s):  
Martin Zhang ◽  
Kangcheng Hou ◽  
Bogdan Pasaniuc ◽  
Alkes L. Price ◽  
Kushal Dey ◽  
...  

Gene expression at the individual cell-level resolution, as quantified by single-cell RNA-sequencing (scRNA-seq), can provide unique insights into the pathology and cellular origin of diseases and complex traits. Here, we introduce single-cell Disease Relevance Score (scDRS), an approach that links scRNA-seq with polygenic risk of disease at individual cell resolution; scDRS identifies individual cells that show excess expression levels for genes in a disease-specific gene set constructed from GWAS data. We determined via simulations that scDRS is well-calibrated and powerful in identifying individual cells associated to disease. We applied scDRS to GWAS data from 74 diseases and complex traits (average N=341K) in conjunction with 16 scRNA-seq data sets spanning 1.3 million cells from 31 tissues and organs. At the cell type level, scDRS broadly recapitulated known links between classical cell types and disease, and also produced novel biologically plausible findings. At the individual cell level, scDRS identified subpopulations of disease-associated cells that are not captured by existing cell type labels, including subpopulations of CD4+ T cells associated with inflammatory bowel disease, partially characterized by their effector-like states; subpopulations of hippocampal CA1 pyramidal neurons associated with schizophrenia, partially characterized by their spatial location at the proximal part of the hippocampal CA1 region; and subpopulations of hepatocytes associated with triglyceride levels, partially characterized by their higher ploidy levels. At the gene level, we determined that genes whose expression across individual cells was correlated with the scDRS score (thus reflecting co-expression with GWAS disease genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
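The core idea of scoring a cell for "excess expression" of a disease gene set can be illustrated by comparing the cell's disease-gene expression against random control gene sets. scDRS itself weights genes and matches control gene sets on expression mean and variance. The sketch below replaces both with unweighted per-gene z-scores and uniformly drawn control sets (which may include disease genes), so it is a simplified stand-in, not the scDRS algorithm. Function names are hypothetical.

```python
import numpy as np


def cell_disease_scores(expr, disease_idx, n_ctrl=200, seed=0):
    """Simplified per-cell disease score with an empirical p-value.

    expr: (n_cells, n_genes) normalized expression.
    disease_idx: column indices of the disease genes.
    Returns (observed score per cell, empirical p-value per cell).
    """
    rng = np.random.default_rng(seed)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0                        # avoid dividing by zero for constant genes
    z = (expr - expr.mean(axis=0)) / sd      # standardize each gene across cells
    disease_idx = np.asarray(disease_idx)
    observed = z[:, disease_idx].mean(axis=1)
    ctrl = np.empty((n_ctrl, expr.shape[0]))
    for i in range(n_ctrl):
        # Random control gene set of the same size (not expression-matched here).
        genes = rng.choice(expr.shape[1], size=disease_idx.size, replace=False)
        ctrl[i] = z[:, genes].mean(axis=1)
    pvals = (1 + (ctrl >= observed).sum(axis=0)) / (1 + n_ctrl)
    return observed, pvals


def gene_score_correlation(expr, score):
    """Pearson correlation of each gene's expression with the per-cell score."""
    xc = expr - expr.mean(axis=0)
    sc = score - score.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (sc ** 2).sum())
    denom[denom == 0] = np.inf               # constant genes get correlation 0
    return (xc * sc[:, None]).sum(axis=0) / denom
```

The second function mirrors the gene-level analysis in the abstract: genes whose expression tracks the cell score are, by construction, co-expressed with the disease gene set.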


2021 ◽  
Author(s):  
Martin Jinye Zhang ◽  
Kangcheng Hou ◽  
Kushal K Dey ◽  
Karthik A. Jagadeesh ◽  
Kathryn Weinand ◽  
...  



Author(s):  
Maria K Sobczyk ◽  
Tom R Gaunt ◽  
Lavinia Paternoster

Gene prioritisation at GWAS loci necessitates careful assembly and examination of different types of molecular evidence to arrive at a set of plausible candidates. In many human traits, common small-effect mutations may subtly dysregulate the function of the very same genes that are impacted by rare, large-effect mutations causing Mendelian diseases with similar phenotypes. However, information on gene-Mendelian disease associations, the rare pathogenic mutations driving the disease, and the disease phenotype ontology is dispersed across many data sources and does not integrate easily with enrichment analysis. MendelVar is a new webserver facilitating the transfer of knowledge from Mendelian disease research into the interpretation of genetic associations from GWAS of complex traits. MendelVar allows querying of pre-defined or LD-determined genomic intervals against a comprehensive integrated database to find overlap with genes linked to Mendelian disease. Next, MendelVar looks for enrichment of any Human Phenotype Ontology, Disease Ontology and other ontology/pathway terms associated with the identified Mendelian genes. In addition, MendelVar provides a list of all overlapping pathogenic and likely pathogenic variants for Mendelian disease sourced from ClinVar. Inclusion of information obtained from MendelVar in post-GWAS gene annotation pipelines can strengthen the case for the causal importance of some genes. Moreover, as genes with Mendelian disease evidence may make more successful drug targets, MendelVar may be particularly useful in drug discovery pipelines. Taking GWAS summary statistics for male-pattern baldness, intelligence and atopic dermatitis, we demonstrate the use of MendelVar in prioritizing candidate genes at these loci that are linked to relevant enriched ontology terms. MendelVar is freely available at https://mendelvar.mrcieu.ac.uk/
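The first step described above, finding genes that overlap query intervals, is a standard interval-overlap lookup. The sketch below is not MendelVar's implementation. The gene names, coordinates and helper names are made up, and a real query would run against curated gene annotations and a Mendelian gene list. Genes are sorted by start per chromosome, so a binary search can discard genes that begin after the interval ends.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    name: str


def overlapping_genes(intervals: list[Region], genes: list[Region]) -> dict[str, list[str]]:
    """Return, for each query interval, the names of genes that overlap it."""
    by_chrom: dict[str, list[Region]] = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)
    starts: dict[str, list[int]] = {}
    for chrom, chrom_genes in by_chrom.items():
        chrom_genes.sort(key=lambda g: g.start)
        starts[chrom] = [g.start for g in chrom_genes]

    hits: dict[str, list[str]] = {}
    for iv in intervals:
        chrom_genes = by_chrom.get(iv.chrom, [])
        # Genes starting after the interval ends cannot overlap it.
        cutoff = bisect_right(starts.get(iv.chrom, []), iv.end)
        hits[iv.name] = [g.name for g in chrom_genes[:cutoff] if g.end >= iv.start]
    return hits
```

The remaining check (`g.end >= iv.start`) is still a linear scan over earlier genes. For genome-scale data an interval tree or a dedicated tool would be the usual choice.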


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tao Fan ◽  
Yu-Zhen Zhao ◽  
Jing-Fang Yang ◽  
Qin-Lai Liu ◽  
Yuan Tian ◽  
...  

Eukaryotic cells can expand their coding ability by using their splicing machinery, the spliceosome, to process precursor mRNA (pre-mRNA) into mature messenger RNA. The mega-macromolecular spliceosome contains multiple subcomplexes, referred to as small nuclear ribonucleoproteins (snRNPs). Among these, U1 snRNP and its central component, U1-70K, are crucial for splice site recognition during early spliceosome assembly. Human U1-70K has been linked to several types of human autoimmune and neurodegenerative diseases. However, its phylogenetic relationships have seldom been reported. To this end, we carried out a systematic analysis of 95 animal U1-70K genes and compared these proteins to their yeast and plant counterparts. Analysis of their gene and protein structures, expression patterns and splicing conservation suggests that animal U1-70Ks are conserved in their molecular function and may play essential roles in cancers and juvenile development. In particular, animal U1-70Ks display the unique characteristics of a single copy number and a splicing isoform with a truncated C-terminus, suggesting a specific role for these U1-70Ks in the animal kingdom. In summary, our results provide a phylogenetic overview of the U1-70K gene family in vertebrates. The in silico analyses conducted in this work will act as a reference for future functional studies of this crucial U1 splicing factor in the animal kingdom.


2016 ◽  
Author(s):  
Yang I Li ◽  
David A Knowles ◽  
Jack Humphrey ◽  
Alvaro N. Barbeira ◽  
Scott P. Dickinson ◽  
...  

The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable intron splicing events from short-read RNA-seq data and finds alternative splicing events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both for detecting differential splicing between sample groups, and for mapping splicing quantitative trait loci (sQTLs). Compared to contemporary methods, we find 1.4–2.1 times more sQTLs, many of which help us ascribe molecular effects to disease-associated variants. Strikingly, transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at 5% FDR by an average of 2.1-fold as compared to using gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available at https://github.com/davidaknowles/leafcutter.
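LeafCutter works with clusters of introns that share splice sites and quantifies each intron by its share of the junction reads in its cluster. This is why it needs no transcript annotation. The sketch below is a simplification under stated assumptions:

- Introns are linked whenever they share a start or end coordinate on the same chromosome (strand is ignored).
- LeafCutter's read-count and ratio filters are omitted.
- The function names are hypothetical.

The sketch groups introns with a union-find and then computes per-sample usage ratios.

```python
from collections import defaultdict


def cluster_introns(introns: list[tuple[str, int, int]]) -> list[list[int]]:
    """Group intron indices into clusters linked by shared start or end coordinates."""
    parent = list(range(len(introns)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[tuple[str, str, int], int] = {}
    for idx, (chrom, start, end) in enumerate(introns):
        for key in ((chrom, "start", start), (chrom, "end", end)):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])  # union the two clusters
            else:
                first_seen[key] = idx

    clusters: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(introns)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


def intron_usage(counts: list[int], clusters: list[list[int]]) -> list[float]:
    """Each intron's junction reads as a fraction of its cluster's total (one sample)."""
    ratios = [0.0] * len(counts)
    for members in clusters:
        total = sum(counts[i] for i in members)
        for i in members:
            ratios[i] = counts[i] / total if total else 0.0
    return ratios
```

Because the ratios within a cluster sum to one, differential splicing and sQTL mapping can be framed as tests on these proportions rather than on isoform abundances.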


2015 ◽  
Vol 2015 ◽  
pp. 1-17 ◽  
Author(s):  
Huiping Zhu ◽  
Yangdong Wang ◽  
Hengfu Yin ◽  
Ming Gao ◽  
Qiyan Zhang ◽  
...  

Leucine-rich repeat receptor-like kinases (LRR-RLKs) make up the largest group of RLKs in plants and play important roles in many key biological processes such as pathogen response and signal transduction. To date, most studies on LRR-RLKs have been conducted on model plants. Here, we identified 236 and 230 LRR-RLKs in two industrial oil-producing trees, Vernicia fordii and Vernicia montana, respectively. Sequence alignment analyses showed that the homology of the RLK domain (23.81%) was greater than that of the LRR domain (9.51%) among the Vf/VmLRR-RLKs. The conserved motif of the LRR domain in Vf/VmLRR-RLKs matched the known plant LRR consensus sequence well but differed at the third-to-last amino acid (W or L). Phylogenetic analysis revealed that Vf/VmLRR-RLKs were grouped into 16 subclades. We characterized the expression profiles of Vf/VmLRR-RLKs in various tissue types including root, leaf, petal, and kernel. Further investigation revealed that Vf/VmLRR-RLK orthologous genes mainly showed similar expression patterns in response to tree wilt disease, except 4 pairs of Vf/VmLRR-RLKs that showed opposite expression trends. These results represent an extensive evaluation of LRR-RLKs in two industrial oil trees and will be useful for further functional studies on these proteins.
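The abstract does not state how the domain "homology" percentages were computed from the alignment. Two common summaries of a multiple sequence alignment are sketched below on a made-up toy alignment:

- the share of columns identical across all sequences;
- the mean pairwise identity.

Whether either matches the authors' measure is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def conserved_column_percent(aligned: list[str]) -> float:
    """Percentage of alignment columns where every sequence has the same residue (no gaps)."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = sum(
        1 for column in zip(*aligned) if len(set(column)) == 1 and column[0] != "-"
    )
    return 100.0 * conserved / length


def mean_pairwise_identity(aligned: list[str]) -> float:
    """Mean over sequence pairs of % identical residues among columns gap-free in both."""
    identities = []
    for a, b in combinations(aligned, 2):
        shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if shared:
            identities.append(sum(x == y for x, y in shared) / len(shared))
    if not identities:
        raise ValueError("need at least two overlapping sequences")
    return 100.0 * sum(identities) / len(identities)
```

The two measures can differ widely. Fully conserved columns become rare as more divergent sequences are added, while pairwise identity averages over pairs. This is why a reported percentage is only interpretable together with its definition.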


2020 ◽  
pp. jmedgenet-2020-106922
Author(s):  
Adam Waring ◽  
Andrew Harper ◽  
Silvia Salatino ◽  
Christopher Kramer ◽  
Stefan Neubauer ◽  
...  

Background: Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to account for this when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this powerful signal.
Methods: We present statistical methods to detect missense variant clustering (BIN-test) combined with burden information (ClusterBurden). We introduce a flexible generalised additive modelling (GAM) framework to identify mutational hotspots using burden and clustering information (hotspot model), supplemented by in silico predictors (hotspot+ model). The methods were applied to synthetic data and to a case–control dataset comprising 5338 hypertrophic cardiomyopathy patients and 125 748 population reference samples across 34 putative cardiomyopathy genes.
Results: In simulations, the BIN-test was almost twice as powerful as the Anderson-Darling or Kolmogorov-Smirnov tests; ClusterBurden was computationally faster and more powerful than alternative position-informed methods. For 6/8 sarcomeric genes with strong clustering, ClusterBurden showed enhanced power over burden alone, equivalent to increasing the sample size by 50%. Hotspot+ models that combine burden, clustering and in silico predictors outperform generic pathogenicity predictors and effectively integrate ACMG criteria PM1 and PP3 to yield strong or moderate evidence of pathogenicity for 31.8% of examined variants of uncertain significance.
Conclusion: GAMs represent a unified statistical modelling framework to combine burden, clustering and functional information. Hotspot models can refine maps of regional burden, and hotspot+ models can be powerful predictors of variant pathogenicity. The BIN-test is a fast, powerful approach to detect missense variant clustering that, when combined with burden information (ClusterBurden), may enhance disease-gene discovery.
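The abstract does not define the BIN-test. The sketch below is therefore a generic stand-in for testing missense clustering, not the authors' statistic. It is a chi-square goodness-of-fit statistic over equal-width protein bins, with a Monte Carlo null of uniformly placed variants. It ignores case-control burden entirely. The bin count, simulation count and function name are illustrative assumptions.

```python
import random


def binned_clustering_test(positions, protein_length, n_bins=10, n_sim=1000, seed=1):
    """Monte Carlo p-value for departure of variant positions from a uniform spread.

    positions: 1-based residue positions of observed missense variants.
    Returns (chi-square statistic, empirical p-value).
    """
    if not positions:
        raise ValueError("need at least one variant position")

    def stat(pos):
        counts = [0] * n_bins
        for p in pos:
            # Map a 1-based residue to one of n_bins equal-width bins.
            counts[min((p - 1) * n_bins // protein_length, n_bins - 1)] += 1
        expected = len(pos) / n_bins
        return sum((c - expected) ** 2 / expected for c in counts)

    observed = stat(positions)
    rng = random.Random(seed)
    n = len(positions)
    # Count simulated uniform placements at least as extreme as the observed one.
    exceed = sum(
        stat([rng.randint(1, protein_length) for _ in range(n)]) >= observed
        for _ in range(n_sim)
    )
    return observed, (1 + exceed) / (1 + n_sim)
```

Thirty variants packed into residues 201 to 230 of a 1000-residue protein give the maximal statistic (270) and the smallest attainable p-value, 1/1001. Evenly spaced variants give a statistic of 0 and p = 1. Binning trades positional resolution for a simple, fast statistic; the choice of bin width matters, which is one reason dedicated methods like those above were developed.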

