Improving the coverage of credible sets in Bayesian genetic fine-mapping

2019 ◽  
Author(s):  
Anna Hutchinson ◽  
Hope Watson ◽  
Chris Wallace

Abstract Genome-wide association studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association, reporting credible sets of plausible causal variants, which are interpreted as containing the causal variant with some “coverage probability”. Here, we use simulations to demonstrate that the coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the “corrected coverage estimate”. This is extended to find “corrected credible sets”, which are the smallest set of variants such that their corrected coverage estimate meets the target coverage. We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. We found that in 27 out of 39 associated genomic regions our method could reduce the number of potentially causal variants to consider for follow-up, and found that none of the 95% or 99% credible sets required the inclusion of more variants – a pattern matched in simulations of well powered GWAS. Crucially, our correction method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient expenditure of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.

Author summary Pinpointing specific genetic variants within the genome that are causal for human diseases is difficult due to complex correlation patterns existing between variants. Consequently, researchers typically prioritise a set of plausible causal variants for functional validation; these sets of putative causal variants are called “credible sets”. We find that the probabilistic interpretation that these credible sets do indeed contain the true causal variant is variable, in that the reported probabilities often underestimate the true coverage of the causal variant in the credible set. We have developed a method to provide researchers with a “corrected coverage estimate” that the true causal variant appears in the credible set, and this has been extended to find “corrected credible sets”, allowing for more efficient allocation of resources in the expensive follow-up laboratory experiments. We used our method to reduce the number of genetic variants to consider as causal candidates for follow-up in 27 genomic regions that are associated with type 1 diabetes.
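As a rough illustration of the object being discussed, the sketch below builds a standard credible set from per-variant posterior probabilities: variants are sorted by posterior probability and added until the running sum reaches the target. It is not the authors' correction method; the function name `credible_set` and the example probabilities are hypothetical, and the returned sum is only the claimed coverage, which is the quantity the abstract argues is often mis-calibrated.

```python
from typing import List, Tuple


def credible_set(pps: List[float], threshold: float = 0.95) -> Tuple[List[int], float]:
    """Return (variant indices, summed posterior probability) for the
    smallest set whose cumulative posterior probability reaches `threshold`.

    `pps` is assumed to hold per-variant posterior probabilities of
    causality that sum to 1 (single causal variant assumption).
    """
    # Rank variants from most to least probable; ties keep input order.
    order = sorted(range(len(pps)), key=lambda i: pps[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in order:
        chosen.append(i)
        total += pps[i]
        if total >= threshold:
            break
    return chosen, total


# Hypothetical posterior probabilities for five variants in one region.
example_pps = [0.5, 0.25, 0.125, 0.0625, 0.0625]
set_90, claimed_90 = credible_set(example_pps, threshold=0.9)
print(set_90, claimed_90)
```

A corrected credible set in the sense of the abstract would replace the stopping rule on the summed probability with one on a simulation-based coverage estimate, which this sketch does not attempt.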


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility remains a major challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to the MeSH ontology, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow efficient visualization of credible sets in a single web page; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations; and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.



2018 ◽  
Author(s):  
Satish K Nandakumar ◽  
Sean K McFarland ◽  
Laura Marlene Mateyka ◽  
Caleb A Lareau ◽  
Jacob C Ulirsch ◽  
...  

Genome-wide association studies (GWAS) have identified thousands of variants associated with human diseases and traits. However, the majority of GWAS-implicated variants are in non-coding genomic regions and require in-depth follow-up to identify target genes and decipher biological mechanisms. Here, rather than focusing on causal variants, we have undertaken a pooled loss-of-function screen in primary hematopoietic cells to interrogate 389 candidate genes contained in 75 loci associated with red blood cell traits. Using this approach, we identify 77 genes at 38 GWAS loci, with most loci harboring 1-2 candidate genes. Importantly, the hit set was strongly enriched for genes validated through orthogonal genetic approaches. Genes identified by this approach are enriched in relevant biological pathways, allowing regulators of human erythropoiesis and blood disease modifiers to be defined. More generally, this functional screen provides a paradigm for gene-centric follow-up of GWAS for a variety of human diseases and traits.



2020 ◽  
Vol 29 (R1) ◽  
pp. R81-R88 ◽  
Author(s):  
Anna Hutchinson ◽  
Jennifer Asimit ◽  
Chris Wallace

Abstract Whilst thousands of genetic variants have been associated with human traits, identifying the subset of those variants that are causal requires a further ‘fine-mapping’ step. We review the basic fine-mapping approach, which is computationally fast and requires only summary data, but depends on an assumption of a single causal variant per associated region which is recognized as biologically unrealistic. We discuss different ways that the approach has been built upon to accommodate multiple causal variants in a region and to incorporate additional layers of functional annotation data. We further review methods for simultaneous fine-mapping of multiple datasets, either exploiting different linkage disequilibrium (LD) structures across ancestries or borrowing information between distinct but related traits. Finally, we look to the future and the opportunities that will be offered by increasingly accurate maps of causal variants for a multitude of human traits.
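The basic single-causal-variant approach mentioned above can be sketched from summary statistics alone. The example below is a minimal illustration, assuming Wakefield-style approximate Bayes factors, equal prior probability for every variant, and a prior effect-size variance `w = 0.04`; these choices, the function names, and the example z-scores are assumptions made here, not details taken from the review.

```python
import math
from typing import List


def log_abf(z: float, v: float, w: float = 0.04) -> float:
    """Log approximate Bayes factor in favour of association.

    z: marginal z-score of the variant
    v: variance of the estimated effect (standard error squared)
    w: assumed prior variance of the true effect size
    """
    r = w / (v + w)  # shrinkage factor
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def posterior_probs(zs: List[float], vs: List[float], w: float = 0.04) -> List[float]:
    """Per-variant posterior probability of causality, assuming exactly one
    causal variant in the region and equal priors across variants."""
    labfs = [log_abf(z, v, w) for z, v in zip(zs, vs)]
    m = max(labfs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in labfs]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical z-scores for three variants with equal effect variances.
pps = posterior_probs([5.0, 2.0, 0.5], [0.01, 0.01, 0.01])
print(pps)
```

Because the posterior probabilities are normalised across the region, they only sum to one under the single-causal-variant assumption, which is the limitation the review highlights.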



Author(s):  
Karlijn A.C. Meeks ◽  
Ayo P. Doumatey ◽  
Amy R. Bentley ◽  
Mateus H. Gouveia ◽  
Guanjie Chen ◽  
...  

Background - Resistin, a protein linked with inflammation and cardiometabolic diseases, is one of few proteins for which GWAS consistently report variants within and near the coding gene (RETN). Here, we took advantage of the reduced linkage disequilibrium in African populations to infer genetic causality for circulating resistin levels by performing GWAS, whole-exome analysis, fine-mapping, Mendelian randomization and transcriptomic data analyses. Methods - GWAS and fine-mapping analyses for resistin were performed in 5621 African ancestry individuals, including 3754 continental Africans (AF) and 1867 African Americans (AA). Causal variants identified were subsequently used as an instrumental variable in Mendelian randomization analyses for homeostatic modelling (HOMA) derived insulin resistance index, BMI and type 2 diabetes. Results - The lead variant (rs3219175, in the promoter region of RETN) for the single locus detected was the same for AF (P-value 5.0×10⁻¹¹¹) and for AA (9.5×10⁻³⁸), respectively explaining 12.1% and 8.5% of variance in circulating resistin. Fine-mapping analyses and functional annotation revealed this variant as likely causal affecting circulating resistin levels as a cis-eQTL increasing RETN expression. Additional variants regulating resistin levels were upstream of RETN with genes PCP2, STXBP2 and XAB2 showing the strongest association using integrative analysis of GWAS with transcriptomic data. Mendelian randomization analyses did not provide evidence for resistin increasing insulin resistance, BMI or type 2 diabetes risk in African-ancestry populations. Conclusions - Taking advantage of the fine-mapping resolution power of African genomes, we identified a single variant (rs3219175) as the likely causal variant responsible for most of the variability in circulating resistin levels. In contrast to findings in some other ancestry populations, we showed that resistin does not seem to increase insulin resistance and related cardiometabolic traits in African-ancestry populations.



2018 ◽  
Vol 50 (10) ◽  
pp. 1366-1374 ◽  
Author(s):  
Harm-Jan Westra ◽  
Marta Martínez-Bonet ◽  
Suna Onengut-Gumuscu ◽  
Annette Lee ◽  
Yang Luo ◽  
...  


2015 ◽  
Author(s):  
Mary D Fortune ◽  
Hui Guo ◽  
Oliver Burren ◽  
Ellen Schofield ◽  
Neil M Walker ◽  
...  

Identifying whether potential causal variants for related diseases are shared can increase understanding of the shared etiology between diseases. Colocalization methods are designed to disentangle shared and distinct causal variants in regions where two diseases show association, but existing methods are limited by assuming independent datasets. We extended existing methods to allow for the shared control design common in GWAS and applied them to four autoimmune diseases: type 1 diabetes (T1D); rheumatoid arthritis; celiac disease (CEL) and multiple sclerosis (MS). Ninety regions were associated with at least one disease. In 22 regions (24%), we identify association to precisely one of our four diseases and can find no published association of any other disease to the same region; some of these may reflect effects mediated by the target of immune attack. Thirty-three regions (37%) were associated with two or more diseases, but in 14 of these there was evidence that causal variants differed between diseases. By leveraging information across datasets, we identified novel disease associations to 12 regions previously associated with one or more of the other three autoimmune disorders. For instance, we link the CEL-associated FASLG region to T1D and identify a single SNP, rs78037977, as a likely causal variant. We also highlight several particularly complex association patterns, including the CD28-CTLA4-ICOS region, in which it appears that three distinct causal variants associate with three diseases in three different patterns. Our results underscore the complexity in genetic variation underlying related but distinct autoimmune diseases and help to approach its dissection.



2020 ◽  
Author(s):  
Alison R Barton ◽  
Maxwell A Sherman ◽  
Ronen E. Mukamel ◽  
Po-Ru Loh

Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total N~500K) to impute exome-wide variants at high accuracy (R²>0.5) down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P<5×10⁻⁸) involving 675 distinct rare protein-altering variants (MAF<0.01) that passed stringent filters for likely causality; 600 of the 675 variants (89%) were not present in the NHGRI-EBI GWAS Catalog. We replicated the effect directions of 28 of 28 height-associated variants genotyped in previous exome array studies, including missense variants in newly-associated collagen genes COL16A1 and COL11A2. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified long allelic series containing up to 45 distinct likely-causal variants within the same gene (on average exhibiting 93%-concordant effect directions). In particular, 24 rare coding variants in IFRD2 independently associated with reticulocyte indices, suggesting an important role of IFRD2 in red blood cell development, and 11 rare coding variants in NPR2 (a gene previously implicated in Mendelian skeletal disorders) exhibited intermediate-to-strong effects on height (0.18-1.09 s.d.). Our results demonstrate the utility of within-cohort imputation in population-scale GWAS cohorts, provide a catalog of likely-causal, large-effect coding variant associations, and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.



2016 ◽  
Author(s):  
Andrew Anand Brown ◽  
Ana Viñuela ◽  
Olivier Delaneau ◽  
Tim Spector ◽  
Kerrin Small ◽  
...  

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well powered GWAS. To explore the advantages of WGS in a well powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25%-70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show examples where the gene implicated by expression has been functionally validated as relevant for complex traits.



2015 ◽  
Vol 47 (4) ◽  
pp. 381-386 ◽  
Author(s):  
Suna Onengut-Gumuscu ◽  
Wei-Min Chen ◽  
Oliver Burren ◽  
Nick J Cooper ◽  
...  


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kira J. Stanzick ◽  
Yong Li ◽  
Pascal Schlosser ◽  
Mathias Gorski ◽  
Matthias Wuttke ◽  
...  

Abstract Genes underneath signals from genome-wide association studies (GWAS) for kidney function are promising targets for functional studies, but prioritizing variants and genes is challenging. By GWAS meta-analysis for creatinine-based estimated glomerular filtration rate (eGFR) from the Chronic Kidney Disease Genetics Consortium and UK Biobank (n = 1,201,909), we expand the number of eGFRcrea loci (424 loci, 201 novel; 9.8% eGFRcrea variance explained by 634 independent signal variants). Our increased sample size in fine-mapping (n = 1,004,040, European) more than doubles the number of signals with resolved fine-mapping (99% credible sets down to 1 variant for 44 signals, ≤5 variants for 138 signals). Cystatin-based eGFR and/or blood urea nitrogen association support 348 loci (n = 460,826 and 852,678, respectively). Our customizable tool for Gene PrioritiSation reveals 23 compelling genes including mechanistic insights and enables navigation through genes and variants likely relevant for kidney function in human to help select targets for experimental follow-up.


