Assessing the contribution of rare-to-common protein-coding variants to circulating metabolic biomarker levels via 412,394 UK Biobank exome sequences

Author(s):  
Abhishek Nag ◽  
Lawrence Middleton ◽  
Ryan S Dhindsa ◽  
Dimitrios Vitsios ◽  
Eleanor M Wigmore ◽  
...  

Genome-wide association studies have established the contribution of common and low-frequency variants to metabolic biomarkers in the UK Biobank (UKB); however, the role of rare variants remains to be assessed systematically. We evaluated rare coding variants for 198 metabolic biomarkers, including metabolites assayed by Nightingale Health, using exome sequencing in participants from four genetically diverse ancestries in the UKB (N=412,394). Gene-level collapsing analysis, which evaluated a range of genetic architectures, identified a total of 1,303 significant relationships between genes and metabolic biomarkers (p < 1×10⁻⁸), encompassing 207 distinct genes. These include associations between rare non-synonymous variants in GIGYF1 and glucose and lipid biomarkers, SYT7 and creatinine, and others, which may provide insights into novel disease biology. Compared with a previous microarray-based genotyping study in the same cohort, we observed that 40% of gene-biomarker relationships identified in the collapsing analysis were novel. Finally, we applied Gene-SCOUT, a novel tool that utilises the gene-biomarker association statistics from the collapsing analysis to identify genes having similar biomarker fingerprints and thus expand our understanding of gene networks.
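The idea behind a gene-level collapsing analysis can be illustrated with a minimal sketch. This is not the authors' pipeline: the qualifying-variant filter, the function names, and the simple carrier versus non-carrier comparison (a Welch t-statistic with no covariates) are all assumptions made for illustration. Each participant is collapsed to a single carrier flag for the gene, and the biomarker is then compared between carriers and non-carriers.

```python
from math import sqrt
from statistics import mean, variance


def collapse_to_carriers(genotypes, qualifying):
    """Collapse per-variant genotypes to one carrier flag per person.

    genotypes: one list per person of alternate-allele counts, one entry per variant.
    qualifying: booleans marking variants that pass a (hypothetical) filter,
                e.g. rare and non-synonymous.
    """
    return [
        1 if any(count > 0 and keep for count, keep in zip(person, qualifying)) else 0
        for person in genotypes
    ]


def carrier_effect(carrier, biomarker):
    """Return (mean difference, Welch t-statistic) for carriers vs non-carriers.

    Illustrative only: a real analysis would adjust for covariates such as
    age, sex and ancestry. Needs at least two people in each group.
    """
    carriers = [y for c, y in zip(carrier, biomarker) if c == 1]
    others = [y for c, y in zip(carrier, biomarker) if c == 0]
    diff = mean(carriers) - mean(others)
    se = sqrt(variance(carriers) / len(carriers) + variance(others) / len(others))
    return diff, diff / se


# Toy data: six people, three variants in one gene; the second variant
# does not pass the qualifying filter, so its carrier is not counted.
toy_genotypes = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
toy_qualifying = [True, False, True]
toy_biomarker = [2.0, 1.0, 2.4, 0.9, 1.1, 1.0]

toy_carrier = collapse_to_carriers(toy_genotypes, toy_qualifying)
toy_diff, toy_t = carrier_effect(toy_carrier, toy_biomarker)
```

Changing which variants count as qualifying (for example, protein-truncating only versus all rare non-synonymous variants) is one way a collapsing analysis can probe different genetic architectures.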

2020 ◽  
Author(s):  
Tom Chambers ◽  
Valentina Escott-Price ◽  
Sophie Legge ◽  
Emily Baker ◽  
Krish D. Singh ◽  
...  

Abstract: There is expanding interest in researching the cerebellum given accumulating evidence of its important contributions to cognitive and emotional functions, in addition to more established sensorimotor roles. While large genome-wide association studies (GWAS) have shed light on the common allele architecture of cortical and subcortical brain structures, the cerebellum remains under-investigated. We conducted a meta-GWAS of cerebellar volume in 33,265 UK Biobank European participants. Results show cerebellar volume to be moderately heritable (h²SNP = 50.6%). We identified 33 independent SNPs with genome-wide significant associations with total cerebellar volume, with 6 of these SNPs mapped to protein-coding genes and 5 more shown to alter cerebellar gene expression. We highlight 21 unique candidate genes for follow-up analysis. Cerebellar volume showed significant genetic correlation with brainstem, pallidum and thalamus volumes, but no significant correlations with neuropsychiatric phenotypes. Our results provide important new knowledge of the genetic architecture of cerebellar volume and its relationship with other brain phenotypes.


2017 ◽  
Author(s):  
Sina Rüeger ◽  
Aaron McDaid ◽  
Zoltán Kutalik

Abstract: As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that, while genotype imputation boasts a 2- to 5-fold lower root-mean-square error, summary statistics imputation better distinguishes true associations from null ones: we observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded an increase in statistical power by 15, 10 and 3%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Author summary: Genome-wide association studies (GWASs) quantify the effect of genetic variants on traits, such as height. Such estimates are called association summary statistics and are typically publicly shared through publication. Typically, GWASs are carried out by genotyping ~500,000 SNVs for each individual, which are then combined with sequenced reference panels to infer untyped SNVs in each individual's genome. This process of genotype imputation is resource intensive and can therefore be a limitation when combining many GWASs. An alternative approach is to bypass the use of individual data and directly impute summary statistics. In our work we compare the performance of summary statistics imputation to genotype imputation. Although we observe a 2- to 5-fold lower RMSE for genotype imputation compared to summary statistics imputation, summary statistics imputation better distinguishes true associations from null results. Furthermore, we demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank. Our study demonstrates that given current reference panels, summary statistics imputation is a very efficient and cost-effective way to identify common or low-frequency trait-associated loci.
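The abstract does not spell out the estimator, so the following is only a sketch of one widely used formulation of summary statistics imputation, not necessarily the authors' exact method. Under a multivariate normal model for z-scores, the z-score of an untyped variant is estimated as a weighted sum of the typed z-scores, with weights obtained from the reference-panel LD matrix: z_untyped ≈ cᵀ (C + λI)⁻¹ z_typed. Here C is the correlation matrix among typed variants, c holds the correlations between the untyped variant and each typed variant, and the ridge constant `lam` is an assumed regularisation choice. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(z_typed, ld_typed, ld_cross, lam=0.1):
    """Impute the z-score of one untyped variant from typed z-scores.

    z_typed:  z-scores of the typed variants.
    ld_typed: LD correlation matrix C among typed variants (list of lists).
    ld_cross: correlations c between the untyped variant and each typed variant.
    lam:      ridge term added to the diagonal of C (an assumed choice).
    """
    n = len(z_typed)
    regularised = [
        [ld_typed[i][j] + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # C is symmetric, so the weights w = (C + lam I)^-1 c solve (C + lam I) w = c.
    weights = solve(regularised, ld_cross)
    return sum(w * z for w, z in zip(weights, z_typed))
```

In practice a numerical library routine such as `numpy.linalg.solve` would replace the hand-written solver; the pure-Python version is used here only to keep the sketch dependency-free.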


2019 ◽  
Author(s):  
Mart Kals ◽  
Tiit Nikopensius ◽  
Kristi Läll ◽  
Kalle Pärn ◽  
Timo Tõnis Sikka ◽  
...  

Abstract: Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed sufficiently accurately with publicly available and ethnically heterogeneous reference datasets like 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences and differences of using ethnically mixed and matched reference panels have not yet been comprehensively explored. We determined and quantified these differences by performing several comparative evaluations of the discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index by using genome-wide genotype data of ∼37,000 Estonians imputed with ethnically mixed 1000G and ethnically matched imputation reference panels. Although several previously reported common (minor allele frequency, MAF > 5%) variant associations were replicated in both resulting imputed datasets, no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the ethnically matched imputed data as compared to four genes in the 1000G panel-based imputed data. All resulting genes were consequently studied in the UK Biobank data. These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the use of ethnically matched imputation reference panels can enable substantially improved imputation of rare variants, facilitating novel high-confidence findings in rare variant GWAS scans.

Author summary: Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. The latter method is called genotype imputation and its accuracy depends on multiple factors. Publicly available and ethnically heterogeneous imputation reference panels (IRPs) such as 1000 Genomes Project (1000G) are sufficiently accurate for imputation of common and low-frequency variants, but custom ethnically matched IRPs outperform these in the case of rare variants. In this work, we systematically compare downstream association analysis effects on eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single variant analysis, where both imputed datasets replicate previously reported significant loci. But in the gene-based analysis of rare protein-coding variants we show that the ethnically matched panel clearly outperforms 1000G panel-based imputation, providing a 10-fold increase in significant gene-trait associations. Our study demonstrates empirically that imputed data based on an ethnically matched panel is very promising for rare variant analysis; it captures more population-specific variants and makes it possible to efficiently identify novel findings.


2021 ◽  
Author(s):  
Ruoyu Tian ◽  
Tian Ge ◽  
Jimmy Z. Liu ◽  
Max Lam ◽  
Daniel F. Levey ◽  
...  

Nearly two hundred common-variant depression risk loci have been identified by genome-wide association studies (GWAS). However, the impact of rare coding variants on depression remains poorly understood. Here, we present the largest exome analysis of depression to date, based on 320,356 UK Biobank participants. We show that the burden of rare disruptive coding variants in loss-of-function intolerant genes is significantly associated with depression risk. Among 30 genes with false discovery rate (FDR) <0.1, SLC2A1, a blood-brain barrier glucose transporter underlying GLUT1 deficiency syndrome, reached exome-wide significance (P = 2.96e-7). Gene-set enrichment supports neuron projection development and muscle activities as implicated in depression. Integrating exomes with polygenic risk revealed additive contributions from common and rare variants to depression risk. The burden of rare disruptive coding variants for depression overlapped with that of developmental disorder, autism and schizophrenia. Our study provides novel insight into the contribution of rare coding variants to depression and genetic relationships across developmental and psychiatric disorders.


Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract: With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, “discovery” GWAS and a later, replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0% and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8% respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with area under the receiver operating characteristic curve = 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.


TH Open ◽  
2020 ◽  
Vol 04 (04) ◽  
pp. e322-e331
Author(s):  
Eric Manderstedt ◽  
Christina Lind-Halldén ◽  
Stefan Lethagen ◽  
Christer Halldén

Abstract: Genome-wide association studies (GWASs) have identified genes that affect plasma von Willebrand factor (VWF) levels. ABO showed a strong effect, whereas smaller effects were seen for VWF, STXBP5, STAB2, SCARA5, STX2, TC2N, and CLEC4M. This study screened comprehensively for both common and rare variants in these eight genes by resequencing their coding sequences in 104 Swedish von Willebrand disease (VWD) patients. The common variants previously associated with the VWF level were all accumulated in the VWD patients compared to three control populations. The strongest effect was detected for blood group O coded for by the ABO gene (71 vs. 38% of genotypes). The other seven VWF level associated alleles were enriched in the VWD population compared to control populations, but the differences were small and not significant. The sequencing detected a total of 146 variants in the eight genes. Excluding 70 variants in VWF, 76 variants remained. Of the 76 variants, 54 had allele frequencies > 0.5% and have therefore been investigated for their association with the VWF level in previous GWAS. The remaining 22 variants with frequencies < 0.5% are less likely to have been evaluated previously. PolyPhen2 classified 3 out of the 22 variants as probably or possibly damaging (two in STAB2 and one in STX2); the others were either synonymous or benign. No accumulation of low frequency (0.05–0.5%) or rare variants (<0.05%) in the VWD population compared to the gnomAD (Genome Aggregation Database) population was detected. Thus, rare variants in these genes do not contribute to the low VWF levels observed in VWD patients.


2020 ◽  
Author(s):  
Adam Lavertu ◽  
Gregory McInnes ◽  
Yosuke Tanigawa ◽  
Russ B Altman ◽  
Manuel A. Rivas

Abstract: Genetics plays a key role in drug response, affecting efficacy and toxicity. Pharmacogenomics aims to understand how genetic variation influences drug response and develop clinical guidelines to aid clinicians in personalized treatment decisions informed by genetics. Although pharmacogenomics has not been broadly adopted into clinical practice, genetics influences treatment decisions regardless. Physicians adjust patient care based on observed response to medication, which may occur as a result of genetic variants harbored by the patient. Here we seek to understand the genetics of drug selection in statin therapy, a class of drugs widely used for high cholesterol treatment. Genetics are known to play an important role in statin efficacy and toxicity, leading to significant changes in patient outcome. We performed genome-wide association studies (GWAS) on statin selection among 59,198 participants in the UK Biobank and found that variants known to influence statin efficacy are significantly associated with statin selection. Specifically, we find that carriers of variants in APOE and LPA that are known to decrease efficacy of treatment are more likely to be on atorvastatin, a stronger statin. Additionally, carriers of the APOE and LPA variants are more likely to be on a higher intensity dose (a dose that reduces low-density lipoprotein cholesterol by greater than 40%) of atorvastatin than non-carriers (APOE: p(high intensity) = 0.16, OR = 1.7, P = 1.64 × 10⁻⁴, LPA: p(high intensity) = 0.17, OR = 1.4, P = 1.14 × 10⁻²). These findings represent the largest genetic association study of statin selection and statin dose association to date and provide evidence for the role of LPA and APOE in statin response, furthering the possibility of personalized statin therapy.


2021 ◽  
Author(s):  
Aleksejs Sazonovs ◽  
Christine R Stevens ◽  
Guhan R Venkataraman ◽  
Kai Yuan ◽  
Brandon Avila ◽  
...  

Genome-wide association studies (GWAS) have identified hundreds of loci associated with Crohn's disease (CD); however, as with all complex diseases, deriving pathogenic mechanisms from these non-coding GWAS discoveries has been challenging. To complement GWAS and better define actionable biological targets, we analysed sequence data from more than 30,000 CD cases and 80,000 population controls. We observe rare coding variants in established CD susceptibility genes as well as ten genes where coding variation directly implicates the gene in disease risk for the first time.


PLoS Genetics ◽  
2020 ◽  
Vol 16 (12) ◽  
pp. e1009060
Author(s):  
Corbin Quick ◽  
Xiaoquan Wen ◽  
Gonçalo Abecasis ◽  
Michael Boehnke ◽  
Hyun Min Kang

Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.


2020 ◽  
Author(s):  
Dan Ju ◽  
Iain Mathieson

Abstract: Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but find this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

Significance: Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 years we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.

