Bayesian analysis of genetic association across tree-structured routine healthcare data in the UK Biobank

2017 ◽  
Author(s):  
Adrian Cortes ◽  
Calliope A. Dendrou ◽  
Allan Motyer ◽  
Luke Jostins ◽  
Damjan Vukcevic ◽  
...  

Genetic discovery from the multitude of phenotypes extractable from routine healthcare data has the ability to radically transform our understanding of the human phenome, thereby accelerating progress towards precision medicine. However, a critical question when analysing high-dimensional and heterogeneous data is how to interrogate increasingly specific subphenotypes whilst retaining statistical power to detect genetic associations. Here we develop and employ a novel Bayesian analysis framework that exploits the hierarchical structure of diagnosis classifications to jointly analyse genetic variants against UK Biobank healthcare phenotypes. Our method displays a more than 20% increase in power to detect genetic effects over other approaches, such that we uncover the broader burden of genetic variation: we identify associations with over 2,000 diagnostic terms. We find novel associations with common immune-mediated diseases (IMD), we reveal the extent of genetic sharing between specific IMDs, and we expose differences in disease perception or diagnosis with potential clinical implications.

2017 ◽  
Vol 49 (9) ◽  
pp. 1311-1318 ◽  
Author(s):  
Adrian Cortes ◽  
Calliope A Dendrou ◽  
Allan Motyer ◽  
Luke Jostins ◽  
Damjan Vukcevic ◽  
...  

2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Oliver S. Burren ◽  
Guillermo Reales ◽  
Limy Wong ◽  
John Bowes ◽  
James C. Lee ◽  
...  

Abstract Background: Genome-wide association studies (GWAS) have identified pervasive sharing of genetic architectures across multiple immune-mediated diseases (IMD). By learning the genetic basis of IMD risk from common diseases, this sharing can be exploited to enable analysis of less frequent IMD where, due to limited sample size, traditional GWAS techniques are challenging. Methods: Exploiting ideas from Bayesian genetic fine-mapping, we developed a disease-focused shrinkage approach to allow us to distill genetic risk components from GWAS summary statistics for a set of related diseases. We applied this technique to 13 larger GWAS of common IMD, deriving a reduced dimension “basis” that summarised the multidimensional components of genetic risk. We used independent datasets including the UK Biobank to assess the performance of the basis and characterise individual axes. Finally, we projected summary GWAS data for smaller IMD studies, with fewer than 1,000 cases, to assess whether the approach was able to provide additional insights into the genetic architecture of less common IMD or IMD subtypes, where cohort collection is challenging. Results: We identified 13 IMD genetic risk components. The projection of independent UK Biobank data demonstrated the IMD specificity and accuracy of the basis even for traits with very limited case numbers (e.g. vitiligo, 150 cases). Projection of additional IMD-relevant studies allowed us to add biological interpretation to specific components, e.g. related to raised eosinophil counts in blood and serum concentration of the chemokine CXCL10 (IP-10). On application to 22 rare IMD and IMD subtypes, we were able to not only highlight subtype-discriminating axes (e.g. for juvenile idiopathic arthritis) but also suggest eight novel genetic associations. Conclusions: Requiring only summary-level data, our unsupervised approach allows the genetic architectures across any range of clinically related traits to be characterised in fewer dimensions. This facilitates the analysis of studies with modest sample size by matching shared axes of both genetic and biological risk across a wider disease domain, and provides an evidence base for possible therapeutic repurposing opportunities.
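The basis-and-projection idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it skips the Bayesian shrinkage step entirely, uses plain power iteration to find only the leading principal axis, and every number and name (`basis_effects`, `small_study`, `leading_axis`, `project`) is a hypothetical stand-in chosen for illustration.

```python
import math

# Hypothetical per-variant effect sizes (rows: large GWAS, columns: variants).
# In the abstract these would be shrunk effect estimates; here they are toy values.
basis_effects = [
    [0.30, 0.10, -0.20, 0.00],    # toy "disease A"
    [0.25, 0.12, -0.18, 0.02],    # toy "disease B"
    [-0.28, -0.05, 0.22, 0.01],   # toy "disease C"
]

def column_means(rows):
    """Mean effect of each variant across the basis studies."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def leading_axis(rows, means, iters=200):
    """Leading principal axis of the centred rows, by power iteration on X^T X."""
    centred = [[r[j] - means[j] for j in range(len(means))] for r in rows]
    k = len(means)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        scores = [sum(x * a for x, a in zip(r, v)) for r in centred]   # X v
        w = [sum(scores[i] * centred[i][j] for i in range(len(centred)))
             for j in range(k)]                                          # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(effects, means, axis):
    """Coordinate of one study on the axis, using the basis centring."""
    return sum((e - m) * a for e, m, a in zip(effects, means, axis))

means = column_means(basis_effects)
axis = leading_axis(basis_effects, means)

# A hypothetical small study with noisy effects resembling "disease A".
small_study = [0.15, 0.04, -0.10, 0.00]
print([round(project(r, means, axis), 3) for r in basis_effects])
print(round(project(small_study, means, axis), 3))
```

The point of the sketch is that the small study never contributes to the axis itself: it is only located on an axis learned from the larger studies, which is what allows studies with few cases to be compared with common diseases.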


Author(s):  
Andrey Ziyatdinov ◽  
Jihye Kim ◽  
Dmitry Prokopenko ◽  
Florian Privé ◽  
Fabien Laporte ◽  
...  

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
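To make the link between an effective sample size and statistical power concrete, here is a minimal sketch using a standard textbook approximation for a 1-degree-of-freedom association test on a quantitative trait. It does not reproduce the analytical ESS derived in the paper: `n_eff` is simply a number supplied by the user, and the function name `gwas_power`, the variance-explained parameterisation, and the genome-wide threshold of 5e-8 are assumptions made for illustration.

```python
from statistics import NormalDist

def gwas_power(n_eff, var_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df test.

    Assumes the test statistic is normal with unit variance and mean
    sqrt(NCP), where NCP = n_eff * q2 / (1 - q2) and q2 is the fraction
    of trait variance explained by the variant.
    """
    nd = NormalDist()
    ncp = n_eff * var_explained / (1.0 - var_explained)
    shift = ncp ** 0.5
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the statistic falls in either rejection tail.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Power for a variant explaining 0.01% of trait variance at two
# hypothetical effective sample sizes.
for n in (100_000, 400_000):
    print(n, gwas_power(n, 1e-4))
```

Under this approximation, any loss of effective sample size (for example from relatedness) lowers the non-centrality parameter proportionally, which is why a single ESS figure is convenient for comparing study designs.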


2019 ◽  
Author(s):  
Christopher DeBoever ◽  
AJ Venkatakrishnan ◽  
Joseph M Paggi ◽  
Franziska M. Heydenreich ◽  
Suli-Anne Laurin ◽  
...  

Abstract G protein-coupled receptors (GPCRs) drive an array of critical physiological functions and are an important class of drug targets, though a map of which GPCR genetic variants are associated with phenotypic variation is lacking. We performed a phenome-wide association analysis for 269 common protein-altering variants in 156 GPCRs and 275 phenotypes, including disease outcomes and diverse quantitative measurements, using 337,205 UK Biobank participants and identified 138 associations. We discovered novel associations between GPCR variants and migraine risk, hypothyroidism, and dietary consumption. We also demonstrated experimentally that variants in the β2 adrenergic receptor (ADRB2) associated with immune cell counts and pulmonary function and variants in the gastric inhibitory polypeptide receptor (GIPR) associated with food intake and body size affect downstream signaling pathways. Overall, this study provides a map of genetic associations for GPCR coding variants across a wide variety of phenotypes, which can inform future drug discovery efforts targeting GPCRs.


2017 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Konrad Rawlik ◽  
Albert Tenesa

Abstract Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the as yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 participants related to the third degree or closer. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size; however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive and requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results easily without incurring high computational costs.


2021 ◽  
Author(s):  
Elena P. Sorokin ◽  
Nicolas Basty ◽  
Brandon Whitcher ◽  
Yi Liu ◽  
Jimmy D. Bell ◽  
...  

Abstract Aging, and the pathogenesis of many common diseases, involves iron homeostasis. A key role in iron homeostasis is played by the spleen, which is the largest filter of the blood and performs iron reuptake from old or damaged erythrocytes. Despite this important role, spleen iron content has not been measured previously in a large, population-based cohort. In this study, we quantify spleen iron in 41,764 participants of the UK Biobank using magnetic resonance imaging (MRI). We find that epidemiologic and environmental factors such as increased age, higher red meat consumption and lower alcohol intake correlate with higher spleen iron. Through a genome-wide association study, we identify genetic associations between spleen iron and common variation at seven loci, including in two hereditary spherocytosis (HS) genes, ANK1 and SPTA1. HS-causing mutations in these genes are associated with lower reticulocyte volume and increased reticulocyte percentage, while our common alleles are associated with increased expression of ANK1 and SPTA1 in blood and with larger reticulocyte volume and reduced reticulocyte percentage. As genetic modifiers, these common alleles may explain mild spherocytosis phenotypes observed in some HS allele carriers. Further, we identify an association between spleen iron and MS4A7, which colocalizes with a quantitative trait locus for MS4A7 alternative splicing in whole blood, and with monocyte count and fraction. Through quantification of spleen iron in a large human cohort, we extend our understanding of epidemiological and genetic factors associated with iron recycling and erythrocyte morphology.

