The MR-Base platform supports systematic causal inference across the human phenome

eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Gibran Hemani ◽  
Jie Zheng ◽  
Benjamin Elsworth ◽  
Kaitlin H Wade ◽  
Valeriia Haberland ◽  
...  

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.

2018 ◽  
Vol 49 (13) ◽  
pp. 2197-2205 ◽  
Author(s):  
Hannah M. Sallis ◽  
George Davey Smith ◽  
Marcus R. Munafò

Abstract Background: Despite the well-documented association between smoking and personality traits such as neuroticism and extraversion, little is known about the potential causal nature of these findings. If it were possible to unpick the association between personality and smoking, it may be possible to develop tailored smoking interventions that could lead to both improved uptake and efficacy. Methods: Recent genome-wide association studies (GWAS) have identified variants robustly associated with both smoking phenotypes and personality traits. Here we use publicly available GWAS summary statistics in addition to individual-level data from UK Biobank to investigate the link between smoking and personality. We first estimate genetic overlap between traits using LD score regression and then use bidirectional Mendelian randomisation methods to unpick the nature of this relationship. Results: We found clear evidence of a modest genetic correlation between smoking behaviours and both neuroticism and extraversion. We found some evidence that personality traits are causally linked to certain smoking phenotypes: among current smokers each additional neuroticism risk allele was associated with smoking an additional 0.07 cigarettes per day (95% CI 0.02–0.12, p = 0.009), and each additional extraversion effect allele was associated with an elevated odds of smoking initiation (OR 1.015, 95% CI 1.01–1.02, p = 9.6 × 10⁻⁷). Conclusion: We found some evidence for specific causal pathways from personality to smoking phenotypes, and weaker evidence of an association from smoking initiation to personality. These findings could be used to inform future smoking interventions or to tailor existing schemes.


2018 ◽  
Author(s):  
Yue Wu ◽  
Eleazar Eskin ◽  
Sriram Sankararaman

Abstract Imputation has been widely utilized to aid and interpret the results of Genome-Wide Association Studies (GWAS). Imputation can increase the power to identify associations when the causal variant was not directly observed or typed in the GWAS. There are two broad classes of methods for imputation. The first class imputes the genotypes at the untyped variants given the genotypes at the typed variants and then performs a statistical test of association at the imputed variants. The second class of methods, summary statistic imputation, directly imputes the association statistics at the untyped variants given the association statistics observed at the typed variants. This second class is appealing because it tends to be computationally efficient and requires only the summary statistics from a study, whereas the first class requires access to individual-level data that can be difficult to obtain. The statistical properties of these two classes of imputation methods have not been fully understood. In this paper, we show that the two classes of imputation methods are equivalent, i.e., have identical asymptotic multivariate normal distributions with zero mean and minor variations in the covariance matrix, under some reasonable assumptions. Using this equivalence, we can understand the effect of imputation methods on power. We show that a commonly employed modification of summary statistic imputation that we term summary statistic imputation with variance re-weighting generally leads to a loss in power. On the other hand, our proposed method, summary statistic imputation without performing variance re-weighting, fully accounts for imputation uncertainty while achieving better power.
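
The second class of methods described above can be illustrated with a small sketch. It assumes the standard conditional-mean form of summary statistic imputation, in which the z-score at an untyped variant is a linear combination of the typed z-scores weighted by reference-panel LD, with no variance re-weighting applied. The function names, the LD values, and the z-scores are hypothetical and are not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def impute_z(ld_typed, ld_untyped_typed, z_typed):
    """Impute the z-score at one untyped variant from typed z-scores.

    ld_typed:          LD correlation matrix among typed variants
    ld_untyped_typed:  LD correlations between the untyped variant and each typed variant
    z_typed:           observed z-scores at the typed variants
    Returns the imputed z-score and the imputation r^2 (variance explained).
    """
    # Weights w solve ld_typed w = ld_untyped_typed (ld_typed is symmetric).
    weights = solve_linear(ld_typed, ld_untyped_typed)
    z_imputed = sum(w * z for w, z in zip(weights, z_typed))
    r2 = sum(w * r for w, r in zip(weights, ld_untyped_typed))
    return z_imputed, r2


# Hypothetical example: two typed variants and one untyped variant.
z_hat, r2 = impute_z([[1.0, 0.5], [0.5, 1.0]], [0.8, 0.6], [4.0, 3.0])
```

The variance re-weighting variant mentioned in the abstract would rescale `z_hat` using `r2`; this sketch deliberately returns the two quantities separately and leaves `z_hat` unscaled.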


2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously-proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously-unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.


2020 ◽  
Author(s):  
Jiahao Zhu ◽  
Haiyan Zheng ◽  
Yasong Li ◽  
Tianle Wang ◽  
Yaohong Zhong ◽  
...  

Abstract Background: Circulating adipokine levels have been reported to be associated with the risk of rheumatoid arthritis (RA). However, it is still unclear whether these associations are causal or biased by reverse causation or residual confounding. This study aimed to assess potential causal roles of five adipokines (namely, adiponectin, leptin, resistin, chemerin, and retinol-binding protein 4 [RBP4]) in the occurrence of RA. Methods: We conducted a two-sample Mendelian randomization analysis to investigate these associations. We used summary-level data from genome-wide association studies (GWASs) for adipokines in individuals of European ancestry as the exposure, and a separate large-scale meta-analysis of a GWAS which included 14,361 RA cases and 43,923 controls of European ancestry as the outcome. Genetic variants were selected as instrumental variables if their associations with adipokines reached genome-wide significance. The causal effects were estimated using the inverse-variance weighted method in the primary analysis. Sensitivity analyses were performed to check that bias due to genetic pleiotropy was unlikely. Results: Circulating resistin was the only adipokine showing a statistically significant association, with higher levels causally associated with the risk of RA (odds ratio: 1.28; 95% confidence interval: [1.07, 1.53] per unit increase in the natural log-transformed resistin). In contrast, associations of adiponectin, leptin, chemerin, and RBP4 with risk of RA were not statistically significant. The MR assumptions did not seem to be violated. Sensitivity analyses yielded consistent findings. Conclusions: Genetically predicted circulating resistin levels were positively associated with RA risk. Our analysis suggested that resistin may play a notable causal role in RA pathogenesis. It would be beneficial for the development of clinical as well as public health strategies that target appropriate levels of resistin for future RA intervention.
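
As a rough illustration of the inverse-variance weighted (IVW) method named in the Methods, the sketch below computes the usual fixed-effect IVW estimate from per-variant summary statistics. It is a weighted regression of variant-outcome effects on variant-exposure effects through the origin, with weights equal to the inverse variance of the outcome effects. It assumes independent instruments and ignores uncertainty in the exposure effects. The function name and the numbers are hypothetical and are not the study's data or software.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: variant-exposure effect estimates (one per instrument)
    beta_outcome:  variant-outcome effect estimates (e.g. log odds ratios)
    se_outcome:    standard errors of the variant-outcome estimates
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")
    # Weighted least squares through the origin with weights 1 / se_outcome^2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    std_error = sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for three independent instruments.
estimate, std_error = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.15],
    beta_outcome=[0.02, 0.05, 0.03],
    se_outcome=[0.01, 0.01, 0.01],
)
```

When the outcome effects are log odds ratios, exponentiating `estimate` gives an odds ratio per unit increase in the exposure, which is the scale on which the abstract reports its result.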


2016 ◽  
Author(s):  
Gibran Hemani ◽  
Jie Zheng ◽  
Kaitlin H Wade ◽  
Charles Laurin ◽  
Benjamin Elsworth ◽  
...  

Abstract Published genetic associations can be used to infer causal relationships between phenotypes, bypassing the need for individual-level genotype or phenotype data. We have curated complete summary data from 1094 genome-wide association studies (GWAS) on diseases and other complex traits into a centralised database, and developed an analytical platform that uses these data to perform Mendelian randomization (MR) tests and sensitivity analyses (MR-Base, http://www.mrbase.org). Combined with curated data of published GWAS hits for phenomic measures, the MR-Base platform enables millions of potential causal relationships to be evaluated. We use the platform to predict the impact of lipid lowering on human health. While our analysis provides evidence that reducing LDL-cholesterol, lipoprotein(a) or triglyceride levels reduces coronary disease risk, it also suggests causal effects on a number of other non-vascular outcomes, indicating potential for adverse effects or drug repositioning of lipid-lowering therapies.


2021 ◽  
pp. 1-11
Author(s):  
Valentina Escott-Price ◽  
Karl Michael Schmidt

Background: Genome-wide association studies (GWAS) were successful in identifying SNPs showing association with disease, but their individual effect sizes are small and require large sample sizes to achieve statistical significance. Methods of post-GWAS analysis, including gene-based, gene-set and polygenic risk scores, combine the SNP effect sizes in an attempt to boost the power of the analyses. To avoid giving undue weight to SNPs in linkage disequilibrium (LD), the LD needs to be taken into account in these analyses. Objectives: We review methods that attempt to adjust the effect sizes (β-coefficients) of summary statistics, instead of simple LD pruning. Methods: We subject LD adjustment approaches to a mathematical analysis, recognising Tikhonov regularisation as a framework for comparison. Results: Observing the similarity of the processes involved with the more straightforward Tikhonov-regularised ordinary least squares estimate for multivariate regression coefficients, we note that current methods based on a Bayesian model for the effect sizes effectively provide an implicit choice of the regularisation parameter, which is convenient, but at the price of reduced transparency and, especially in smaller LD blocks, a risk of incomplete LD correction. Conclusions: There is no simple answer to the question of which method is best, but where interpretability of the LD adjustment is essential, as in research aiming at identifying the genomic aetiology of disorders, our study suggests that a more direct choice of mild regularisation in the correction of effect sizes may be preferable.
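
For orientation, the Tikhonov-regularised least squares estimate referred to above can be written in summary-statistic form. The expression below is a sketch under the assumption that genotypes and phenotype are standardised, so that the marginal effect sizes play the role of X'y/n and the LD correlation matrix R plays the role of X'X/n. The symbols are illustrative and are not taken from the paper's notation.

```latex
\hat{\beta}_{\mathrm{adj}}(\lambda) \;=\; (R + \lambda I)^{-1}\,\hat{\beta}_{\mathrm{marg}},
\qquad \lambda \ge 0
```

Setting λ = 0 gives the unregularised least squares correction, which requires R to be invertible. Larger λ shrinks the adjusted effects and stabilises the inversion, but corrects for LD less completely.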


SLEEP ◽  
2020 ◽  
Author(s):  
Luis M García-Marín ◽  
Adrián I Campos ◽  
Nicholas G Martin ◽  
Gabriel Cuéllar-Partida ◽  
Miguel E Rentería

Abstract Study Objective: Sleep is essential for both physical and mental health, and there is a growing interest in understanding how different factors shape individual variation in sleep duration, quality and patterns, or confer risk for sleep disorders. The present study aimed to identify novel inferred causal relationships between sleep-related traits and other phenotypes, using a genetics-driven hypothesis-free approach not requiring longitudinal data. Methods: We used summary-level statistics from genome-wide association studies and the latent causal variable (LCV) method to screen the phenome and infer causal relationships between seven sleep-related traits (insomnia, daytime dozing, easiness of getting up in the morning, snoring, sleep duration, napping, and morningness) and 1,527 other phenotypes. Results: We identify 84 inferred causal relationships. Among other findings, connective tissue disorders increase insomnia risk and reduce sleep duration; depression-related traits increase insomnia and daytime dozing; insomnia, napping and snoring are affected by obesity and cardiometabolic traits and diseases; and working with asbestos, thinner, or glues may increase insomnia risk, possibly through an increased risk of respiratory disease or socio-economic related factors. Conclusion: Overall, our results indicate that changes in sleep variables are predominantly the consequence, rather than the cause, of other underlying phenotypes and diseases. These insights could inform the design of future epidemiological and interventional studies in sleep medicine and research.


2020 ◽  
Vol 117 (21) ◽  
pp. 11608-11613 ◽  
Author(s):  
Marcelo Blatt ◽  
Alexander Gusev ◽  
Yuriy Polyakov ◽  
Shafi Goldwasser

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.

