Improving the accuracy of two-sample summary data Mendelian randomization: moving beyond the NOME assumption

2017 ◽  
Author(s):  
Jack Bowden ◽  
Fabiola Del Greco M ◽  
Cosetta Minelli ◽  
Qingyuan Zhao ◽  
Debbie A Lawlor ◽  
...  

Abstract Background Two-sample summary data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated. Methods Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘1st order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘2nd order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects. Results Using Monte Carlo simulations, we show that the modified weights outperform 1st and 2nd order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using 1st and 2nd order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared to 1st order weighting. Moreover, 1st order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk.
Conclusions We propose the use of modified weights within two-sample summary data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with 1st order weights) but further research is required to understand their strengths and weaknesses in specific settings.

2018 ◽  
Vol 48 (3) ◽  
pp. 728-742 ◽  
Author(s):  
Jack Bowden ◽  
Fabiola Del Greco M ◽  
Cosetta Minelli ◽  
Qingyuan Zhao ◽  
Debbie A Lawlor ◽  
...  

Abstract Background Two-sample summary-data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated. Methods Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘first-order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘second-order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects. Results Using Monte Carlo simulations, we show that the modified weights outperform first- and second-order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using first- and second-order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared with first-order weighting. Moreover, first-order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary-data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk. 
Conclusions We propose the use of modified weights within two-sample summary-data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with first-order weights) but further research is required to understand their strengths and weaknesses in specific settings.
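
As a rough illustration of the weighting schemes contrasted in this abstract, the sketch below computes per-variant ratio estimates, first- and second-order inverse-variance weights, the IVW estimate and Cochran's Q statistic. The formulas are the usual delta-method approximations as commonly stated, not taken from the abstract itself; the function names and toy numbers are hypothetical, and the authors' modified weights are not implemented here.

```python
def ratio_estimates(gamma_hat, Gamma_hat):
    """Per-variant ratio estimates: SNP-outcome / SNP-exposure association."""
    return [G / g for g, G in zip(gamma_hat, Gamma_hat)]


def first_order_weights(gamma_hat, se_outcome):
    """Inverse variances that ignore SNP-exposure uncertainty (NOME)."""
    return [g ** 2 / sy ** 2 for g, sy in zip(gamma_hat, se_outcome)]


def second_order_weights(gamma_hat, Gamma_hat, se_exposure, se_outcome):
    """Inverse variances with the extra delta-method term for SNP-exposure error."""
    weights = []
    for g, G, sx, sy in zip(gamma_hat, Gamma_hat, se_exposure, se_outcome):
        variance = sy ** 2 / g ** 2 + (G ** 2) * (sx ** 2) / g ** 4
        weights.append(1.0 / variance)
    return weights


def ivw_and_q(ratios, weights):
    """Inverse-variance weighted mean of the ratios and Cochran's Q around it."""
    beta = sum(w * b for w, b in zip(weights, ratios)) / sum(weights)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, ratios))
    return beta, q


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants (illustration only).
    gamma_hat = [0.10, 0.12, 0.08]   # SNP-exposure estimates
    Gamma_hat = [0.06, 0.05, 0.04]   # SNP-outcome estimates
    se_exposure = [0.01, 0.01, 0.01]
    se_outcome = [0.02, 0.02, 0.02]

    ratios = ratio_estimates(gamma_hat, Gamma_hat)
    for label, weights in [
        ("first-order", first_order_weights(gamma_hat, se_outcome)),
        ("second-order", second_order_weights(gamma_hat, Gamma_hat, se_exposure, se_outcome)),
    ]:
        beta, q = ivw_and_q(ratios, weights)
        print(label, "IVW estimate:", beta, "Cochran's Q:", q)
```

Because the second-order variance adds a non-negative term to the first-order variance, second-order weights are never larger than first-order weights, so the same ratio estimates give a smaller Q. That is consistent with the abstract's observation that first-order weights can over-detect heterogeneity and second-order weights can under-detect it.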


PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009922
Author(s):  
Zhaotong Lin ◽  
Yangqing Deng ◽  
Wei Pan

With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become commonly used to infer causality between a pair of traits, an exposure and an outcome. It depends on using genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may lead to biased inference. On the other hand, Egger regression is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from loss of power. We propose a two-component mixture of regressions to combine and thus take advantage of both IVW and Egger regression; it is often both more efficient (i.e. higher powered) and more robust to pleiotropy (i.e. controlling type I error) than either IVW or Egger regression alone by accounting for both valid and invalid IVs respectively. We propose a model averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference for finite samples. Through extensive simulations and applications to the GWAS summary data of 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we showcase that our proposed methods could often control type I error better while achieving much higher power than IVW and Egger regression (and sometimes than several other new/popular MR methods). We expect that our proposed methods will be a useful addition to the toolbox of Mendelian randomization for causal inference.
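
To make the contrast between the two baseline estimators concrete, here is a minimal sketch of fixed-effect IVW as a weighted regression through the origin and of Egger regression as the same regression with an intercept. It does not implement the mixture model, model averaging or data perturbation scheme proposed in this abstract; the function names and inputs are hypothetical, and variants are assumed to be independent.

```python
def ivw_slope(gamma_hat, Gamma_hat, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure estimates, no intercept."""
    weights = [1.0 / s ** 2 for s in se_outcome]
    numerator = sum(w * g * G for w, g, G in zip(weights, gamma_hat, Gamma_hat))
    denominator = sum(w * g * g for w, g in zip(weights, gamma_hat))
    return numerator / denominator


def egger_fit(gamma_hat, Gamma_hat, se_outcome):
    """Weighted regression with an intercept; returns (intercept, slope)."""
    # Orient every variant so that its SNP-exposure estimate is positive.
    x = [abs(g) for g in gamma_hat]
    y = [G if g >= 0 else -G for g, G in zip(gamma_hat, Gamma_hat)]
    weights = [1.0 / s ** 2 for s in se_outcome]

    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data with a constant pleiotropic offset (illustration only).
    gamma_hat = [0.05, 0.10, 0.15, 0.20]
    Gamma_hat = [0.02 + 0.5 * g for g in gamma_hat]
    se_outcome = [0.01] * 4
    print("IVW slope:", ivw_slope(gamma_hat, Gamma_hat, se_outcome))
    print("Egger (intercept, slope):", egger_fit(gamma_hat, Gamma_hat, se_outcome))
```

In this toy setting the intercept absorbs the constant offset, so the Egger slope is not pulled away from the underlying value, while the IVW slope is. The price is an extra parameter to estimate, which is the loss of power the abstract refers to.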


2019 ◽  
Author(s):  
Sheng Wang ◽  
Hyunseung Kang

Abstract Mendelian randomization (MR) is a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables (IVs), with two-sample summary-data MR being the most popular due to privacy. Unfortunately, many MR methods for two-sample summary data are not robust to weak instruments, a common phenomenon with genetic instruments; many of these methods are biased and no existing MR method has Type I error control under weak instruments. In this work, we propose test statistics that are robust to weak instruments by extending Anderson-Rubin, Kleibergen, and conditional likelihood ratio tests in econometrics to the two-sample summary data setting. We conclude with a simulation and an empirical study and show that the proposed tests control size and have better power than current methods.
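
The abstract gives no formulas, so the following is only a sketch of one way an Anderson-Rubin-type statistic can be written with two-sample summary data, assuming independent variants, non-overlapping samples and known standard errors. The function name and inputs are hypothetical and this is not presented as the authors' exact test. Under the null value `beta0`, each residual `Gamma_j - beta0 * gamma_j` has mean zero and variance `se_outcome_j**2 + beta0**2 * se_exposure_j**2` whatever the instrument strength, which is the source of the robustness to weak instruments.

```python
def anderson_rubin_statistic(beta0, gamma_hat, Gamma_hat, se_exposure, se_outcome):
    """Sum of squared standardized residuals under the null effect beta0."""
    total = 0.0
    for g, G, sx, sy in zip(gamma_hat, Gamma_hat, se_exposure, se_outcome):
        residual = G - beta0 * g
        variance = sy ** 2 + beta0 ** 2 * sx ** 2
        total += residual ** 2 / variance
    return total


def accepted_values(grid, critical_value, gamma_hat, Gamma_hat, se_exposure, se_outcome):
    """Invert the test: keep grid values whose statistic is below the critical value."""
    return [
        b for b in grid
        if anderson_rubin_statistic(b, gamma_hat, Gamma_hat, se_exposure, se_outcome)
        <= critical_value
    ]


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants (illustration only).
    gamma_hat = [0.10, 0.12, 0.08]
    Gamma_hat = [0.05, 0.06, 0.04]
    se_exposure = [0.01, 0.01, 0.01]
    se_outcome = [0.02, 0.02, 0.02]

    # 7.815 is the 95% quantile of a chi-square distribution with 3 degrees
    # of freedom (one per variant), used here as the reference distribution.
    grid = [i / 100 for i in range(-100, 201)]
    kept = accepted_values(grid, 7.815, gamma_hat, Gamma_hat, se_exposure, se_outcome)
    print("accepted range:", min(kept), "to", max(kept))
```

Inverting the test over a grid gives a confidence set that widens automatically when instruments are weak, instead of relying on a point estimate and a standard error.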


2021 ◽  
Vol 12 ◽  
Author(s):  
Lijuan Lin ◽  
Ruyang Zhang ◽  
Hui Huang ◽  
Ying Zhu ◽  
Yi Li ◽  
...  

Mendelian randomization (MR) can estimate the causal effect of a risk factor on a complex disease using genetic variants as instrumental variables (IVs). A variety of generalized MR methods have been proposed to integrate results arising from multiple IVs in order to increase power. One of these methods constructs a genetic score (GS) as a linear combination of the multiple IVs using a multiple regression model, and has been applied broadly in medical research. However, GS-based MR requires individual-level data, which greatly limits its application in clinical research. We propose an alternative method called Mendelian Randomization with Refined Instrumental Variable from Genetic Score (MR-RIVER) to construct a genetic IV by integrating multiple genetic variants based on summarized results, rather than individual data. Compared with inverse-variance weighted (IVW) and generalized summary-data-based Mendelian randomization (GSMR), MR-RIVER maintained the type I error, while possessing more statistical power than the competing methods. MR-RIVER also presented smaller biases and mean squared errors, compared with IVW and GSMR. We further applied the proposed method to estimate the effects of blood metabolites on educational attainment, by integrating results from several publicly available resources. MR-RIVER provided robust results under different LD pruning criteria and identified three metabolites associated with years of schooling and an additional 15 metabolites with indirect mediation effects through butyrylcarnitine. MR-RIVER, which extends score-based MR to summarized results in lieu of individual data and incorporates multiple correlated IVs, provided a more accurate and powerful means for the discovery of novel risk factors.


2018 ◽  
Vol 48 (3) ◽  
pp. 713-727 ◽  
Author(s):  
Eleanor Sanderson ◽  
George Davey Smith ◽  
Frank Windmeijer ◽  
Jack Bowden

Abstract Background Mendelian randomization (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilizing genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome. Methods and results We use simulations and theory to clarify the interpretation of estimated effects in a MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single-sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index. Conclusion MVMR analysis consistently estimates the direct causal effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual- or summary-level data.
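
For readers who want to see what a two-sample summary-data MVMR fit can look like in the simplest case, the sketch below regresses SNP-outcome estimates on the SNP-exposure estimates for two exposures, weighted by inverse outcome variance and without an intercept. This is a commonly used inverse-variance weighted form and is stated here as an assumption, not as the authors' exact estimator. The function name and toy data are hypothetical, and the instrument strength and validity tests described in the abstract are not implemented.

```python
def mvmr_two_exposures(gamma1_hat, gamma2_hat, Gamma_hat, se_outcome):
    """Weighted least squares of Gamma on (gamma1, gamma2) with no intercept.

    Returns the pair (beta1, beta2) of direct-effect estimates.
    """
    weights = [1.0 / s ** 2 for s in se_outcome]

    # Entries of the 2x2 weighted normal equations A @ beta = b.
    a11 = sum(w * x1 * x1 for w, x1 in zip(weights, gamma1_hat))
    a12 = sum(w * x1 * x2 for w, x1, x2 in zip(weights, gamma1_hat, gamma2_hat))
    a22 = sum(w * x2 * x2 for w, x2 in zip(weights, gamma2_hat))
    b1 = sum(w * x1 * y for w, x1, y in zip(weights, gamma1_hat, Gamma_hat))
    b2 = sum(w * x2 * y for w, x2, y in zip(weights, gamma2_hat, Gamma_hat))

    determinant = a11 * a22 - a12 ** 2
    if determinant == 0:
        # Collinear SNP-exposure estimates cannot separate the two effects.
        raise ValueError("SNP-exposure estimates are collinear")

    beta1 = (a22 * b1 - a12 * b2) / determinant
    beta2 = (a11 * b2 - a12 * b1) / determinant
    return beta1, beta2


if __name__ == "__main__":
    # Hypothetical summary statistics for four variants (illustration only).
    gamma1_hat = [0.10, 0.20, 0.05, 0.15]
    gamma2_hat = [0.05, 0.02, 0.20, 0.10]
    Gamma_hat = [0.3 * g1 - 0.2 * g2 for g1, g2 in zip(gamma1_hat, gamma2_hat)]
    se_outcome = [0.01, 0.02, 0.015, 0.01]
    print(mvmr_two_exposures(gamma1_hat, gamma2_hat, Gamma_hat, se_outcome))
```

The near-zero determinant case is the summary-data analogue of weak conditional instrument strength: when the variants' associations with the two exposures are almost proportional, the two effects cannot be separated reliably.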


PLoS Genetics ◽  
2021 ◽  
Vol 17 (4) ◽  
pp. e1009525
Author(s):  
Mark Gormley ◽  
James Yarmolinsky ◽  
Tom Dudding ◽  
Kimberley Burrows ◽  
Richard M. Martin ◽  
...  

Head and neck squamous cell carcinoma (HNSCC), which includes cancers of the oral cavity and oropharynx, is a cause of substantial global morbidity and mortality. Strategies to reduce disease burden include discovery of novel therapies and repurposing of existing drugs. Statins are commonly prescribed for lowering circulating cholesterol by inhibiting HMG-CoA reductase (HMGCR). Results from some observational studies suggest that statin use may reduce HNSCC risk. We appraised the relationship of genetically-proxied cholesterol-lowering drug targets and other circulating lipid traits with oral (OC) and oropharyngeal (OPC) cancer risk using two-sample Mendelian randomization (MR). For the primary analysis, germline genetic variants in HMGCR, NPC1L1, CETP, PCSK9 and LDLR were used to proxy the effect of low-density lipoprotein cholesterol (LDL-C) lowering therapies. In secondary analyses, variants were used to proxy circulating levels of other lipid traits in a genome-wide association study (GWAS) meta-analysis of 188,578 individuals. Both primary and secondary analyses aimed to estimate the downstream causal effect of cholesterol lowering therapies on OC and OPC risk. The second sample for MR was taken from a GWAS of 6,034 OC and OPC cases and 6,585 controls (GAME-ON). Analyses were replicated in UK Biobank, using 839 OC and OPC cases and 372,016 controls and the results of the GAME-ON and UK Biobank analyses combined in a fixed-effects meta-analysis. We found limited evidence of a causal effect of genetically-proxied LDL-C lowering using HMGCR, NPC1L1, CETP or other circulating lipid traits on either OC or OPC risk. Genetically-proxied PCSK9 inhibition equivalent to a 1 mmol/L (38.7 mg/dL) reduction in LDL-C was associated with an increased risk of OC and OPC combined (OR 1.8, 95% CI 1.2, 2.8, p = 9.31 × 10⁻⁵), with good concordance between GAME-ON and UK Biobank (I² = 22%).
Effects for PCSK9 appeared stronger in relation to OPC (OR 2.6, 95% CI 1.4, 4.9) than OC (OR 1.4, 95% CI 0.8, 2.4). LDLR variants, resulting in genetically-proxied reduction in LDL-C equivalent to a 1 mmol/L (38.7 mg/dL), reduced the risk of OC and OPC combined (OR 0.7, 95% CI 0.5, 1.0, p = 0.006). A series of pleiotropy-robust and outlier detection methods showed that pleiotropy did not bias our findings. We found limited evidence for a role of cholesterol-lowering in OC and OPC risk, suggesting previous observational results may have been confounded. There was some evidence that genetically-proxied inhibition of PCSK9 increased risk, while lipid-lowering variants in LDLR reduced risk of combined OC and OPC. This result suggests that the mechanisms of action of PCSK9 on OC and OPC risk may be independent of its cholesterol lowering effects; however, this was not supported uniformly across all sensitivity analyses and further replication of this finding is required.


2021 ◽  
Author(s):  
Jin Jin ◽  
Guanghao Qi ◽  
Zhi Yu ◽  
Nilanjan Chatterjee

Abstract Mendelian Randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systemic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers, or other types of traits, that are co-regulated by the exposure. We propose a method, MRLE, which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of summary association statistics, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies showed that MRLE has well-controlled type I error rates and increased power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α and MCP-1) provided evidence for potential causal effects of inflammation on increased risk of coronary artery disease, colorectal cancer and rheumatoid arthritis, while standard MR analysis for individual biomarkers often failed to detect consistent evidence for such effects.


2019 ◽  
Author(s):  
Christopher N Foley ◽  
Paul D W Kirk ◽  
Stephen Burgess

Abstract Motivation Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for uncertainty in the causal estimates, and it includes ‘null’ and ‘junk’ clusters, to provide protection against the detection of spurious clusters. Results Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming the popular Mclust method. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A hypothesis-free search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. Availability and Implementation MR-Clust can be downloaded from https://github.com/cnfoley/[email protected] or [email protected] Supplementary Information Supplementary Material is included in the submission.


2018 ◽  
Author(s):  
Eleanor Sanderson ◽  
George Davey Smith ◽  
Frank Windmeijer ◽  
Jack Bowden

Abstract Background Mendelian Randomisation (MR) is a powerful tool in epidemiology which can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilising genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to Multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome. Methods/Results We use simulations and theory to clarify the interpretation of estimated effects in a MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index. Conclusion MVMR analysis consistently estimates the effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual or summary level data.


Author(s):  
Christopher N Foley ◽  
Amy M Mason ◽  
Paul D W Kirk ◽  
Stephen Burgess

Abstract Motivation Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and it includes ‘null’ and ‘junk’ clusters, to provide protection against the detection of spurious clusters. Results Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. Availability and implementation MR-Clust can be downloaded from https://github.com/cnfoley/mrclust. Supplementary information Supplementary data are available at Bioinformatics online.

