Improving the accuracy of two-sample summary data Mendelian randomization: moving beyond the NOME assumption

Abstract Background Two-sample summary data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated. Methods Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘1st order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘2nd order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects. Results Using Monte Carlo simulations, we show that the modified weights outperform 1st and 2nd order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using 1st and 2nd order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared to 1st order weighting. Moreover, 1st order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk. Conclusions We propose the use of modified weights within two-sample summary data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with 1st order weights) but further research is required to understand their strengths and weaknesses in specific settings.


Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption

International Journal of Epidemiology ◽

10.1093/ije/dyy258 ◽

2018 ◽

Vol 48 (3) ◽

pp. 728-742 ◽

Cited By ~ 42

Author(s):

Jack Bowden ◽

Fabiola Del Greco M ◽

Cosetta Minelli ◽

Qingyuan Zhao ◽

Debbie A Lawlor ◽

...

Keyword(s):

Genetic Variants ◽

Mendelian Randomization ◽

Disease Risk ◽

Causal Effect ◽

Meta Analysis ◽

Second Order ◽

Type I ◽

Weak Instruments ◽

First Order ◽

Summary Data

Abstract Background Two-sample summary-data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated. Methods Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘first-order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘second-order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects. Results Using Monte Carlo simulations, we show that the modified weights outperform first- and second-order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using first- and second-order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared with first-order weighting. Moreover, first-order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary-data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk. 
Conclusions We propose the use of modified weights within two-sample summary-data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with first-order weights) but further research is required to understand their strengths and weaknesses in specific settings.
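The quantities named in this abstract can be illustrated with a minimal sketch. It assumes the standard textbook forms: the ratio estimate is the variant-outcome association divided by the variant-exposure association, the first-order variance ignores uncertainty in the exposure association (the NOME assumption), and the second-order variance adds the delta-method term for that uncertainty. All function names and the input numbers are hypothetical, and the paper's modified weights are not reproduced here.

```python
from typing import List, Sequence, Tuple


def ratio_estimates(gamma_hat: Sequence[float], Gamma_hat: Sequence[float]) -> List[float]:
    """Per-variant ratio estimates: outcome association / exposure association."""
    return [G / g for g, G in zip(gamma_hat, Gamma_hat)]


def first_order_weights(gamma_hat: Sequence[float], se_outcome: Sequence[float]) -> List[float]:
    """Inverse of the first-order variance se_outcome^2 / gamma_hat^2."""
    return [g ** 2 / s ** 2 for g, s in zip(gamma_hat, se_outcome)]


def second_order_weights(
    gamma_hat: Sequence[float],
    Gamma_hat: Sequence[float],
    se_exposure: Sequence[float],
    se_outcome: Sequence[float],
) -> List[float]:
    """Inverse of the second-order (delta-method) variance of each ratio estimate."""
    weights = []
    for g, G, sx, sy in zip(gamma_hat, Gamma_hat, se_exposure, se_outcome):
        variance = sy ** 2 / g ** 2 + (G ** 2 * sx ** 2) / g ** 4
        weights.append(1.0 / variance)
    return weights


def ivw_and_cochran_q(ratios: Sequence[float], weights: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted estimate and Cochran's Q heterogeneity statistic."""
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, ratios)) / total
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, ratios))
    return beta, q


# Hypothetical summary statistics for three variants (illustration only).
gamma_hat = [0.10, 0.20, 0.40]    # variant-exposure associations
Gamma_hat = [0.06, 0.09, 0.21]    # variant-outcome associations
se_exposure = [0.01, 0.01, 0.01]
se_outcome = [0.02, 0.02, 0.02]

ratios = ratio_estimates(gamma_hat, Gamma_hat)
w1 = first_order_weights(gamma_hat, se_outcome)
w2 = second_order_weights(gamma_hat, Gamma_hat, se_exposure, se_outcome)
beta1, q1 = ivw_and_cochran_q(ratios, w1)
beta2, q2 = ivw_and_cochran_q(ratios, w2)
print("first-order:  estimate", beta1, "Q", q1)
print("second-order: estimate", beta2, "Q", q2)
```

Because each second-order weight is no larger than its first-order counterpart, Q computed with second-order weights cannot exceed Q computed with first-order weights on the same data, which is consistent with the abstract's remark that second-order weights make heterogeneity harder to detect. Q is conventionally compared with a chi-squared distribution on (number of variants − 1) degrees of freedom.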


Combining the strengths of inverse-variance weighting and Egger regression in Mendelian randomization using a mixture of regressions model

PLoS Genetics ◽

10.1371/journal.pgen.1009922 ◽

2021 ◽

Vol 17 (11) ◽

pp. e1009922

Author(s):

Zhaotong Lin ◽

Yangqing Deng ◽

Wei Pan

Keyword(s):

Large Scale ◽

Type I Error ◽

Mendelian Randomization ◽

Meta Analysis ◽

Type I ◽

Perturbation Scheme ◽

Analysis Model ◽

Component Mixture ◽

Inverse Variance ◽

Summary Data

With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become commonly used to infer causality between a pair of traits, an exposure and an outcome. It depends on using genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may lead to biased inference. On the other hand, Egger regression is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from loss of power. We propose a two-component mixture of regressions to combine and thus take advantage of both IVW and Egger regression; it is often both more efficient (i.e. higher powered) and more robust to pleiotropy (i.e. controlling type I error) than either IVW or Egger regression alone by accounting for both valid and invalid IVs respectively. We propose a model averaging approach and a novel data perturbation scheme to account for uncertainties in model/IV selection, leading to more robust statistical inference for finite samples. Through extensive simulations and applications to the GWAS summary data of 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we showcase that our proposed methods could often control type I error better while achieving much higher power than IVW and Egger regression (and sometimes than several other new/popular MR methods). We expect that our proposed methods will be a useful addition to the toolbox of Mendelian randomization for causal inference.
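The two building blocks contrasted in this abstract can be sketched as weighted regressions of the variant-outcome associations on the variant-exposure associations, with weights equal to the inverse outcome variances. The sketch assumes the usual formulation: IVW is a regression through the origin, while Egger regression adds an intercept that absorbs directional pleiotropy. Function names and input numbers are hypothetical, and the mixture-of-regressions model proposed in the paper is not implemented here.

```python
from typing import Sequence, Tuple


def ivw_slope(
    gamma_hat: Sequence[float], Gamma_hat: Sequence[float], se_outcome: Sequence[float]
) -> float:
    """Weighted regression through the origin (fixed-effect IVW estimate)."""
    w = [1.0 / s ** 2 for s in se_outcome]
    num = sum(wi * g * G for wi, g, G in zip(w, gamma_hat, Gamma_hat))
    den = sum(wi * g * g for wi, g in zip(w, gamma_hat))
    return num / den


def egger_fit(
    gamma_hat: Sequence[float], Gamma_hat: Sequence[float], se_outcome: Sequence[float]
) -> Tuple[float, float]:
    """Weighted regression with an intercept; returns (intercept, slope).

    Assumes variants are oriented so that exposure associations are positive.
    """
    w = [1.0 / s ** 2 for s in se_outcome]
    sw = sum(w)
    g_bar = sum(wi * g for wi, g in zip(w, gamma_hat)) / sw
    G_bar = sum(wi * G for wi, G in zip(w, Gamma_hat)) / sw
    sxx = sum(wi * (g - g_bar) ** 2 for wi, g in zip(w, gamma_hat))
    sxy = sum(wi * (g - g_bar) * (G - G_bar) for wi, g, G in zip(w, gamma_hat, Gamma_hat))
    slope = sxy / sxx
    intercept = G_bar - slope * g_bar
    return intercept, slope


# Hypothetical data: true causal effect 0.5 plus a constant pleiotropic effect 0.02.
gamma_hat = [0.10, 0.15, 0.25, 0.40]
Gamma_hat = [0.02 + 0.5 * g for g in gamma_hat]
se_outcome = [0.02, 0.02, 0.03, 0.03]

print("IVW slope:", ivw_slope(gamma_hat, Gamma_hat, se_outcome))
print("Egger (intercept, slope):", egger_fit(gamma_hat, Gamma_hat, se_outcome))
```

In this noise-free example the Egger fit recovers both the intercept and the slope, while the IVW slope is pulled above 0.5 because the constant pleiotropic term is forced through the origin. The price of the extra intercept is a less precise slope, which is the loss of power the abstract refers to.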


Weak-Instrument Robust Tests in Two-Sample Summary-Data Mendelian Randomization

10.1101/769562 ◽

2019 ◽

Cited By ~ 2

Author(s):

Sheng Wang ◽

Hyunseung Kang

Keyword(s):

Error Control ◽

Type I Error ◽

Mendelian Randomization ◽

Likelihood Ratio Tests ◽

Type I ◽

Conditional Likelihood ◽

Test Statistics ◽

Weak Instruments ◽

Robust Tests ◽

Summary Data

Abstract Mendelian randomization (MR) is a popular method in genetic epidemiology to estimate the effect of an exposure on an outcome using genetic variants as instrumental variables (IVs), with two-sample summary-data MR being the most popular due to privacy. Unfortunately, many MR methods for two-sample summary data are not robust to weak instruments, a common phenomenon with genetic instruments; many of these methods are biased and no existing MR method has Type I error control under weak instruments. In this work, we propose test statistics that are robust to weak instruments by extending Anderson-Rubin, Kleibergen, and conditional likelihood ratio tests in econometrics to the two-sample summary data setting. We conclude with a simulation and an empirical study and show that the proposed tests control size and have better power than current methods.


Mendelian Randomization With Refined Instrumental Variables From Genetic Score Improves Accuracy and Reduces Bias

Frontiers in Genetics ◽

10.3389/fgene.2021.618829 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lijuan Lin ◽

Ruyang Zhang ◽

Hui Huang ◽

Ying Zhu ◽

Yi Li ◽

...

Keyword(s):

Genetic Variants ◽

Statistical Power ◽

Complex Disease ◽

Type I Error ◽

Mendelian Randomization ◽

Causal Effect ◽

Type I ◽

Individual Data ◽

Genetic Score ◽

Mediation Effects

Mendelian randomization (MR) can estimate the causal effect of a risk factor on a complex disease using genetic variants as instrumental variables (IVs). A variety of generalized MR methods have been proposed to integrate results arising from multiple IVs in order to increase power. One of these methods constructs a genetic score (GS) as a linear combination of the multiple IVs using a multiple regression model, and has been widely applied in medical research. However, GS-based MR requires individual-level data, which greatly limits its application in clinical research. We propose an alternative method called Mendelian Randomization with Refined Instrumental Variable from Genetic Score (MR-RIVER) to construct a genetic IV by integrating multiple genetic variants based on summarized results, rather than individual data. Compared with inverse-variance weighted (IVW) and generalized summary-data-based Mendelian randomization (GSMR), MR-RIVER maintained the type I error while possessing more statistical power than the competing methods. MR-RIVER also presented smaller biases and mean squared errors compared to IVW and GSMR. We further applied the proposed method to estimate the effects of blood metabolites on educational attainment, by integrating results from several publicly available resources. MR-RIVER provided robust results under different LD pruning criteria and identified three metabolites associated with years of schooling and an additional 15 metabolites with indirect mediation effects through butyrylcarnitine. MR-RIVER, which extends score-based MR to summarized results in lieu of individual data and incorporates multiple correlated IVs, provided a more accurate and powerful means for the discovery of novel risk factors.


An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings

International Journal of Epidemiology ◽

10.1093/ije/dyy262 ◽

2018 ◽

Vol 48 (3) ◽

pp. 713-727 ◽

Cited By ~ 90

Author(s):

Eleanor Sanderson ◽

George Davey Smith ◽

Frank Windmeijer ◽

Jack Bowden

Keyword(s):

Genetic Variants ◽

Mendelian Randomization ◽

Causal Effect ◽

Causal Effects ◽

Single Sample ◽

Uk Biobank ◽

Level Data ◽

Wide Range ◽

Using Data ◽

Summary Data

Abstract Background Mendelian randomization (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilizing genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome. Methods and results We use simulations and theory to clarify the interpretation of estimated effects in a MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single-sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index. Conclusion MVMR analysis consistently estimates the direct causal effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual- or summary-level data.
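For the two-sample summary data setting, the basic multivariable estimate can be sketched as a weighted regression through the origin of the variant-outcome associations on the variant-exposure associations for each exposure, with inverse outcome variances as weights. The sketch below assumes exactly two exposures and solves the 2×2 normal equations directly. The function name and input numbers are hypothetical, and the instrument strength and validity tests developed in the paper are not shown.

```python
from typing import Sequence, Tuple


def mvmr_ivw_two_exposures(
    gamma1: Sequence[float],
    gamma2: Sequence[float],
    Gamma_hat: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Weighted least squares through the origin with two regressors.

    Returns the estimated direct effects (beta1, beta2) of the two exposures.
    """
    w = [1.0 / s ** 2 for s in se_outcome]
    s11 = sum(wi * a * a for wi, a in zip(w, gamma1))
    s12 = sum(wi * a * b for wi, a, b in zip(w, gamma1, gamma2))
    s22 = sum(wi * b * b for wi, b in zip(w, gamma2))
    t1 = sum(wi * a * G for wi, a, G in zip(w, gamma1, Gamma_hat))
    t2 = sum(wi * b * G for wi, b, G in zip(w, gamma2, Gamma_hat))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        # The two sets of exposure associations are perfectly collinear.
        raise ValueError("exposure associations are collinear; effects not identified")
    beta1 = (t1 * s22 - t2 * s12) / det
    beta2 = (t2 * s11 - t1 * s12) / det
    return beta1, beta2


# Hypothetical associations for four variants with two exposures.
gamma1 = [0.10, 0.20, 0.30, 0.40]
gamma2 = [0.30, 0.10, 0.40, 0.20]
Gamma_hat = [0.3 * a - 0.2 * b for a, b in zip(gamma1, gamma2)]
se_outcome = [0.02, 0.02, 0.03, 0.03]

print(mvmr_ivw_two_exposures(gamma1, gamma2, Gamma_hat, se_outcome))
```

The determinant check reflects the instrument strength question raised in the abstract: when the variants' associations with the two exposures are nearly proportional, the determinant approaches zero and the direct effects are poorly identified.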


Using genetic variants to evaluate the causal effect of cholesterol lowering on head and neck cancer risk: A Mendelian randomization study

PLoS Genetics ◽

10.1371/journal.pgen.1009525 ◽

2021 ◽

Vol 17 (4) ◽

pp. e1009525

Author(s):

Mark Gormley ◽

James Yarmolinsky ◽

Tom Dudding ◽

Kimberley Burrows ◽

Richard M. Martin ◽

...

Keyword(s):

Cancer Risk ◽

Genetic Variants ◽

Mendelian Randomization ◽

Causal Effect ◽

Meta Analysis ◽

Uk Biobank ◽

Limited Evidence ◽

Cholesterol Lowering ◽

Increased Risk ◽

Secondary Analyses

Head and neck squamous cell carcinoma (HNSCC), which includes cancers of the oral cavity and oropharynx, is a cause of substantial global morbidity and mortality. Strategies to reduce disease burden include discovery of novel therapies and repurposing of existing drugs. Statins are commonly prescribed for lowering circulating cholesterol by inhibiting HMG-CoA reductase (HMGCR). Results from some observational studies suggest that statin use may reduce HNSCC risk. We appraised the relationship of genetically-proxied cholesterol-lowering drug targets and other circulating lipid traits with oral (OC) and oropharyngeal (OPC) cancer risk using two-sample Mendelian randomization (MR). For the primary analysis, germline genetic variants in HMGCR, NPC1L1, CETP, PCSK9 and LDLR were used to proxy the effect of low-density lipoprotein cholesterol (LDL-C) lowering therapies. In secondary analyses, variants were used to proxy circulating levels of other lipid traits in a genome-wide association study (GWAS) meta-analysis of 188,578 individuals. Both primary and secondary analyses aimed to estimate the downstream causal effect of cholesterol lowering therapies on OC and OPC risk. The second sample for MR was taken from a GWAS of 6,034 OC and OPC cases and 6,585 controls (GAME-ON). Analyses were replicated in UK Biobank, using 839 OC and OPC cases and 372,016 controls, and the results of the GAME-ON and UK Biobank analyses were combined in a fixed-effects meta-analysis. We found limited evidence of a causal effect of genetically-proxied LDL-C lowering using HMGCR, NPC1L1, CETP or other circulating lipid traits on either OC or OPC risk. Genetically-proxied PCSK9 inhibition equivalent to a 1 mmol/L (38.7 mg/dL) reduction in LDL-C was associated with an increased risk of OC and OPC combined (OR 1.8, 95%CI 1.2, 2.8, p = 9.31 × 10⁻⁵), with good concordance between GAME-ON and UK Biobank (I² = 22%). Effects for PCSK9 appeared stronger in relation to OPC (OR 2.6, 95%CI 1.4, 4.9) than OC (OR 1.4, 95%CI 0.8, 2.4). LDLR variants, resulting in a genetically-proxied reduction in LDL-C equivalent to 1 mmol/L (38.7 mg/dL), reduced the risk of OC and OPC combined (OR 0.7, 95%CI 0.5, 1.0, p = 0.006). A series of pleiotropy-robust and outlier detection methods showed that pleiotropy did not bias our findings. We found limited evidence for a role of cholesterol-lowering in OC and OPC risk, suggesting previous observational results may have been confounded. There was some evidence that genetically-proxied inhibition of PCSK9 increased risk, while lipid-lowering variants in LDLR reduced risk of combined OC and OPC. This result suggests that the mechanisms of action of PCSK9 on OC and OPC risk may be independent of its cholesterol lowering effects; however, this was not supported uniformly across all sensitivity analyses and further replication of this finding is required.
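The step of combining two cohorts in a fixed-effects meta-analysis and reporting I² can be sketched as follows. This is a generic illustration, assuming estimates are combined on the log odds ratio scale with inverse-variance weights and that I² is computed as max(0, (Q − df) / Q). The function name and the input values are hypothetical and are not the study's data.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect pooling.

    Returns (pooled estimate, pooled standard error, Cochran's Q, I-squared).
    """
    w = [1.0 / s ** 2 for s in std_errors]
    total = sum(w)
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical log odds ratios and standard errors from two cohorts.
log_or = [math.log(1.9), math.log(1.5)]
se = [0.25, 0.40]

pooled, pooled_se, q, i2 = fixed_effect_meta(log_or, se)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print("pooled OR:", math.exp(pooled), "95% CI:", (low, high), "I2:", i2)
```

With only two studies Q has a single degree of freedom, so I² is zero whenever Q is at most 1 and rises toward 1 as the two estimates diverge relative to their standard errors.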


Mendelian Randomization Analysis Using Multiple Biomarkers of an Underlying Common Exposure

10.1101/2021.02.05.429979 ◽

2021 ◽

Author(s):

Jin Jin ◽

Guanghao Qi ◽

Zhi Yu ◽

Nilanjan Chatterjee

Keyword(s):

Structural Equation ◽

Type I Error ◽

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

Error Rates ◽

Type I ◽

Genome Wide Association Studies ◽

Increased Risk ◽

Multiple Biomarkers

Abstract Mendelian Randomization (MR) analysis is increasingly popular for testing the causal effect of exposures on disease outcomes using data from genome-wide association studies. In some settings, the underlying exposure, such as systematic inflammation, may not be directly observable, but measurements can be available on multiple biomarkers, or other types of traits, that are co-regulated by the exposure. We propose a method, MRLE, which tests the significance for, and the direction of, the effect of a latent exposure by leveraging information from multiple related traits. The method is developed by constructing a set of estimating functions based on the second-order moments of summary association statistics, under a structural equation model where genetic variants are assumed to have indirect effects through the latent exposure and potentially direct effects on the traits. Simulation studies showed that MRLE has well-controlled type I error rates and increased power compared to single-trait MR tests under various types of pleiotropy. Applications of MRLE using genetic association statistics across five inflammatory biomarkers (CRP, IL-6, IL-8, TNF-α and MCP-1) provided evidence for potential causal effects of inflammation on increased risk of coronary artery disease, colorectal cancer and rheumatoid arthritis, while standard MR analysis for individual biomarkers often failed to detect consistent evidence for such effects.


MR-Clust: Clustering of genetic variants in Mendelian randomization with similar causal estimates

10.1101/2019.12.18.881326 ◽

2019 ◽

Author(s):

Christopher N Foley ◽

Paul D W Kirk ◽

Stephen Burgess

Keyword(s):

Blood Pressure ◽

Coronary Artery Disease ◽

Risk Factor ◽

Coronary Artery ◽

Genetic Variants ◽

Mendelian Randomization ◽

Disease Risk ◽

Causal Effect ◽

Coronary Artery Disease Risk ◽

Artery Disease

Abstract Motivation Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for uncertainty in the causal estimates, and it includes ‘null’ and ‘junk’ clusters, to provide protection against the detection of spurious clusters. Results Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming the popular Mclust method. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A hypothesis-free search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. Availability and Implementation MR-Clust can be downloaded from https://github.com/cnfoley/[…]. Contact [email protected] or [email protected]. Supplementary Information Supplementary Material is included in the submission.


An examination of multivariable Mendelian randomization in the single sample and two-sample summary data settings

10.1101/306209 ◽

2018 ◽

Cited By ~ 21

Author(s):

Eleanor Sanderson ◽

George Davey Smith ◽

Frank Windmeijer ◽

Jack Bowden

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Causal Effects ◽

Single Sample ◽

Mendelian Randomisation ◽

Uk Biobank ◽

Level Data ◽

Wide Range ◽

Using Data ◽

Summary Data

Abstract Background Mendelian Randomisation (MR) is a powerful tool in epidemiology which can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilising genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to Multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome. Methods/Results We use simulations and theory to clarify the interpretation of estimated effects in a MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index. Conclusion MVMR analysis consistently estimates the effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual or summary level data.


MR-Clust: clustering of genetic variants in Mendelian randomization with similar causal estimates

Bioinformatics ◽

10.1093/bioinformatics/btaa778 ◽

2020 ◽

Author(s):

Christopher N Foley ◽

Amy M Mason ◽

Paul D W Kirk ◽

Stephen Burgess

Keyword(s):

Blood Pressure ◽

Coronary Artery Disease ◽

Risk Factor ◽

Coronary Artery ◽

Genetic Variants ◽

Mendelian Randomization ◽

Disease Risk ◽

Causal Effect ◽

Coronary Artery Disease Risk ◽

Artery Disease

Abstract Motivation Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and it includes ‘null’ and ‘junk’ clusters, to provide protection against the detection of spurious clusters. Results Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. Availability and implementation MR-Clust can be downloaded from https://github.com/cnfoley/mrclust. Supplementary information Supplementary data are available at Bioinformatics online.
