An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings

2018 ◽  
Vol 48 (3) ◽  
pp. 713-727 ◽  
Author(s):  
Eleanor Sanderson ◽  
George Davey Smith ◽  
Frank Windmeijer ◽  
Jack Bowden

Abstract
Background: Mendelian randomization (MR) is a powerful tool in epidemiology that can be used to estimate the causal effect of an exposure on an outcome in the presence of unobserved confounding, by utilizing genetic variants that are instrumental variables (IVs) for the exposure. This has been extended to multivariable MR (MVMR) to estimate the effect of two or more exposures on an outcome.
Methods and results: We use simulations and theory to clarify the interpretation of estimated effects in an MVMR analysis under a range of underlying scenarios, where a secondary exposure acts variously as a confounder, a mediator, a pleiotropic pathway and a collider. We then describe how instrument strength and validity can be assessed for an MVMR analysis in the single-sample setting, and develop tests to assess these assumptions in the popular two-sample summary data setting. We illustrate our methods using data from UK Biobank to estimate the effect of education and cognitive ability on body mass index.
Conclusion: MVMR analysis consistently estimates the direct causal effect of an exposure, or exposures, of interest and provides a powerful tool for determining causal effects in a wide range of scenarios with either individual- or summary-level data.
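To make the two-sample summary-data MVMR estimator concrete, the sketch below fits an inverse-variance weighted multivariable regression of SNP-outcome associations on SNP-exposure associations. It is a minimal illustration on simulated summary statistics, not the authors' code; the two-exposure setup, variable names (bx1, bx2, by, se_by) and effect sizes are assumptions made for the example.

```python
# Minimal sketch of a two-sample summary-data MVMR (IVW) estimate.
# Inputs are illustrative: per-SNP associations with two exposures (bx1, bx2),
# with the outcome (by), and the outcome standard errors (se_by).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_snps = 50
bx1 = rng.normal(0.1, 0.02, n_snps)        # SNP-exposure 1 associations
bx2 = rng.normal(0.05, 0.02, n_snps)       # SNP-exposure 2 associations
se_by = np.full(n_snps, 0.01)
by = 0.3 * bx1 - 0.1 * bx2 + rng.normal(0, se_by)  # SNP-outcome associations

X = np.column_stack([bx1, bx2])            # no intercept: effects act through the SNPs
fit = sm.WLS(by, X, weights=1.0 / se_by**2).fit()
print(fit.params)                          # direct causal effects of the two exposures
```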


2017 ◽  
Author(s):  
Jack Bowden ◽  
Fabiola Del Greco M ◽  
Cosetta Minelli ◽  
Qingyuan Zhao ◽  
Debbie A Lawlor ◽  
...  

Abstract
Background: Two-sample summary data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated.
Methods: Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘1st order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘2nd order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects.
Results: Using Monte Carlo simulations, we show that the modified weights outperform 1st and 2nd order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using 1st and 2nd order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared to 1st order weighting. Moreover, 1st order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk.
Conclusions: We propose the use of modified weights within two-sample summary data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with 1st order weights), but further research is required to understand their strengths and weaknesses in specific settings.
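The contrast between 1st- and 2nd-order weights can be seen in a few lines: both are plug-in variances for the per-SNP ratio estimates used in Cochran's Q. The sketch below uses made-up summary statistics; the final comment gestures at the modified weights only loosely, since their exact iterative form follows the paper rather than this simplification.

```python
# Illustrative sketch (not the authors' code): Cochran's Q for per-SNP ratio
# estimates under 1st- and 2nd-order inverse-variance weights, using simulated
# summary statistics (gamma = SNP-exposure, Gamma = SNP-outcome associations).
import numpy as np

rng = np.random.default_rng(1)
n = 30
gamma = rng.normal(0.08, 0.02, n); se_gamma = np.full(n, 0.01)
Gamma = 0.4 * gamma + rng.normal(0, 0.01, n); se_Gamma = np.full(n, 0.01)

beta_j = Gamma / gamma                                   # per-SNP ratio estimates
var_1st = se_Gamma**2 / gamma**2                         # ignores uncertainty in gamma
var_2nd = var_1st + Gamma**2 * se_gamma**2 / gamma**4    # adds a term for se(gamma)

def cochran_q(beta, var):
    w = 1.0 / var
    ivw = np.sum(w * beta) / np.sum(w)
    return np.sum(w * (beta - ivw) ** 2), ivw

q1, _ = cochran_q(beta_j, var_1st)
q2, _ = cochran_q(beta_j, var_2nd)
print(q1, q2)   # compare against a chi-squared distribution with n - 1 df
# The modified weights of this paper (roughly) evaluate a 2nd-order-style
# variance at the overall causal estimate rather than at each ratio, iterating
# to convergence; that detail is omitted from this sketch.
```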


2019 ◽  
Vol 49 (4) ◽  
pp. 1147-1158 ◽  
Author(s):  
Jessica M B Rees ◽  
Christopher N Foley ◽  
Stephen Burgess

Abstract
Background: Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. Although the approach has been used in applied investigations, little methodological advice is available on how to design or perform a factorial Mendelian randomization analysis. Previous analyses have employed a 2 × 2 approach, using dichotomized genetic scores to divide the population into four subgroups as in a factorial randomized trial.
Methods: We describe two distinct contexts for factorial Mendelian randomization: investigating interactions between risk factors, and investigating interactions between pharmacological interventions on risk factors. We propose two-stage least squares methods using all available genetic variants and their interactions as instrumental variables, and using continuous genetic scores as instrumental variables rather than dichotomized scores. We illustrate our methods using data from UK Biobank to investigate the interaction between body mass index and alcohol consumption on systolic blood pressure.
Results: Simulated and real data show that efficiency is maximized using the full set of interactions between genetic variants as instruments. In the applied example, between 4- and 10-fold improvement in efficiency is demonstrated over the 2 × 2 approach. Analyses using continuous genetic scores are more efficient than those using dichotomized scores. Efficiency is improved by finding genetic variants that divide the population at a natural break in the distribution of the risk factor, or else divide the population into more equal-sized groups.
Conclusions: Previous factorial Mendelian randomization analyses may have been underpowered. Efficiency can be improved by using all genetic variants and their interactions as instrumental variables, rather than the 2 × 2 approach.
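A hedged sketch of the continuous-score idea follows: two-stage least squares with two genetic scores and their product instrumenting the two risk factors and their interaction. The data are simulated and the naive second-stage standard errors are not valid 2SLS standard errors, so this is an illustration of the estimator's structure rather than a usable analysis script.

```python
# Sketch of continuous-score factorial MR via manual two-stage least squares.
# Simulated data; variable names and effect sizes are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 20_000
g1 = rng.normal(size=n)            # continuous genetic score for risk factor 1
g2 = rng.normal(size=n)            # continuous genetic score for risk factor 2
u = rng.normal(size=n)             # unobserved confounder
x1 = 0.5 * g1 + u + rng.normal(size=n)
x2 = 0.5 * g2 + u + rng.normal(size=n)
y = 0.3 * x1 + 0.2 * x2 + 0.1 * x1 * x2 + u + rng.normal(size=n)

Z = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))   # instruments incl. interaction
endog = np.column_stack([x1, x2, x1 * x2])                # exposures + their interaction

# First stage: predict each endogenous regressor from the instruments.
fitted = np.column_stack([sm.OLS(endog[:, j], Z).fit().fittedvalues
                          for j in range(endog.shape[1])])

# Second stage: regress the outcome on the first-stage fitted values.
stage2 = sm.OLS(y, sm.add_constant(fitted)).fit()
print(stage2.params)   # last coefficient should be near the simulated interaction of 0.1
# Note: naive second-stage standard errors are not valid 2SLS standard errors;
# a dedicated IV routine or correction would be needed in practice.
```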


PLoS Genetics ◽  
2021 ◽  
Vol 17 (4) ◽  
pp. e1009525
Author(s):  
Mark Gormley ◽  
James Yarmolinsky ◽  
Tom Dudding ◽  
Kimberley Burrows ◽  
Richard M. Martin ◽  
...  

Head and neck squamous cell carcinoma (HNSCC), which includes cancers of the oral cavity and oropharynx, is a cause of substantial global morbidity and mortality. Strategies to reduce disease burden include discovery of novel therapies and repurposing of existing drugs. Statins are commonly prescribed for lowering circulating cholesterol by inhibiting HMG-CoA reductase (HMGCR). Results from some observational studies suggest that statin use may reduce HNSCC risk. We appraised the relationship of genetically-proxied cholesterol-lowering drug targets and other circulating lipid traits with oral (OC) and oropharyngeal (OPC) cancer risk using two-sample Mendelian randomization (MR). For the primary analysis, germline genetic variants in HMGCR, NPC1L1, CETP, PCSK9 and LDLR were used to proxy the effect of low-density lipoprotein cholesterol (LDL-C) lowering therapies. In secondary analyses, variants were used to proxy circulating levels of other lipid traits in a genome-wide association study (GWAS) meta-analysis of 188,578 individuals. Both primary and secondary analyses aimed to estimate the downstream causal effect of cholesterol-lowering therapies on OC and OPC risk. The second sample for MR was taken from a GWAS of 6,034 OC and OPC cases and 6,585 controls (GAME-ON). Analyses were replicated in UK Biobank, using 839 OC and OPC cases and 372,016 controls, and the results of the GAME-ON and UK Biobank analyses were combined in a fixed-effects meta-analysis. We found limited evidence of a causal effect of genetically-proxied LDL-C lowering using HMGCR, NPC1L1, CETP or other circulating lipid traits on either OC or OPC risk. Genetically-proxied PCSK9 inhibition equivalent to a 1 mmol/L (38.7 mg/dL) reduction in LDL-C was associated with an increased risk of OC and OPC combined (OR 1.8, 95% CI 1.2, 2.8, p = 9.31 × 10⁻⁵), with good concordance between GAME-ON and UK Biobank (I² = 22%). Effects for PCSK9 appeared stronger in relation to OPC (OR 2.6, 95% CI 1.4, 4.9) than OC (OR 1.4, 95% CI 0.8, 2.4). LDLR variants, proxying a reduction in LDL-C equivalent to 1 mmol/L (38.7 mg/dL), reduced the risk of OC and OPC combined (OR 0.7, 95% CI 0.5, 1.0, p = 0.006). A series of pleiotropy-robust and outlier detection methods showed that pleiotropy did not bias our findings. We found limited evidence for a role of cholesterol-lowering in OC and OPC risk, suggesting previous observational results may have been confounded. There was some evidence that genetically-proxied inhibition of PCSK9 increased risk, while lipid-lowering variants in LDLR reduced risk of combined OC and OPC. This result suggests that the mechanisms of action of PCSK9 on OC and OPC risk may be independent of its cholesterol-lowering effects; however, this was not supported uniformly across all sensitivity analyses and further replication of this finding is required.
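The final step described above, combining the GAME-ON and UK Biobank MR estimates, is a standard inverse-variance fixed-effects meta-analysis on the log-odds scale. A minimal sketch is below; the two estimates and standard errors are placeholders, not the study's actual results.

```python
# Sketch of a fixed-effects meta-analysis of two MR estimates (log odds ratios),
# with an I-squared heterogeneity statistic. Numbers are placeholders only.
import numpy as np

log_or = np.array([np.log(1.9), np.log(1.5)])   # e.g. discovery and replication estimates
se = np.array([0.25, 0.40])

w = 1.0 / se**2
pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
q = np.sum(w * (log_or - pooled) ** 2)
i2 = max(0.0, (q - (len(log_or) - 1)) / q) * 100 if q > 0 else 0.0

print(np.exp(pooled),                            # pooled odds ratio
      np.exp(pooled - 1.96 * pooled_se),         # 95% CI lower bound
      np.exp(pooled + 1.96 * pooled_se),         # 95% CI upper bound
      i2)                                        # heterogeneity between samples (%)
```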


Biostatistics ◽  
2020 ◽  
Author(s):  
Andrew J Grant ◽  
Stephen Burgess

Summary Valid estimation of a causal effect using instrumental variables requires that all of the instruments are independent of the outcome conditional on the risk factor of interest and any confounders. In Mendelian randomization studies with large numbers of genetic variants used as instruments, it is unlikely that this condition will be met. Any given genetic variant could be associated with a large number of traits, all of which represent potential pathways to the outcome that bypass the risk factor of interest. Such pleiotropy can be accounted for using standard multivariable Mendelian randomization with all possible pleiotropic traits included as covariates. However, the estimator obtained in this way will be inefficient if some of the covariates do not truly sit on pleiotropic pathways to the outcome. We present a method that uses regularization to identify which out of a set of potential covariates need to be accounted for in a Mendelian randomization analysis in order to produce an efficient and robust estimator of a causal effect. The method can be used in the case where individual-level data are not available and the analysis must rely on summary-level data only. It can be used with any number of potential pleiotropic covariates, up to one fewer than the number of genetic variants. We show the results of simulation studies that demonstrate the performance of the proposed regularization method in realistic settings. We also illustrate the method in an applied example examining the causal effect of urate plasma concentration on coronary heart disease.
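The core idea can be sketched as a post-lasso procedure on summary data: apply an L1 penalty to the SNP-outcome regression to decide which candidate pleiotropic covariates to retain, then refit an unpenalised multivariable model with the selected covariates. This is an approximation under assumed simulated inputs; the paper's actual penalty scheme and tuning may differ.

```python
# Hedged sketch: L1-based selection of pleiotropic covariates from summary data,
# followed by an unpenalised multivariable (WLS) refit. Simulated inputs only.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n_snps, n_cov = 100, 10
bx = rng.normal(0.1, 0.03, n_snps)                # SNP-risk factor associations
bcov = rng.normal(0, 0.03, (n_snps, n_cov))       # SNP-candidate covariate associations
se_by = np.full(n_snps, 0.01)
# Only covariates 0 and 1 truly sit on pleiotropic pathways to the outcome.
by = 0.25 * bx + 0.3 * bcov[:, 0] - 0.2 * bcov[:, 1] + rng.normal(0, se_by)

X = np.column_stack([bx, bcov])
# Inverse-variance weighting applied by rescaling both sides before the lasso.
sel = LassoCV(fit_intercept=False, cv=5).fit(X * (1 / se_by)[:, None], by / se_by)
keep = [j for j in range(n_cov) if abs(sel.coef_[1 + j]) > 0]   # selected covariates

refit = sm.WLS(by, np.column_stack([bx, bcov[:, keep]]),
               weights=1.0 / se_by**2).fit()
print(keep, refit.params[0])     # selected covariates and the causal estimate for bx
```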


2020 ◽  
Author(s):  
Padraig Dixon ◽  
Sean Harrison ◽  
William Hollingworth ◽  
Neil M Davies ◽  
George Davey Smith

Background: Accurate measurement of the effects of disease status on healthcare costs is important in the pragmatic evaluation of interventions, but is complicated by endogeneity biases due to omitted variables and reverse causality. Mendelian randomization, the use of random perturbations in germline genetic variation as instrumental variables, can avoid these limitations. We report a novel Mendelian randomization analysis of the causal effect of liability to disease on healthcare costs.
Methods: We used Mendelian randomization to model the causal impact on inpatient hospital costs of liability to six highly prevalent diseases: asthma, eczema, migraine, coronary heart disease, type 2 diabetes, and major depressive disorder. We identified genetic variants from replicated genome-wide association studies and estimated their association with inpatient hospital costs using data from UK Biobank, a large prospective cohort study of individuals linked to records of hospital care. We assessed potential violations of the instrumental variable assumptions, particularly the exclusion restriction (i.e. variants affecting costs through alternative paths). We also conducted new genome-wide association studies of hospital costs within the UK Biobank cohort as a further split-sample sensitivity analysis.
Results: We analyzed data on 307,032 individuals. Genetic variants explained only a small portion of the variance in each disease phenotype. In causal analyses, liability to coronary heart disease had a substantial impact on inpatient hospital costs (mean per-person per-year increase in costs from allele score Mendelian randomization models: 712 pounds sterling; 95% confidence interval: 238 to 1,186 pounds), but other results were imprecise. Findings were concordant across a variety of sensitivity analyses, including stratification by sex, and with those obtained from the split-sample analysis.
Conclusion: A novel Mendelian randomization analysis of the causal effect of liability to disease on healthcare costs demonstrates that this type of analysis is feasible and informative in this context. There was concordance across data sources and across methods bearing different assumptions. Selection into the relatively healthy UK Biobank cohort and the modest proportion of variance in disease status accounted for by the allele scores reduced the precision of our estimates. We therefore could not exclude the possibility of substantial costs due to these diseases.
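One simple estimator of the kind used in allele score Mendelian randomization is the Wald ratio: the allele score's association with costs divided by its association with disease status. The sketch below simulates this under assumed effect sizes; it is not UK Biobank data and is only one of several estimators an analysis like this would report.

```python
# Illustrative Wald-ratio sketch of allele-score MR for a cost outcome.
# Simulated data; the true cost of disease is set to roughly 700 per case.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 100_000
score = rng.normal(size=n)                       # weighted allele score
u = rng.normal(size=n)                           # unobserved confounder
liability = 0.2 * score + u + rng.normal(size=n)
disease = (liability > 1.5).astype(float)        # binary disease status
cost = 700.0 * disease + 300.0 * u + rng.normal(0, 500.0, size=n)

Zc = sm.add_constant(score)
num = sm.OLS(cost, Zc).fit().params[1]           # score -> cost association
den = sm.OLS(disease, Zc).fit().params[1]        # score -> disease association
print(num / den)                                 # estimated cost effect of disease (~700 here)
```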


PLoS Genetics ◽  
2021 ◽  
Vol 17 (8) ◽  
pp. e1009703
Author(s):  
Ciarrah Barry ◽  
Junxi Liu ◽  
Rebecca Richmond ◽  
Martin K. Rutter ◽  
Deborah A. Lawlor ◽  
...  

Over the last decade the availability of SNP-trait associations from genome-wide association studies has led to an array of methods for performing Mendelian randomization studies using only summary statistics. A common feature of these methods, besides their intuitive simplicity, is the ability to combine data from several sources, incorporate multiple variants and account for biases due to weak instruments and pleiotropy. With the advent of large and accessible fully-genotyped cohorts such as UK Biobank, there is now increasing interest in understanding how best to apply these well-developed summary data methods to individual-level data, and to explore the use of more sophisticated causal methods allowing for non-linearity and effect modification. In this paper we describe a general procedure for optimally applying any two-sample summary data method using one-sample data. Our procedure first performs a meta-analysis of summary data estimates that are intentionally contaminated by collider bias between the genetic instruments and unmeasured confounders, due to conditioning on the observed exposure. These estimates are then used to correct the standard observational association between an exposure and outcome. Simulations are conducted to demonstrate the method’s performance against naive applications of two-sample summary data MR. We apply the approach to the UK Biobank cohort to investigate the causal role of sleep disturbance on HbA1c levels, an important determinant of diabetes. Our approach can be viewed as a generalization of Dudbridge et al. (Nat. Comm. 10: 1561), who developed a technique to adjust for index event bias when uncovering genetic predictors of disease progression based on case-only data. Our work serves to clarify that in any one-sample MR analysis, it can be advantageous to estimate causal relationships by artificially inducing and then correcting for collider bias.
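A simplified illustration of the induce-then-correct idea is given below, under a linear model with a single unmeasured confounder and valid instruments: conditioning on the exposure makes it a collider, the per-SNP ratios of the conditional SNP-outcome coefficient to the SNP-exposure coefficient estimate the resulting bias, and that quantity is used to correct the observational association. The unweighted mean used here stands in for the paper's weighted combination, and the whole construction is an assumption-laden sketch rather than the authors' estimator.

```python
# Simplified sketch of inducing collider bias and then correcting the
# confounded observational estimate. Linear model, valid instruments assumed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, p, beta = 50_000, 20, 0.5
G = rng.normal(size=(n, p))                 # standardized genetic instruments
U = rng.normal(size=n)                      # unmeasured confounder
X = G @ np.full(p, 0.15) + U + rng.normal(size=n)
Y = beta * X + U + rng.normal(size=n)

beta_obs = sm.OLS(Y, sm.add_constant(X)).fit().params[1]   # confounded estimate

ratios = []
for j in range(p):
    gamma_j = sm.OLS(X, sm.add_constant(G[:, j])).fit().params[1]
    # Conditioning on X makes it a collider for G_j and U: the G_j coefficient
    # below is deliberately contaminated by collider bias.
    cond_j = sm.OLS(Y, sm.add_constant(np.column_stack([G[:, j], X]))).fit().params[1]
    ratios.append(cond_j / gamma_j)

correction = np.mean(ratios)                # the paper uses a weighted combination
print(beta_obs, beta_obs + correction)      # biased vs (approximately) corrected estimate
```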


2017 ◽  
Author(s):  
Louise A C Millard ◽  
Neil M Davies ◽  
Kate Tilling ◽  
Tom R Gaunt ◽  
George Davey Smith

ABSTRACTMendelian randomization (MR) has been used to estimate the causal effect of body mass index (BMI) on particular traits thought to be affected by BMI. However, BMI may also be a modifiable, causal risk factor for outcomes where there is no prior reason to suggest that a causal effect exists. We perform a MR phenome-wide association study (MR-pheWAS) to search for the causal effects of BMI in UK Biobank (n=334 968), using the PHESANT open-source phenome scan tool. Of the 20 461 tests performed, our MR-pheWAS identified 519 associations below a stringent P value threshold corresponding to a 5% estimated false discovery rate, including many previously identified causal effects. We also identified several novel effects, including protective effects of higher BMI on a set of psychosocial traits, identified initially in our preliminary MR-pheWAS and replicated in an independent subset of UK Biobank. Such associations need replicating in an independent sample.
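One standard way to obtain a P value threshold controlling the false discovery rate across a phenome scan of this kind is the Benjamini-Hochberg procedure, sketched below on simulated p-values; the paper's own threshold was an estimated 5% FDR obtained within the PHESANT pipeline, which may not use exactly this procedure.

```python
# Sketch of a Benjamini-Hochberg 5% FDR threshold over phenome-scan p-values.
# The p-values are simulated placeholders, not MR-pheWAS results.
import numpy as np

rng = np.random.default_rng(6)
pvals = np.concatenate([rng.uniform(0, 1e-4, 500),    # some genuine signals
                        rng.uniform(0, 1, 20_000)])   # mostly null tests

def bh_threshold(p, q=0.05):
    p_sorted = np.sort(p)
    m = len(p)
    below = p_sorted <= q * np.arange(1, m + 1) / m   # BH step-up criterion
    return p_sorted[below].max() if below.any() else 0.0

thr = bh_threshold(pvals)
print(thr, (pvals <= thr).sum())   # threshold and number of 'discoveries'
```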


2020 ◽  
Vol 49 (4) ◽  
pp. 1259-1269
Author(s):  
Shuai Li ◽  
Minh Bui ◽  
John L Hopper

Abstract
Background: We developed a method to make Inference about Causation from Examination of FAmiliaL CONfounding (ICE FALCON) using observational data for related individuals and considering changes in a pair of regression coefficients. ICE FALCON has some similarities to Mendelian randomization (MR) but uses, in effect, all the familial determinants of the exposure, not just those captured by measured genetic variants, and does not require genetic data or make strong assumptions. ICE FALCON can assess tracking of a measure over time, an issue often difficult to assess using MR due to the lack of a valid instrumental variable.
Methods: We describe ICE FALCON and present two empirical applications with simulations.
Results: We found evidence consistent with body mass index (BMI) having a causal effect on DNA methylation at the ABCG1 locus, the same conclusion as from MR analyses but providing about 2.5 times more information per subject. We found evidence that tracking of BMI is consistent with longitudinal causation, as well as familial confounding. The simulations supported the validity of ICE FALCON.
Conclusions: There are conceptual similarities between ICE FALCON and MR, and empirically they give similar conclusions, with possibly more information per subject from ICE FALCON. ICE FALCON can be applied to circumstances in which MR cannot be applied, such as when there is no a priori genetic knowledge and/or data available to create a valid instrumental variable, or when the assumptions underlying MR analysis are suspect. ICE FALCON could provide insights into causality for a wide range of public health questions.
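A toy sketch of the kind of paired-relative coefficient comparison the abstract describes is given below: the co-relative exposure coefficient is compared before and after conditioning on the individual's own exposure, with attenuation toward zero consistent with causation rather than familial confounding. The model structure and simulated twin-pair data are assumptions for illustration, not the authors' implementation (which would also handle the paired structure and inference more carefully).

```python
# Schematic paired-relative regression comparison on simulated twin pairs in
# which the exposure truly causes the outcome. Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n_pairs = 10_000
familial = rng.normal(size=n_pairs)                       # shared familial factors
x_self = 0.7 * familial + rng.normal(size=n_pairs)        # own exposure
x_cotwin = 0.7 * familial + rng.normal(size=n_pairs)      # co-twin's exposure
y_self = 0.4 * x_self + rng.normal(size=n_pairs)          # exposure causes the outcome

m1 = sm.OLS(y_self, sm.add_constant(x_cotwin)).fit()                             # co-twin exposure only
m3 = sm.OLS(y_self, sm.add_constant(np.column_stack([x_self, x_cotwin]))).fit()  # plus own exposure
print(m1.params[1], m3.params[2])   # attenuation of the co-twin coefficient suggests causation
```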

