Combining the strengths of inverse-variance weighting and Egger regression in Mendelian randomization using a mixture of regressions model

PLoS Genetics ◽  
2021 ◽  
Vol 17 (11) ◽  
pp. e1009922
Author(s):  
Zhaotong Lin ◽  
Yangqing Deng ◽  
Wei Pan

With the increasing availability of large-scale GWAS summary data on various traits, Mendelian randomization (MR) has become a common approach to inferring causality between a pair of traits, an exposure and an outcome. It relies on genetic variants, typically SNPs, as instrumental variables (IVs). The inverse-variance weighted (IVW) method (with a fixed-effect meta-analysis model) is most powerful when all IVs are valid; however, when horizontal pleiotropy is present, it may yield biased inference. Egger regression, by contrast, is one of the most widely used methods robust to (uncorrelated) pleiotropy, but it suffers from a loss of power. We propose a two-component mixture of regressions that combines, and thus takes advantage of, both IVW and Egger regression; by accounting for valid and invalid IVs respectively, it is often both more efficient (i.e., higher powered) and more robust to pleiotropy (i.e., better controlling type I error) than either IVW or Egger regression alone. We further propose a model averaging approach and a novel data perturbation scheme to account for uncertainty in model/IV selection, leading to more robust statistical inference in finite samples. Through extensive simulations and applications to GWAS summary data on 48 risk factor-disease pairs and 63 genetically uncorrelated trait pairs, we show that our proposed methods often control type I error better, while achieving much higher power, than IVW and Egger regression (and sometimes than several other newer or popular MR methods). We expect the proposed methods to be a useful addition to the Mendelian randomization toolbox for causal inference.
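As a concrete illustration of the two building blocks the mixture combines, the R sketch below fits IVW and Egger regression to simulated two-sample summary statistics. The simulation setup and all variable names are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the authors' code): IVW and Egger regression
# on simulated two-sample summary statistics; 10 of 50 IVs are invalid.
set.seed(1)
m      <- 50
beta_x <- rnorm(m, 0, 0.1)                        # SNP-exposure effects
alpha  <- c(rep(0, 40), rnorm(10, 0.05, 0.02))    # pleiotropy of invalid IVs
se_y   <- rep(0.02, m)                            # SNP-outcome standard errors
beta_y <- 0.3 * beta_x + alpha + rnorm(m, 0, se_y)

# IVW: inverse-variance weighted regression through the origin
ivw   <- lm(beta_y ~ beta_x - 1, weights = 1 / se_y^2)
# Egger: the intercept absorbs directional (uncorrelated) pleiotropy
egger <- lm(beta_y ~ beta_x, weights = 1 / se_y^2)

rbind(IVW   = summary(ivw)$coef["beta_x", ],
      Egger = summary(egger)$coef["beta_x", ])

In this design the directional pleiotropy of the invalid IVs biases the IVW slope, while the Egger intercept absorbs much of it at the cost of a larger standard error; the paper's mixture model aims to get the better of both behaviours.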


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Lawrence M. Paul

Abstract
Background: The use of meta-analysis to aggregate the results of multiple studies has increased dramatically over the last 40 years. For homogeneous meta-analyses, in which the effect size differs across the contributing studies only by statistical error, the Mantel–Haenszel technique has typically been utilized. If homogeneity cannot be assumed or established, the most popular technique developed to date is the inverse-variance DerSimonian and Laird (DL) technique (DerSimonian and Laird, in Control Clin Trials 7(3):177–88, 1986). However, both of these techniques rest on large-sample, asymptotic assumptions; at best they are approximations, especially when the number of cases observed in any cell of the corresponding contingency tables is small.
Results: This research develops an exact, non-parametric test for evaluating statistical significance, and a related method for estimating effect size, in the meta-analysis of k 2 × 2 tables at any level of heterogeneity, as an alternative to the asymptotic techniques. Monte Carlo simulations show that even for large values of heterogeneity, the Enhanced Bernoulli Technique (EBT) is far superior to the DL technique at maintaining the pre-specified level of Type I error. A fully tested implementation in the R statistical language is freely available from the author, as is a second, related exact test for estimating the effect size.
Conclusions: This research has developed two exact tests for the meta-analysis of dichotomous, categorical data. The EBT was strongly superior to the DL technique in maintaining a pre-specified level of Type I error even at extremely high levels of heterogeneity, whereas the DL technique exhibited many large violations of that level. Given the various biases towards finding statistical significance prevalent in epidemiology today, a strong focus on maintaining a pre-specified level of Type I error would seem critical. A related exact method for estimating the effect size was also developed.
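The EBT is exact and non-parametric and does not reduce to a few lines; as a point of reference, the R sketch below implements the asymptotic DerSimonian-Laird estimate that the paper critiques, applied to k 2 × 2 tables. The counts are arbitrary illustrative values.

# Sketch of the asymptotic DerSimonian-Laird (DL) estimate the paper
# critiques, for k 2x2 tables given as events/totals per arm (illustrative).
dl_meta <- function(e1, n1, e0, n0) {
  y <- log((e1 * (n0 - e0)) / ((n1 - e1) * e0))   # log odds ratio per table
  v <- 1/e1 + 1/(n1 - e1) + 1/e0 + 1/(n0 - e0)    # its large-sample variance
  w <- 1 / v
  mu_fe <- sum(w * y) / sum(w)                    # fixed-effect estimate
  Q  <- sum(w * (y - mu_fe)^2)                    # Cochran's Q
  k  <- length(y)
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  wr <- 1 / (v + tau2)                            # random-effects weights
  c(logOR = sum(wr * y) / sum(wr), se = sqrt(1 / sum(wr)), tau2 = tau2, Q = Q)
}
# Three small, arbitrary tables: events e and arm sizes n, treated vs control
dl_meta(e1 = c(12, 5, 20), n1 = c(50, 40, 100),
        e0 = c(6, 8, 10),  n0 = c(50, 40, 100))

The reliance on the large-sample variance of each log odds ratio is exactly where the approximation degrades for sparse tables, which motivates the exact alternative.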


2019 ◽  
Author(s):  
Sheng Wang ◽  
Hyunseung Kang

Abstract Mendelian randomization (MR) is a popular method in genetic epidemiology for estimating the effect of an exposure on an outcome using genetic variants as instrumental variables (IVs), with two-sample summary-data MR the most widely used variant owing to privacy considerations (summary statistics avoid sharing individual-level data). Unfortunately, many MR methods for two-sample summary data are not robust to weak instruments, a common phenomenon with genetic instruments; many of these methods are biased, and no existing MR method controls the Type I error under weak instruments. In this work, we propose test statistics that are robust to weak instruments by extending the Anderson-Rubin, Kleibergen, and conditional likelihood ratio tests from econometrics to the two-sample summary-data setting. We conclude with a simulation study and an empirical study, showing that the proposed tests control size and have better power than current methods.
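To make the weak-instrument logic concrete, here is a minimal R sketch of an Anderson-Rubin-style construction in the two-sample summary setting: under H0: beta = b0, each residual Gamma_j - b0 * gamma_j has mean zero regardless of instrument strength. The exact statistics proposed in the paper differ; this simplified form is an assumption for illustration.

# Illustrative Anderson-Rubin-style test: under H0: beta = b0, the residual
# Gamma_j - b0 * gamma_j is mean-zero for every SNP j, giving a chi-square
# statistic whose validity does not rest on instrument strength.
ar_pvalue <- function(b0, gamma, se_g, Gamma, se_G) {
  r    <- Gamma - b0 * gamma
  v    <- se_G^2 + b0^2 * se_g^2          # variance of each residual under H0
  stat <- sum(r^2 / v)
  pchisq(stat, df = length(gamma), lower.tail = FALSE)
}

set.seed(2)
m     <- 20
gamma <- rnorm(m, 0, 0.02); se_g <- rep(0.02, m)   # deliberately weak IVs
Gamma <- 0.2 * gamma + rnorm(m, 0, 0.02); se_G <- rep(0.02, m)

# Confidence set by inverting the test over a grid of candidate effects
grid <- seq(-2, 2, by = 0.01)
keep <- grid[sapply(grid, ar_pvalue, gamma = gamma, se_g = se_g,
                    Gamma = Gamma, se_G = se_G) > 0.05]
range(keep)

With instruments this weak, the inverted confidence set is typically very wide, or even unbounded; that honest reporting of uncertainty is precisely what identification-robust tests buy over standard Wald-type inference.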


2021 ◽  
Author(s):  
Zipeng Liu ◽  
Yiming Qin ◽  
Tian Wu ◽  
Justin Tubbs ◽  
Larry Baum ◽  
...  

Abstract Mendelian randomization (MR) using GWAS summary statistics has become a popular method for inferring causal relationships across complex diseases. However, the widespread pleiotropy observed in GWAS has made the selection of valid instrumental variables (IVs) problematic, leading to possible violations of MR assumptions and thus potentially invalid inferences concerning causation. Furthermore, current MR methods can examine causation in only one direction, so that two separate analyses are required for a bi-directional analysis. In this study, we propose a novel strategy, MRCI (Mixture model Reciprocal Causation Inference), to estimate reciprocal causation between two phenotypes simultaneously, using the genome-scale summary statistics of the two phenotypes and reference linkage disequilibrium (LD) information. Simulation studies, including scenarios with strongly correlated pleiotropy, showed that MRCI obtained nearly unbiased estimates of causation in both directions and correct Type I error rates under the null hypothesis. In applications to real GWAS data, MRCI detected significant bi-directional and uni-directional causal influences between common diseases and putative risk factors.
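For contrast with MRCI's simultaneous estimation, the R sketch below shows the two-separate-analyses baseline the abstract describes. The objects beta_A, beta_B, se_A, se_B, iv_A and iv_B are hypothetical placeholders for trait-level summary statistics and instrument selections.

# The baseline MRCI replaces: bi-directional inference via two independent
# IVW fits, one per direction (all object names below are hypothetical).
ivw_fit <- function(bx, by, se_by) {
  fit <- lm(by ~ bx - 1, weights = 1 / se_by^2)
  summary(fit)$coef[1, 1:2]                   # estimate and standard error
}
# Direction A -> B: instruments selected for trait A only
# est_ab <- ivw_fit(beta_A[iv_A], beta_B[iv_A], se_B[iv_A])
# Direction B -> A: a second, entirely separate analysis with trait-B IVs
# est_ba <- ivw_fit(beta_B[iv_B], beta_A[iv_B], se_A[iv_B])

Because the two fits share no information, feedback between the traits and correlated pleiotropy go unmodelled; MRCI's mixture model addresses both in a single joint analysis.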


2018 ◽  
Vol 48 (3) ◽  
pp. 728-742 ◽  
Author(s):  
Jack Bowden ◽  
Fabiola Del Greco M ◽  
Cosetta Minelli ◽  
Qingyuan Zhao ◽  
Debbie A Lawlor ◽  
...  

Abstract
Background: Two-sample summary-data Mendelian randomization (MR) incorporating multiple genetic variants within a meta-analysis framework is a popular technique for assessing causality in epidemiology. If all genetic variants satisfy the instrumental variable (IV) and necessary modelling assumptions, then their individual ratio estimates of causal effect should be homogeneous. Observed heterogeneity signals that one or more of these assumptions could have been violated.
Methods: Causal estimation and heterogeneity assessment in MR require an approximation for the variance, or equivalently the inverse-variance weight, of each ratio estimate. We show that the most popular ‘first-order’ weights can lead to an inflation in the chances of detecting heterogeneity when in fact it is not present. Conversely, ostensibly more accurate ‘second-order’ weights can dramatically increase the chances of failing to detect heterogeneity when it is truly present. We derive modified weights to mitigate both of these adverse effects.
Results: Using Monte Carlo simulations, we show that the modified weights outperform first- and second-order weights in terms of heterogeneity quantification. Modified weights are also shown to remove the phenomenon of regression dilution bias in MR estimates obtained from weak instruments, unlike those obtained using first- and second-order weights. However, with small numbers of weak instruments, this comes at the cost of a reduction in estimate precision and power to detect a causal effect compared with first-order weighting. Moreover, first-order weights always furnish unbiased estimates and preserve the type I error rate under the causal null. We illustrate the utility of the new method using data from a recent two-sample summary-data MR analysis to assess the causal role of systolic blood pressure on coronary heart disease risk.
Conclusions: We propose the use of modified weights within two-sample summary-data MR studies for accurately quantifying heterogeneity and detecting outliers in the presence of weak instruments. Modified weights also have an important role to play in terms of causal estimation (in tandem with first-order weights), but further research is required to understand their strengths and weaknesses in specific settings.
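The three weighting schemes compared here can be written compactly in R; the modified weights below evaluate the variance of each ratio estimate at a common slope (one IVW step) rather than at each noisy per-SNP ratio. This one-step form and the simulated data are illustrative assumptions, not the paper's exact iterative procedure.

# Illustrative comparison of first-order, second-order and (one-step)
# modified inverse-variance weights for Cochran's Q in summary-data MR.
set.seed(3)
m     <- 30
gamma <- rnorm(m, 0.1, 0.03); se_g <- rep(0.01, m)   # SNP-exposure effects
Gamma <- 0.25 * gamma + rnorm(m, 0, 0.01); se_G <- rep(0.01, m)

q_stat <- function(b, w) { mu <- sum(w * b) / sum(w); sum(w * (b - mu)^2) }

b  <- Gamma / gamma                                  # per-SNP ratio estimates
w1 <- gamma^2 / se_G^2                               # first-order weights
w2 <- 1 / (se_G^2 / gamma^2 + Gamma^2 * se_g^2 / gamma^4)  # second-order
b0 <- sum(w1 * b) / sum(w1)                          # one IVW step
wm <- gamma^2 / (se_G^2 + b0^2 * se_g^2)             # modified weights
c(Q1 = q_stat(b, w1), Q2 = q_stat(b, w2), Qm = q_stat(b, wm))
# each statistic is referred to a chi-square on (m - 1) degrees of freedom

Fixing the slope at a common value removes the correlation between each ratio estimate and its own plugged-in variance, which is the mechanism behind the over- and under-detection of heterogeneity described above.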


2017 ◽  
Author(s):  
Richard Barfield ◽  
Helian Feng ◽  
Alexander Gusev ◽  
Lang Wu ◽  
Wei Zheng ◽  
...  

Abstract Integrating genome-wide association study (GWAS) and expression quantitative trait locus (eQTL) data into transcriptome-wide association studies (TWAS) based on predicted expression can boost power to detect novel disease loci or pinpoint the susceptibility gene at a known disease locus. However, multiple eQTL genes often colocalize at disease loci, making identification of the true susceptibility gene challenging owing to confounding through linkage disequilibrium (LD). To distinguish between true susceptibility genes (where the genetic effect on phenotype is mediated through expression) and colocalization due to LD, we examine an extension of the Mendelian randomization Egger regression method that allows for LD while requiring only summary association data for both GWAS and eQTL. We derive the standard TWAS approach in the context of Mendelian randomization and show in simulations that standard TWAS does not control the Type I error for causal gene identification when eQTLs have pleiotropic or LD-confounded effects on disease. In contrast, LD Aware MR-Egger regression can control the Type I error in this case while attaining power similar to other methods in situations where those provide valid tests. However, when the direct effects of genetic variants on traits are correlated with the eQTL associations, all of the methods we examined, including LD Aware MR-Egger regression, can have inflated Type I error. We illustrate these methods by integrating gene expression within a recent large-scale breast cancer GWAS to provide guidance on susceptibility gene identification.
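As a hedged sketch of the LD-aware idea, the following generic generalized least squares version of Egger regression regresses GWAS effects on eQTL effects with a residual covariance assembled from an LD correlation matrix R. It is a construction under stated assumptions, not the authors' exact estimator, and all inputs are illustrative.

# Generic GLS sketch of LD-aware Egger regression (illustrative; not the
# authors' exact estimator): GWAS effects regressed on eQTL effects, with
# residual covariance assembled from an LD correlation matrix R.
ld_egger <- function(b_gwas, se_gwas, b_eqtl, R) {
  X  <- cbind(intercept = 1, slope = b_eqtl)   # intercept absorbs pleiotropy
  S  <- diag(se_gwas) %*% R %*% diag(se_gwas)  # LD-aware covariance
  Si <- solve(S)
  V  <- solve(t(X) %*% Si %*% X)               # GLS coefficient covariance
  est <- drop(V %*% t(X) %*% Si %*% b_gwas)
  cbind(estimate = est, se = sqrt(diag(V)))
}

# Toy usage with an AR(1)-style LD matrix (illustrative values only)
set.seed(5)
m <- 10; R <- 0.3 ^ abs(outer(1:m, 1:m, "-"))
b_eqtl  <- rnorm(m, 0, 0.2); se_gwas <- rep(0.02, m)
b_gwas  <- 0.1 * b_eqtl + drop(t(chol(R)) %*% rnorm(m)) * 0.02
ld_egger(b_gwas, se_gwas, b_eqtl, R)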


2001 ◽  
Vol 26 (1) ◽  
pp. 105-132 ◽  
Author(s):  
Douglas A. Powell ◽  
William D. Schafer

The robustness literature for the structural equation model was synthesized following the method of Harwell, which employs meta-analysis as developed by Hedges and Vevea. The study focused on explaining empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two classes of distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and to model size for all estimators (with the possible exception of the elliptical estimators with respect to model size, and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample size. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.


2021 ◽  
Author(s):  
Megha Joshi ◽  
James E Pustejovsky ◽  
S. Natasha Beretvas

The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence among the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method for handling dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small-sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
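The R sketch below conveys the core of cluster wild bootstrapping for a single meta-regression slope, with Rademacher sign flips applied per study; it omits inverse-variance weighting and the small-sample refinements implemented in the authors' package, so treat it as a conceptual illustration only.

# Conceptual sketch of a cluster wild bootstrap test for one meta-regression
# slope (Rademacher signs flipped per study; inverse-variance weights and
# small-sample refinements are omitted for brevity).
cwb_pvalue <- function(y, x, study, B = 1999) {
  study  <- factor(study)
  null_f <- lm(y ~ 1)                       # restricted model under H0: slope = 0
  mu     <- fitted(null_f); r <- resid(null_f)
  t_obs  <- summary(lm(y ~ x))$coef["x", "t value"]
  t_boot <- replicate(B, {
    eta <- sample(c(-1, 1), nlevels(study), replace = TRUE)
    y_b <- mu + r * eta[as.integer(study)]  # same sign for a whole cluster
    summary(lm(y_b ~ x))$coef["x", "t value"]
  })
  mean(abs(t_boot) >= abs(t_obs))           # two-sided bootstrap p-value
}

set.seed(4)
study <- rep(1:8, each = 3)                 # 8 studies, 3 effect sizes each
x <- rnorm(24); y <- 0.1 * x + rnorm(8)[study] + rnorm(24)
cwb_pvalue(y, x, study)

Flipping residual signs at the study level preserves the within-study dependence structure in each bootstrap draw, which is why the method keeps its size when the number of studies is small.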

