1455 Leveraging multi-omic negative controls for effect estimation in molecular epidemiologic studies: A simulation study
Abstract
Background Exploratory null-hypothesis significance testing (e.g. GWAS, EWAS) forms the backbone of molecular epidemiology, but methods to identify true causal signals are underdeveloped. Via plasmode simulation, I evaluate two approaches that use complementary epigenomes and biologically informed structural assumptions to quantitatively control for shared unmeasured confounding and recover unbiased effects.
Methods I adapt two proposed negative control-based estimators, the control outcome calibration approach (COCA) and proximal g-computation (PG), to case studies in perinatal molecular epidemiology. COCA may be employed when the maternal epigenome has no direct effect on the phenotype and proxies shared unmeasured confounders; PG may additionally be employed when suitable genetic instruments (e.g. mQTLs) are available. Baseline covariates were extracted from 777 mother-child pairs in a birth cohort with maternal blood and fetal cord DNA methylation array data. Treatment and outcome values were simulated in 2000 bootstraps. Bootstrapped ordinary (COCA) and two-stage (PG) least squares models were fitted to estimate treatment effects and standard errors under various common settings of missing confounders (e.g. paternal data). Doubly robust machine learning estimators were also explored.
Results COCA and PG performed well under simplistic data-generating processes. In real-world cohort simulations, however, COCA performed acceptably only in settings with strong proxy confounders and poorly otherwise (median bias 610%; coverage 29%). PG performed slightly better. By contrast, simple covariate adjustment for maternal methylation (median bias 22%; coverage 71%) outperformed COCA, PG, and the machine learning estimators.
Discussion Molecular epidemiology provides a key opportunity to leverage biological knowledge against unmeasured confounding. Negative control calibration or adjustment may help in the limited scenarios where the assumptions are fulfilled, but should be tested with suitable simulations.
Key messages Quantitative approaches to unmeasured confounding remain a critical gap in molecular epidemiology. Negative control calibration or adjustment may help under limited scenarios. Proposed estimators should be tested in simulation settings that closely mimic the target data.
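The two-stage least squares form of PG mentioned in the Methods can be illustrated with a toy example. The sketch below is not the plasmode simulation used in the study: it assumes a deliberately simple linear data-generating process with one unmeasured confounder `u`, a treatment-side proxy `z` (loosely standing in for a genetic instrument), and an outcome-side proxy `w` (loosely standing in for a negative control measurement such as maternal methylation). All variable names, coefficients, and the helper functions `simulate`, `ols`, `naive_estimate`, and `proximal_2sls` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(n, beta, rng):
    """Toy linear DGP (an assumption, not the study's plasmode design)."""
    u = rng.normal(size=n)             # unmeasured confounder
    z = u + rng.normal(size=n)         # treatment-side proxy of u
    w = u + rng.normal(size=n)         # outcome-side proxy of u
    a = u + rng.normal(size=n)         # treatment, confounded by u
    y = beta * a + u + rng.normal(size=n)  # outcome, confounded by u
    return a, z, w, y


def naive_estimate(a, y):
    """Unadjusted regression of outcome on treatment (confounded)."""
    return ols(a, y)[1]


def proximal_2sls(a, z, w, y):
    """Two-stage least squares using the proxies z and w.

    Stage 1: regress w on (a, z) and form fitted values.
    Stage 2: regress y on (a, fitted w); the coefficient on a is the estimate.
    """
    s1 = ols(np.column_stack([a, z]), w)
    w_hat = s1[0] + s1[1] * a + s1[2] * z
    s2 = ols(np.column_stack([a, w_hat]), y)
    return s2[1]


rng = np.random.default_rng(0)
true_beta = 0.5
a, z, w, y = simulate(20_000, true_beta, rng)
naive = naive_estimate(a, y)
pg = proximal_2sls(a, z, w, y)
print(f"true effect: {true_beta:.2f}, naive OLS: {naive:.2f}, proximal 2SLS: {pg:.2f}")
```

In this idealised setting the proxies satisfy the required structural assumptions by construction, so the two-stage estimate should land close to the true effect while the naive estimate is biased upward. This mirrors the abstract's finding that such estimators do well under simplistic data-generating processes, and says nothing about their behaviour in realistic cohort data.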