A Structured Approach to Evaluating Life Course Hypotheses: Moving Beyond Analyses of Exposed Versus Unexposed in the Omics Context

Author(s):  
Yiwen Zhu ◽  
Andrew J Simpkin ◽  
Matthew J Suderman ◽  
Alexandre A Lussier ◽  
Esther Walton ◽  
...  

Abstract The structured life course modeling approach (SLCMA) is a theory-driven analytic method that empirically compares multiple prespecified life course hypotheses characterizing time-dependent exposure-outcome relationships to determine which theory best fits the observed data. In this study, we performed simulations and empirical analyses to evaluate the performance of the SLCMA when applied to genome-wide DNA methylation (DNAm). Using simulations, we compared five statistical inference tests used with SLCMA (n=700), assessing the family-wise error rate, statistical power, and confidence interval coverage to determine whether inference based on these tests was valid in the presence of substantial multiple testing and small effects, two hallmark challenges of inference from omics data. In the empirical analyses, we evaluated the time-dependent relationship of childhood abuse with genome-wide DNAm (n=703). In simulations, selective inference and max-|t|-test performed best: both controlled family-wise error rate and yielded moderate statistical power. Empirical analyses using SLCMA revealed time-dependent effects of childhood abuse on DNAm. Our findings show that SLCMA, applied and interpreted appropriately, can be used in high-throughput settings to examine time-dependent effects underlying exposure-outcome relationships over the life course. We provide recommendations for applying the SLCMA in omics settings and encourage researchers to move beyond analyses of exposed versus unexposed.
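To make the selection step concrete, below is a minimal, illustrative Python sketch of the kind of variable selection the SLCMA builds on: each prespecified life course hypothesis is encoded as a candidate predictor, and least-angle regression (LARS) picks the hypothesis that explains the most outcome variance. The exposure timings, encodings, and effect size are invented for illustration, and the post-selection step (selective inference or the max-|t|-test) is only noted, not implemented.

```python
# Minimal sketch of the SLCMA selection step: encode competing life course
# hypotheses as candidate predictors, then let LARS pick the one that
# explains the most outcome variance. Encodings and names are illustrative.
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
n = 700
# Binary exposure (e.g., childhood abuse) measured at three ages.
exposure = rng.binomial(1, 0.3, size=(n, 3))

# Prespecified life course hypotheses, one column each.
candidates = np.column_stack([
    exposure[:, 0],              # sensitive period: early childhood
    exposure[:, 1],              # sensitive period: middle childhood
    exposure[:, 2],              # sensitive period: adolescence
    exposure.sum(axis=1),        # accumulation of exposure
])
labels = ["early", "middle", "adolescence", "accumulation"]

# Simulated outcome (e.g., DNAm at one CpG) driven by the early-childhood term.
outcome = 0.3 * exposure[:, 0] + rng.normal(0, 1, n)

# LARS with one step selects the single best-fitting hypothesis.
lars = Lars(n_nonzero_coefs=1).fit(candidates, outcome)
selected = int(np.flatnonzero(lars.coef_)[0])
print("selected hypothesis:", labels[selected])
# Post-selection inference (selective inference or the max-|t| test) would
# follow here; naive p-values for the selected term would be anti-conservative.
```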

2019 ◽  
Author(s):  
Yiwen Zhu ◽  
Andrew J. Simpkin ◽  
Matthew J. Suderman ◽  
Alexandre A. Lussier ◽  
Esther Walton ◽  
...  

Abstract
Background: Life course epidemiology provides a framework for studying the effects of time-varying exposures on health outcomes. The structured life course modeling approach (SLCMA) is a theory-driven analytic method that empirically compares multiple prespecified life course hypotheses characterizing time-dependent exposure-outcome relationships to determine which theory best fits the observed data. However, the statistical properties of inference methods used with the SLCMA have not been investigated with high-dimensional omics outcomes.
Methods: We performed simulations and empirical analyses to evaluate the performance of the SLCMA when applied to genome-wide DNA methylation (DNAm). In the simulations, we compared five statistical inference tests used by the SLCMA (n=700). For each, we assessed the familywise error rate (FWER), statistical power, and confidence interval coverage to determine whether inference based on these tests was valid in the presence of substantial multiple testing and small effect sizes, two hallmark challenges of inference from omics data. In the empirical analyses, we applied the SLCMA to evaluate the time-dependent relationship of childhood abuse with genome-wide DNAm (n=703).
Results: In the simulations, selective inference and the max-|t|-test performed best: both controlled the FWER and yielded moderate statistical power. Empirical analyses using the SLCMA revealed time-dependent effects of childhood abuse on DNA methylation.
Conclusions: Our findings show that the SLCMA, applied and interpreted appropriately, can be used in the omics setting to examine time-dependent effects underlying exposure-outcome relationships over the life course. We provide recommendations for applying the SLCMA in high-throughput settings, which we hope will encourage researchers to move beyond analyses of exposed versus unexposed.
Key messages: The structured life course modeling approach (SLCMA) is an effective approach to directly compare life course theories and can be scaled up in the omics context to examine nuanced relationships between environmental exposures over the life course and biological processes. Of the five statistical inference tests assessed in simulations, we recommend the selective inference method and the max-|t|-test for post-selection inference in omics applications of the SLCMA. In an empirical example, we revealed time-dependent effects of childhood abuse on DNA methylation using the SLCMA, with improvement in statistical power when accounting for covariates by applying the Frisch-Waugh-Lovell (FWL) theorem. Researchers should assess p-values in parallel with effect sizes and confidence intervals, as triangulating multiple forms of statistical evidence can strengthen inferences and point to new directions for replication.
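One of the key messages credits part of the power gain to covariate adjustment via the Frisch-Waugh-Lovell (FWL) theorem. The sketch below shows that residualization step under simplified, made-up inputs: regress both the outcome and each encoded hypothesis variable on the covariates, and carry the residuals forward into the selection step. It is a schematic of the theorem, not the authors' implementation.

```python
# Sketch of covariate adjustment via the Frisch-Waugh-Lovell (FWL) theorem:
# residualize the outcome and the candidate predictors on the covariates,
# then run the selection step on the residuals. Names are illustrative.
import numpy as np

def residualize(M, C):
    """Return the part of each column of M orthogonal to the covariates C."""
    C1 = np.column_stack([np.ones(len(C)), C])        # add intercept
    beta, *_ = np.linalg.lstsq(C1, M, rcond=None)     # OLS of M on C
    return M - C1 @ beta

rng = np.random.default_rng(1)
n = 700
covariates = rng.normal(size=(n, 2))                  # e.g., sex, cell-type proxy
candidates = rng.normal(size=(n, 4))                  # encoded life course hypotheses
outcome = rng.normal(size=n)

X_res = residualize(candidates, covariates)
y_res = residualize(outcome.reshape(-1, 1), covariates).ravel()
# X_res and y_res would replace the raw variables in the LARS selection step.
```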


Biometrika ◽  
2020 ◽  
Author(s):  
Huijuan Zhou ◽  
Xianyang Zhang ◽  
Jun Chen

Abstract The family-wise error rate (FWER) has been widely used in genome-wide association studies. With the increasing availability of functional genomics data, it is possible to increase the detection power by leveraging these genomic functional annotations. Previous efforts to accommodate covariates in multiple testing focus on the false discovery rate control while covariate-adaptive FWER-controlling procedures remain under-developed. Here we propose a novel covariate-adaptive FWER-controlling procedure that incorporates external covariates which are potentially informative of either the statistical power or the prior null probability. An efficient algorithm is developed to implement the proposed method. We prove its asymptotic validity and obtain the rate of convergence through a perturbation-type argument. Our numerical studies show that the new procedure is more powerful than competing methods and maintains robustness across different settings. We apply the proposed approach to the UK Biobank data and analyze 27 traits with 9 million single-nucleotide polymorphisms tested for associations. Seventy-five genomic annotations are used as covariates. Our approach detects more genome-wide significant loci than other methods in 21 out of the 27 traits.
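The paper's covariate-adaptive procedure is not reproduced here, but a weighted Bonferroni rule gives a minimal sketch of the underlying idea: spend the FWER budget unevenly according to an informative covariate, keeping the weights averaging to one so the union bound still guarantees FWER control at level alpha. The annotation score below is a hypothetical covariate.

```python
# Weighted Bonferroni: a simple covariate-adaptive FWER-controlling baseline
# (not the paper's procedure). Hypotheses judged a priori more promising,
# e.g., via a functional annotation score, receive larger weights; the weights
# must average to one so the union bound still gives FWER <= alpha.
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    w = np.asarray(weights, dtype=float)
    w = w * len(w) / w.sum()          # rescale so the weights average to one
    return np.asarray(pvals) <= alpha * w / len(w)

rng = np.random.default_rng(2)
pvals = rng.uniform(size=1000)
annotation_score = rng.uniform(size=1000)   # hypothetical informative covariate
rejected = weighted_bonferroni(pvals, 1.0 + 4.0 * annotation_score)
print(int(rejected.sum()), "rejections")
```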


Biostatistics ◽  
2017 ◽  
Vol 18 (3) ◽  
pp. 477-494 ◽  
Author(s):  
Jakub Pecanka ◽  
Marianne A. Jonker ◽  
Zoltan Bochdanovits ◽  
Aad W. Van Der Vaart ◽  

Summary For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests which result in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties especially under the practically relevant misspecification of the interaction model are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
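A schematic of the two-stage idea, under simplified assumptions, is sketched below: stage one applies a cheap two-locus genotype independence test among cases only, and stage two fits a per-pair logistic regression with an interaction term for pairs that survive screening. The screening threshold is arbitrary, the data are simulated, and a Wald test stands in for the score test used in the paper.

```python
# Schematic of the two-stage epistasis screen (simplified): stage 1 tests
# two-locus genotype independence among cases; only pairs failing that test
# proceed to a per-pair logistic regression with an interaction term
# (a Wald test here, rather than the score test used in the paper).
import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
case = rng.binomial(1, 0.5, n)                 # case/control status
g1 = rng.binomial(2, 0.3, n)                   # genotype at locus 1 (0/1/2)
g2 = rng.binomial(2, 0.3, n)                   # genotype at locus 2 (0/1/2)

# Stage 1: genotype independence among cases only.
cases = case == 1
table = np.zeros((3, 3))
for a, b in zip(g1[cases], g2[cases]):
    table[a, b] += 1
_, p_screen, _, _ = chi2_contingency(table)

# Stage 2: interaction test, run only if the screen is passed.
if p_screen < 1e-3:                            # illustrative screening threshold
    X = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))
    fit = sm.Logit(case, X).fit(disp=0)
    print("interaction p-value:", fit.pvalues[-1])
else:
    print("pair screened out (case-only independence not rejected)")
```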


2019 ◽  
Vol 116 (4) ◽  
pp. 1195-1200 ◽  
Author(s):  
Daniel J. Wilson

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
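The HMP itself is straightforward to compute; a minimal sketch with equal weights follows. Note that comparing the HMP directly to alpha is only an approximation: the paper derives sharper significance thresholds from the HMP's heavy-tailed asymptotic distribution, which are not reproduced here.

```python
# Minimal computation of the harmonic mean p-value (HMP) with equal weights.
# Comparing the HMP directly to alpha is only an approximation; the paper
# derives exact thresholds from the heavy-tailed asymptotic distribution,
# which are not reproduced here.
import numpy as np

def harmonic_mean_p(pvals, weights=None):
    p = np.asarray(pvals, dtype=float)
    w = np.full(len(p), 1.0 / len(p)) if weights is None else np.asarray(weights)
    return w.sum() / np.sum(w / p)   # HMP = (sum of weights) / (weighted sum of 1/p)

pvals = np.array([0.03, 0.20, 0.45, 0.007, 0.60])
print("HMP of the group:", harmonic_mean_p(pvals))
```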


2021 ◽  
Author(s):  
Runqing Yang ◽  
Yuxin Song ◽  
Li Jiang ◽  
Zhiyu Hao ◽  
Runqing Yang

Abstract Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMMs) to genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs), estimated in advance, as a known predictor in genomic logistic regression, and then controlled polygenic effects by adjusting genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic control could be evaluated using fewer sampled markers, which greatly simplifies GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.
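As a rough illustration of the idea described in the abstract, the toy sketch below runs a per-SNP logistic regression in which a precomputed genomic breeding value (GBV) enters as a known predictor alongside the tested genotype. How the GBVs are estimated and how genomic heritability is adjusted downward in the actual method are not shown; all data and effect sizes are simulated.

```python
# Toy per-SNP association test in the spirit described in the abstract:
# a genomic breeding value (GBV), estimated beforehand, enters the logistic
# regression as a known predictor alongside the SNP genotype. Estimation of
# the GBVs and the downward adjustment of genomic heritability are not shown.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1500
gbv = rng.normal(size=n)                      # precomputed genomic breeding values
snp = rng.binomial(2, 0.25, n)                # genotype at the SNP being tested
disease = rng.binomial(1, 1 / (1 + np.exp(-0.8 * gbv)))   # polygenic binary trait

X = sm.add_constant(np.column_stack([snp, gbv]))
fit = sm.Logit(disease, X).fit(disp=0)
print("SNP p-value adjusted for polygenic background:", fit.pvalues[1])
```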


2004 ◽  
Vol 3 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Mark J. van der Laan ◽  
Sandrine Dudoit ◽  
Katherine S. Pollard

This article shows that any single-step or stepwise multiple testing procedure (asymptotically) controlling the family-wise error rate (FWER) can be augmented into procedures that (asymptotically) control tail probabilities for the number of false positives and the proportion of false positives among the rejected hypotheses. Specifically, given any procedure that (asymptotically) controls the FWER at level alpha, we propose simple augmentation procedures that provide (asymptotic) level-alpha control of: (i) the generalized family-wise error rate, i.e., the tail probability, gFWER(k), that the number of Type I errors exceeds a user-supplied integer k, and (ii) the tail probability, TPPFP(q), that the proportion of Type I errors among the rejected hypotheses exceeds a user-supplied value 0 < q < 1.
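A minimal sketch of the gFWER(k) augmentation follows: start from the rejection set of any FWER-controlling procedure (Holm is used here purely for illustration) and additionally reject the k next most significant hypotheses. Since at most k false positives can be added this way, the probability of more than k Type I errors remains bounded by alpha; the TPPFP augmentation is analogous but not shown.

```python
# Sketch of the augmentation idea for gFWER(k): take the rejections of any
# FWER-controlling procedure (Holm here, for illustration) and additionally
# reject the k next most significant hypotheses. At most k false positives
# can be added, so gFWER(k) stays controlled at level alpha.
import numpy as np
from statsmodels.stats.multitest import multipletests

def augment_gfwer(pvals, k, alpha=0.05):
    pvals = np.asarray(pvals, dtype=float)
    reject_fwer, *_ = multipletests(pvals, alpha=alpha, method="holm")
    reject = reject_fwer.copy()
    # Add the k smallest p-values among the hypotheses not yet rejected.
    candidates = np.where(~reject)[0]
    extra = candidates[np.argsort(pvals[candidates])][:k]
    reject[extra] = True
    return reject

rng = np.random.default_rng(5)
pvals = np.concatenate([rng.uniform(0, 1e-4, 5), rng.uniform(size=995)])
print(int(augment_gfwer(pvals, k=2).sum()), "rejections after augmentation")
```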


2017 ◽  
Author(s):  
Daniel J. Wilson

Analysis of ‘big data’ frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the family-wise error rate (FWER) is considered the strongest protection against false positives, but makes it difficult to reach the multiple testing-corrected significance threshold. Here I introduce the harmonic mean p-value (HMP) which controls the FWER while greatly improving statistical power by combining dependent tests using generalized central limit theorem. I show that the HMP easily combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human-pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all combinations of hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini-Hochberg procedure to detect significant hypotheses, even though the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets because it enhances the potential for scientific discovery.


Author(s):  
Jeong-Seok Choi

Multiple testing refers to conducting simultaneous tests of more than one hypothesis. When many tests are conducted at the same time, it becomes more likely that a null hypothesis is rejected even when it is true. If individual hypothesis decisions are based on unadjusted p-values, some of the true null hypotheses will usually be rejected. To address the multiple testing problem, various studies have attempted to increase power while taking into account the family-wise error rate or false discovery rate and the statistics required for testing hypotheses. This article discusses methods that account for the multiplicity issue and introduces various statistical techniques.
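As a concrete illustration of the adjustments such articles survey, the short sketch below applies Bonferroni, Holm, and Benjamini-Hochberg corrections to the same simulated p-values; the first two control the family-wise error rate, while the third controls the false discovery rate and typically rejects more hypotheses.

```python
# Applying three common multiplicity adjustments to the same p-values:
# Bonferroni and Holm control the family-wise error rate, Benjamini-Hochberg
# controls the false discovery rate and typically rejects more hypotheses.
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(6)
pvals = np.concatenate([rng.uniform(0, 1e-3, 10), rng.uniform(size=190)])

for method in ["bonferroni", "holm", "fdr_bh"]:
    reject, *_ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} rejections: {int(reject.sum())}")
```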

