Hello again, ANOVA: rethinking ANOVA in the context of confirmatory data analysis

2021 ◽  
Author(s):  
Haiyang Jin

Analysis of variance (ANOVA) is one of the most popular statistical methods for data analysis in psychology and other fields. Nevertheless, ANOVA is frequently used as an exploratory approach, even in confirmatory studies with explicit hypotheses. Such misapplication can invalidate ANOVA conventions, reducing statistical power and even threatening the validity of conclusions. This paper evaluates the appropriateness of ANOVA conventions, discusses the motivations behind them that researchers may misunderstand, and provides practical suggestions. Moreover, the paper proposes controlling the Type I error rate with a Hypothesis-based Type I Error Rate, which accounts for both the number of tests and their logical relationships in rejecting the null hypothesis. It also introduces simple interaction analysis, which uses the most straightforward interaction to test a hypothesis of interest. Finally, pre-registration is recommended to clarify the selection of appropriate ANOVA tests in both confirmatory and exploratory studies.
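As an illustration of why the number of tests and their logical relationship matter, the following Monte Carlo sketch (Python; all settings are hypothetical and this is not the paper's Hypothesis-based Type I Error Rate procedure) compares a decision rule that requires all of m tests to reject with one that rejects on any single test:

```python
# Illustrative Monte Carlo sketch (not the paper's exact procedure): the Type I
# error rate of a *hypothesis* depends on how the individual tests are combined.
# Here, m independent t-tests are run on null data; requiring all m tests to
# reject is far more conservative than rejecting if any single test does.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n, alpha, n_sim = 3, 30, 0.05, 20_000   # hypothetical settings

all_reject, any_reject = 0, 0
for _ in range(n_sim):
    # m independent two-sample comparisons with no true effect
    p = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
         for _ in range(m)]
    all_reject += all(pi < alpha for pi in p)   # hypothesis needs every test
    any_reject += any(pi < alpha for pi in p)   # hypothesis needs any one test

print(f"P(reject | all {m} tests needed): {all_reject / n_sim:.4f}")  # ~alpha**m
print(f"P(reject | any of {m} tests):     {any_reject / n_sim:.4f}")  # ~1-(1-alpha)**m
```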

2021 ◽  
Author(s):  
Haocheng Ding ◽  
Lingsong Meng ◽  
Andrew C. Liu ◽  
Michelle L. Gumz ◽  
Andrew J. Bryant ◽  
...  

Abstract: Circadian rhythmicity in transcriptomic profiles has been shown in many physiological processes, and the disruption of circadian patterns has been found to be associated with several diseases. In this paper, we developed a series of likelihood-based methods to detect (i) circadian rhythmicity (denoted as LR rhythmicity) and (ii) differential circadian patterns comparing two experimental conditions (denoted as LR diff). In terms of circadian rhythmicity detection, we demonstrated that the proposed LR rhythmicity could better control the type I error rate compared to existing methods under a wide variety of simulation settings. In terms of differential circadian patterns, we developed methods for detecting differential amplitude, differential phase, differential basal level, and differential fit, all of which also successfully controlled the type I error rate. In addition, we demonstrated that the proposed LR diff could achieve higher statistical power in detecting differential fit compared to existing methods. The superior performance of LR rhythmicity and LR diff was demonstrated in two real data applications: a brain aging dataset (gene expression microarray data of human postmortem brain) and a time-restricted feeding dataset (RNA sequencing data of human skeletal muscles). An R package implementing our methods is publicly available on GitHub at https://github.com/diffCircadian/diffCircadian.
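For intuition, a minimal cosinor-style likelihood-ratio test for rhythmicity can be sketched as below (a generic illustration in Python with simulated time points, not the diffCircadian implementation):

```python
# A minimal cosinor likelihood-ratio sketch for rhythmicity detection: compare
# a sinusoidal fit against a flat (intercept-only) fit via a Gaussian LR test.
import numpy as np
from scipy import stats

def lr_rhythmicity(t, y, period=24.0):
    """LR test of a cosinor fit vs. an intercept-only fit."""
    n = len(y)
    w = 2 * np.pi * t / period
    X1 = np.column_stack([np.ones(n), np.cos(w), np.sin(w)])  # rhythmic model
    X0 = np.ones((n, 1))                                      # null: no rhythm
    rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
    rss0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
    lr = n * np.log(rss0 / rss1)          # Gaussian-likelihood LR statistic
    pval = stats.chi2.sf(lr, df=2)        # amplitude + phase -> 2 extra params
    return lr, pval

rng = np.random.default_rng(0)
t = np.arange(0, 48, 2.0)                 # hypothetical sampling times (hours)
y = 5 + 2 * np.cos(2 * np.pi * (t - 6) / 24) + rng.normal(0, 1, t.size)
print(lr_rhythmicity(t, y))
```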


2016 ◽  
Author(s):  
Etienne P. LeBel ◽  
Lorne Campbell ◽  
Timothy Loving

Several researchers recently outlined unacknowledged costs of open science practices, arguing these costs may outweigh the benefits and stifle discovery of novel findings. We scrutinize these researchers' (1) statistical concern that heightened stringency with respect to false positives will increase false negatives and (2) meta-scientific concern that larger samples and executing direct replications engender opportunity costs that will decrease the rate of making novel discoveries. We argue their statistical concern is unwarranted given that open science proponents recommend such practices to reduce the inflated Type I error rate from .35 down to .05 and simultaneously call for high-powered research to reduce the inflated Type II error rate. Regarding their meta-scientific concern, we demonstrate that incurring some costs is required to increase the rate (and frequency) of making true discoveries, because distinguishing true from false hypotheses requires a low Type I error rate, high statistical power, and independent direct replications. We also examine pragmatic concerns raised about adopting open science practices in relationship science (pre-registration, open materials, open data, direct replications, sample size); while acknowledging these concerns, we argue they are overstated given available solutions. We conclude that the benefits of open science practices outweigh the costs for both individual researchers and the collective field in the long run, but that short-term costs may exist for researchers because of the currently dysfunctional academic incentive structure. Our analysis implies that our field's incentive structure needs to change so that researchers' career interests are better aligned with the field's cumulative progress. We delineate recent proposals aimed at such incentive structure re-alignment.


2019 ◽  
Author(s):  
Varun Saravanan ◽  
Gordon J. Berman ◽  
Samuel J. Sober

Abstract: A common feature of many neuroscience datasets is the presence of hierarchical data structures, most commonly recordings of the activity of multiple neurons in multiple animals across multiple trials. Accordingly, the measurements constituting the dataset are not independent, even though the traditional statistical analyses often applied in such cases (e.g. Student's t-test) treat them as such. The hierarchical bootstrap has been shown to be an effective tool for accurately analyzing such data, and while it has been used extensively in the statistical literature, its use is not widespread in neuroscience, despite the ubiquity of hierarchical datasets. In this paper, we illustrate the intuitiveness and utility of this approach for analyzing hierarchically nested datasets. We use simulated neural data to show that traditional statistical tests can result in a false positive rate of over 45%, even if the Type I error rate is set at 5%. While summarizing data across non-independent points (or lower levels) can potentially fix this problem, this approach greatly reduces the statistical power of the analysis. The hierarchical bootstrap, when applied sequentially over the levels of the hierarchical structure, keeps the Type I error rate within the intended bound and retains more statistical power than summarizing methods. We conclude by demonstrating the effectiveness of the method in two real-world examples, first analyzing singing data in male Bengalese finches (Lonchura striata var. domestica) and second quantifying changes in behavior under optogenetic control in flies (Drosophila melanogaster).
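A minimal sketch of the sequential (nested) resampling idea, assuming a toy animals-by-neurons-by-trials layout rather than the authors' code or data:

```python
# A minimal sketch of a hierarchical (nested) bootstrap: resample at each level
# of the hierarchy in turn (here animals -> neurons -> trials) and recompute the
# statistic (here, the pooled mean) on each resample. Data layout and variable
# names are illustrative only.
import numpy as np

def hierarchical_bootstrap_mean(data, n_boot=5000, seed=0):
    """data: dict animal -> dict neuron -> 1-D array of trial measurements."""
    rng = np.random.default_rng(seed)
    boot_means = np.empty(n_boot)
    animals = list(data.keys())
    for b in range(n_boot):
        vals = []
        # level 1: resample animals with replacement
        for a in rng.choice(animals, size=len(animals), replace=True):
            neurons = list(data[a].keys())
            # level 2: resample neurons within the chosen animal
            for nrn in rng.choice(neurons, size=len(neurons), replace=True):
                trials = data[a][nrn]
                # level 3: resample trials within the chosen neuron
                vals.append(rng.choice(trials, size=len(trials), replace=True))
        boot_means[b] = np.concatenate(vals).mean()
    return boot_means.mean(), np.percentile(boot_means, [2.5, 97.5])

rng = np.random.default_rng(1)
toy = {a: {n: rng.normal(loc=rng.normal(), scale=1.0, size=20)
           for n in range(5)} for a in range(4)}
print(hierarchical_bootstrap_mean(toy, n_boot=1000))
```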


2020 ◽  
Author(s):  
Cathy S. J. Fann ◽  
Thai Son Dinh ◽  
Yu-Hsien Chang ◽  
Jia Jyun Sie ◽  
Ie-Bin Lian

Abstract Background: The propensity score (PS) is a popular method for reducing the effects of multiple confounders in observational studies. It is applicable mainly to situations in which the exposure/treatment of interest is dichotomous, so that the PS can be estimated through logistic regression. However, multinomial exposures with 3 or more levels are not rare, e.g., genetic variants such as single nucleotide polymorphisms (SNPs), which have 3 levels (aa/aA/AA), considered as an exposure. The conventional PS is inapplicable in this situation unless the 3 levels are first collapsed into 2 classes. Methods: A simulation study was conducted to compare the performance of the proposed multinomial propensity score (MPS) method under various contrast codings and approaches, including regression adjustment and matching. Results: MPS methods had a more reasonable type I error rate than the non-MPS methods, for which it could be as high as 30–50%. Compared with MPS-direct adjusted methods, MPS-matched cohort methods had better power but a larger type I error rate. The performance of the contrast codings depends on the choice of MPS model. Conclusions: In general, two combinations performed relatively better in our simulation of a ternary exposure: the MPS-matched cohort method with the Helmert contrast and MPS-direct adjusted regression with treatment contrasts. Compared with the latter, the former had better power but a larger type I error rate as a trade-off.
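A rough sketch of the direct-adjustment flavor of the MPS approach (simulated data and hypothetical variable names; the matching variant and the alternative contrast codings studied in the paper are not shown):

```python
# Sketch of a multinomial propensity score (MPS) with direct adjustment:
# estimate the probability of each exposure level (e.g. aa/aA/AA) by multinomial
# logistic regression on confounders, then include the estimated scores as
# covariates in the outcome model. All names and coefficients are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
conf = rng.normal(size=(n, 2))                        # two simulated confounders
logits = np.column_stack([np.zeros(n), 0.8 * conf[:, 0], -0.5 * conf[:, 1]])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
expo = np.array([rng.choice(3, p=p) for p in probs])  # 3-level exposure
y = 0.5 * (expo == 2) + conf @ np.array([0.7, 0.3]) + rng.normal(size=n)

# Step 1: multinomial propensity scores from a multinomial logit model
mps_fit = sm.MNLogit(expo, sm.add_constant(conf)).fit(disp=False)
mps = mps_fit.predict(sm.add_constant(conf))          # n x 3 matrix of scores

# Step 2: outcome regression adjusted for the MPS (drop one score column to
# avoid collinearity, since the three scores sum to one)
X = sm.add_constant(np.column_stack([expo == 1, expo == 2, mps[:, 1:]]))
out_fit = sm.OLS(y, X.astype(float)).fit()
print(out_fit.params)
```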


2014 ◽  
Vol 53 (05) ◽  
pp. 343-343

We have to report marginal changes in the empirical type I error rates for the cut-offs 2/3 and 4/7 of Table 4, Table 5 and Table 6 of the paper “Influence of Selection Bias on the Test Decision – A Simulation Study” by M. Tamm, E. Cramer, L. N. Kennes, N. Heussen (Methods Inf Med 2012; 51: 138–143). In a small number of cases, the numeric representation of values in SAS resulted in incorrect categorization due to a representation error in the computed differences. We corrected the simulation by using the round function of SAS in the calculation process, with the same seeds as before. For Table 4, the value for the cut-off 2/3 changes from 0.180323 to 0.153494. For Table 5, the value for the cut-off 4/7 changes from 0.144729 to 0.139626 and the value for the cut-off 2/3 changes from 0.114885 to 0.101773. For Table 6, the value for the cut-off 4/7 changes from 0.125528 to 0.122144 and the value for the cut-off 2/3 changes from 0.099488 to 0.090828. The sentence on p. 141, “E.g. for block size 4 and q = 2/3 the type I error rate is 18% (Table 4).”, has to be replaced by “E.g. for block size 4 and q = 2/3 the type I error rate is 15.3% (Table 4).”. All changes are minor (smaller than 0.03) and do not affect the interpretation of the results or our recommendations.
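The phenomenon is easy to reproduce outside SAS; the snippet below (Python, purely illustrative) shows a difference that is algebraically equal to 2/3 landing on the wrong side of the 2/3 cut-off until it is rounded:

```python
# Illustration (in Python, not SAS) of the kind of numeric representation error
# described above: a difference that equals 2/3 on paper can fall on the wrong
# side of the 2/3 cut-off in binary floating point unless it is rounded first.
q = 2 / 3
diff = 1 - 1 / 3                         # algebraically equal to 2/3
print(diff > q)                          # True at machine precision
print(round(diff, 10) > round(q, 10))    # False after rounding, as intended
```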


2003 ◽  
Vol 22 (5) ◽  
pp. 665-675 ◽  
Author(s):  
Weichung J. Shih ◽  
Peter Ouyang ◽  
Hui Quan ◽  
Yong Lin ◽  
Bart Michiels ◽  
...  

2021 ◽  
pp. 174077452110101
Author(s):  
Jennifer Proper ◽  
John Connett ◽  
Thomas Murray

Background: Bayesian response-adaptive designs, which data-adaptively alter the allocation ratio in favor of the better-performing treatment, are often criticized for engendering a non-trivial probability of a subject imbalance in favor of the inferior treatment, inflating the type I error rate, and increasing sample size requirements. Implementations of these designs using Thompson sampling have generally assumed a simple beta-binomial probability model in the literature; however, the effect of these choices on the resulting operating characteristics, relative to other reasonable alternatives, has not been fully examined. Motivated by the Advanced R2Eperfusion STrategies for Refractory Cardiac Arrest trial, we posit that a logistic probability model coupled with an urn or permuted block randomization method will alleviate some of the practical limitations engendered by the conventional implementation of a two-arm Bayesian response-adaptive design with binary outcomes. In this article, we discuss to what extent this solution works and when it does not. Methods: A computer simulation study was performed to evaluate the relative merits of a Bayesian response-adaptive design for the Advanced R2Eperfusion STrategies for Refractory Cardiac Arrest trial using Thompson sampling based on a logistic regression probability model coupled with either an urn or permuted block randomization method that limits deviations from the evolving target allocation ratio. The different implementations of the response-adaptive design were evaluated for type I error rate control across various null response rates and for power, among other performance metrics. Results: The logistic regression probability model engenders smaller average sample sizes with similar power, better control over the type I error rate, and more favorable treatment arm sample size distributions than the conventional beta-binomial probability model, and designs using the alternative randomization methods have a negligible chance of a sample size imbalance in the wrong direction. Conclusion: Pairing the logistic regression probability model with either of the alternative randomization methods results in a much improved response-adaptive design with regard to important operating characteristics, including type I error rate control and the risk of a sample size imbalance in favor of the inferior treatment.
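A minimal sketch of the conventional beta-binomial Thompson sampling allocation that the article takes as its baseline (a generic two-arm illustration with hypothetical response rates, not the trial's design, and without the urn or permuted block constraint):

```python
# Beta-binomial Thompson sampling sketch: the probability of assigning the next
# subject to an arm is the posterior probability that the arm has the higher
# response rate, estimated by posterior sampling under Beta(1, 1) priors.
import numpy as np

def thompson_allocation_prob(successes, failures, n_draws=10_000, seed=0):
    """Posterior P(arm 1 better than arm 0) under independent Beta(1,1) priors."""
    rng = np.random.default_rng(seed)
    p0 = rng.beta(1 + successes[0], 1 + failures[0], n_draws)
    p1 = rng.beta(1 + successes[1], 1 + failures[1], n_draws)
    return (p1 > p0).mean()

rng = np.random.default_rng(1)
true_rates = [0.12, 0.25]                 # hypothetical response rates
succ, fail = [0, 0], [0, 0]
for _ in range(200):                      # enroll 200 subjects one at a time
    alloc_prob = thompson_allocation_prob(succ, fail)
    arm = int(rng.random() < alloc_prob)  # randomize toward the better arm
    outcome = rng.random() < true_rates[arm]
    succ[arm] += outcome
    fail[arm] += 1 - outcome
print("arm sizes:", succ[0] + fail[0], succ[1] + fail[1])
```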


1977 ◽  
Vol 2 (3) ◽  
pp. 187-206 ◽  
Author(s):  
Charles G. Martin ◽  
Paul A. Games

This paper presents an exposition and an empirical comparison of two potentially useful tests for homogeneity of variance. Control of the Type I error rate, P(EI), and power are investigated for three forms of the Box test and for two forms of the jackknife test with equal and unequal n's under conditions of normality and nonnormality. The Box test is shown to be robust to violations of the assumption of normality; the jackknife test is shown not to be robust. When n's are unequal, the combination of heterogeneous within-cell variances of the transformed values and unequal n's affects both the jackknife and Box tests. Previously reported suggestions for selecting subsample sizes for the Box test are shown to be inappropriate, producing an inflated P(EI). Two procedures that alleviate this problem are presented for the Box test. Use of the jackknife test with a reduced alpha is shown to provide power and control of P(EI) at approximately the same level as the Box test. Recommendations for the use of these techniques and computational examples of each are provided.
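A hedged sketch of a Box-type procedure (the subsample size and the random split are illustrative choices, not the article's recommendations):

```python
# Sketch of a Box-type test for homogeneity of variance: split each group into
# subsamples, take the log of each subsample variance, and run a one-way ANOVA
# on the logged variances across groups.
import numpy as np
from scipy import stats

def box_test(groups, subsample_size=5, seed=0):
    rng = np.random.default_rng(seed)
    log_vars = []
    for g in groups:
        g = rng.permutation(g)                      # random split into subsamples
        k = len(g) // subsample_size
        subs = g[: k * subsample_size].reshape(k, subsample_size)
        log_vars.append(np.log(subs.var(axis=1, ddof=1)))
    return stats.f_oneway(*log_vars)                # ANOVA on logged variances

rng = np.random.default_rng(1)
a = rng.normal(0, 1.0, 40)
b = rng.normal(0, 2.0, 40)                          # larger variance
print(box_test([a, b]))
```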


2018 ◽  
Vol 28 (8) ◽  
pp. 2385-2403 ◽  
Author(s):  
Tobias Mütze ◽  
Ekkehard Glimm ◽  
Heinz Schmidli ◽  
Tim Friede

Robust semiparametric models for recurrent events have received increasing attention in the analysis of clinical trials in a variety of diseases including chronic heart failure. In comparison to parametric recurrent event models, robust semiparametric models are more flexible in that neither the baseline event rate nor the process inducing between-patient heterogeneity needs to be specified in terms of a specific parametric statistical model. However, implementing group sequential designs in the robust semiparametric model is complicated by the fact that the sequence of Wald statistics does not asymptotically follow the canonical joint distribution. In this manuscript, we propose two types of group sequential procedures for a robust semiparametric analysis of recurrent events. The first group sequential procedure is based on the asymptotic covariance of the sequence of Wald statistics and guarantees asymptotic control of the type I error rate. The second procedure is based on the canonical joint distribution and does not guarantee asymptotic type I error rate control, but it is easy to implement and corresponds to the well-known standard approach for group sequential designs. Moreover, we describe how to determine the maximum information when planning a clinical trial with a group sequential design and a robust semiparametric analysis of recurrent events. We contrast the operating characteristics of the proposed group sequential procedures in a simulation study motivated by the ongoing phase 3 PARAGON-HF trial (ClinicalTrials.gov identifier: NCT01920711) in more than 4600 patients with chronic heart failure and a preserved ejection fraction. We found that both group sequential procedures have similar operating characteristics and that for some practically relevant scenarios, the group sequential procedure based on the canonical joint distribution has advantages with respect to the control of the type I error rate. The proposed method for calculating the maximum information results in appropriately powered trials for both procedures.
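For reference, the "standard approach" based on the canonical joint distribution can be sketched as follows for a Pocock-type boundary (information fractions and alpha are illustrative values, not taken from the article):

```python
# Group sequential boundary sketch using the canonical joint distribution: the
# interim and final Wald statistics are treated as multivariate normal with
# Corr(Z_j, Z_k) = sqrt(I_j / I_k), and a common (Pocock-type) critical value is
# chosen so that the overall one-sided type I error rate equals alpha.
import numpy as np
from scipy import stats, optimize

def pocock_critical_value(info_fractions, alpha=0.025):
    t = np.asarray(info_fractions, dtype=float)
    # canonical covariance of the sequence of standardized test statistics
    cov = np.sqrt(np.minimum.outer(t, t) / np.maximum.outer(t, t))
    mvn = stats.multivariate_normal(mean=np.zeros(len(t)), cov=cov)
    # choose c so that P(all Z_k < c under H0) = 1 - alpha
    return optimize.brentq(lambda c: mvn.cdf(np.full(len(t), c)) - (1 - alpha),
                           1.5, 4.0)

c = pocock_critical_value([0.5, 1.0])     # one interim look at half information
print(round(c, 3))                        # common boundary for both analyses
```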

