Impact of missing data on type 1 error rates in non-inferiority trials

2009 ◽ Vol 9 (2) ◽ pp. 87-99 ◽ Author(s): Bongin Yoo

1986 ◽ Vol 20 (2) ◽ pp. 189-200 ◽ Author(s): Kevin D. Bird, Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
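As a rough modern counterpart to such tables, the same quantities can be computed directly in software. The following is a minimal sketch using statsmodels, assuming an illustrative standardized effect size of d = 0.5, α = .05, and power = .80 (1 − β); these values are for demonstration only and are not taken from the paper.

```python
# Sketch: sample size and power for a two-group comparison,
# assuming an illustrative effect size d = 0.5, alpha = 0.05, power = 0.80.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect d = 0.5 with alpha = 0.05 and power = 0.80
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:.1f}")

# Power provided by a fixed sample size (n = 30 per group) for the same effect
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)
print(f"Power with n = 30 per group: {power:.2f}")
```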


2020 ◽ Author(s): Janet Aisbett, Daniel Lakens, Kristin Sainani

Magnitude based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and formally describe MBI as a multiple decision procedure. MBI terminology operates as if tests are conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers what tests to use and how to design and report their statistical analyses to accord with standard frequentist practice.
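The recommendation translates directly into standard software. Below is a minimal sketch of equivalence testing (a pair of one-sided tests, TOST) using statsmodels; the smallest meaningful effect of ±0.5 raw units and the simulated data are assumptions for illustration, not values from the paper.

```python
# Sketch: two one-sided tests (TOST) against an assumed smallest meaningful
# effect of +/- 0.5 raw units, each conducted at a single, pre-specified alpha.
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.1, scale=1.0, size=50)   # simulated data
group_b = rng.normal(loc=0.0, scale=1.0, size=50)

# ttost_ind tests H0: diff <= -0.5 and H0: diff >= +0.5; rejecting both
# (overall p < alpha) supports the absence of a meaningful effect.
p_overall, lower_test, upper_test = ttost_ind(group_a, group_b, low=-0.5, upp=0.5)
print(f"TOST p-value: {p_overall:.3f}")
print(f"Lower one-sided test: t = {lower_test[0]:.2f}, p = {lower_test[1]:.3f}")
print(f"Upper one-sided test: t = {upper_test[0]:.2f}, p = {upper_test[1]:.3f}")
```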


2020 ◽ Vol 103 (6) ◽ pp. 1667-1679 ◽ Author(s): Shizhen S Wang

Background: There are several statistical methods for detecting a difference in detection rates between alternative and reference qualitative microbiological assays in a single-laboratory validation study with a paired design. Objective: We compared the performance of eight methods, including McNemar’s test, the sign test, the Wilcoxon signed-rank test, the paired t-test, and regression methods based on conditional logistic (CLOGIT), mixed effects complementary log-log (MCLOGLOG), and mixed effects logistic (MLOGIT) models, as well as a linear mixed effects model (LMM). Methods: We first compared the minimum detectable difference in the proportion of detections between the alternative and reference detection methods among these statistical methods for a varied number of test portions. We then compared the power and type 1 error rates of these methods using simulated data. Results: The MCLOGLOG and MLOGIT models had the lowest minimum detectable difference, followed by the LMM and paired t-test. The MCLOGLOG and MLOGIT models had the highest average power but were anticonservative when the correlation between the paired outcomes of the alternative and reference methods was high. The LMM and paired t-test mostly had the highest average power when the correlation was low and the second-highest average power when the correlation was high. Type 1 error rates of these last two methods approached the nominal significance level when the number of test portions was moderately large (n > 20). Highlights: The LMM and paired t-test are better choices than the other competing methods; we provide an example using real data.
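Of the methods compared, McNemar’s test is the most widely available. The sketch below applies it with statsmodels to an invented 2×2 table of paired detection results; the counts are assumptions for illustration only.

```python
# Sketch: McNemar's test for paired detection results from an alternative
# and a reference assay. The 2x2 table of counts is invented for illustration.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: reference method (detected / not detected)
# Columns: alternative method (detected / not detected)
table = np.array([[40, 3],
                  [10, 22]])

# exact=True uses the binomial distribution on the discordant pairs,
# which is advisable when discordant counts are small.
result = mcnemar(table, exact=True)
print(f"McNemar statistic: {result.statistic}, p-value: {result.pvalue:.3f}")
```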


2002 ◽ Vol 51 (3) ◽ pp. 524-527 ◽ Author(s): Mark Wilkinson, Pedro R. Peres-Neto, Peter G. Foster, Clive B. Moncrieff

2017 ◽ Author(s): Marie Delacre, Daniel Lakens, Christophe Leys

When comparing two independent groups, researchers in psychology commonly use Student’s t-test, which assumes normality and homogeneity of variance. When these conditions are not met, Student’s t-test is often severely biased and leads to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research, and that choosing between Student’s t-test and Welch’s t-test based on the outcome of a preliminary test of the equality of variances often fails to provide an appropriate answer. We show that Welch’s t-test provides better control of Type 1 error rates when the assumption of homogeneity of variance is not met, and loses little robustness compared to Student’s t-test when the assumptions are met. We argue that Welch’s t-test should be used as a default strategy.
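In most software the switch amounts to a single argument. The sketch below uses SciPy on simulated groups with unequal variances and unequal sample sizes; the data are illustrative only.

```python
# Sketch: Student's versus Welch's t-test on simulated groups with
# unequal variances and unequal sample sizes (values are illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(loc=0.0, scale=1.0, size=25)
group_b = rng.normal(loc=0.0, scale=3.0, size=60)

# Student's t-test assumes equal variances (equal_var=True is the default).
t_student, p_student = stats.ttest_ind(group_a, group_b, equal_var=True)

# Welch's t-test drops the homogeneity-of-variance assumption.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Student: t = {t_student:.2f}, p = {p_student:.3f}")
print(f"Welch:   t = {t_welch:.2f}, p = {p_welch:.3f}")
```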


2014 ◽ Vol 26 (2) ◽ pp. 633-660 ◽ Author(s): Rahim Moineddin, Christopher Meaney, Eva Grunfeld

Composite endpoints are commonplace in biomedical research. The complex nature of many health conditions and medical interventions demands that composite endpoints be employed. Different approaches exist for the analysis of composite endpoints. A Monte Carlo simulation study was employed to assess the statistical properties of various regression methods for analyzing binary composite endpoints. We also applied these methods to data from the BETTER trial, which employed a binary composite endpoint. We demonstrated that type 1 error rates are poor for the Negative Binomial regression model and the logistic generalized linear mixed model (GLMM). Bias was minimal and power was highest in the binomial logistic regression model, the linear regression model, the Poisson (corrected for over-dispersion) regression model and the common effect logistic generalized estimating equation (GEE) model. Convergence was poor in the distinct effect GEE models, the logistic GLMM and some of the zero-one inflated beta regression models. Considering the BETTER trial data, the distinct effect GEE model struggled with convergence, and the collapsed composite method estimated an effect that was greatly attenuated compared with the other models. All remaining models suggested an intervention effect of similar magnitude. In our simulation study, the binomial logistic regression model (corrected for possible over/under-dispersion), the linear regression model, the Poisson regression model (corrected for over-dispersion) and the common effect logistic GEE model appeared to be unbiased, with good type 1 error rates, power and convergence properties.
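As an illustration of one of the better-performing approaches, the sketch below fits a common effect logistic GEE to component-level binary outcomes clustered within patients using statsmodels; the data layout, variable names, and simulated values are assumptions for the example and do not reflect the BETTER trial.

```python
# Sketch: a common-effect logistic GEE for a binary composite endpoint,
# with each endpoint component as a repeated binary outcome per patient.
# Data layout and variable names are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_patients, n_components = 200, 4
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n_patients), n_components),
    "treatment": np.repeat(rng.integers(0, 2, size=n_patients), n_components),
})
# Simulate component-level outcomes with a modest treatment effect.
logit = -0.5 + 0.4 * df["treatment"]
df["component_met"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Exchangeable working correlation accounts for clustering within patients.
model = sm.GEE.from_formula(
    "component_met ~ treatment",
    groups="patient_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())
```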


1980 ◽ Vol 46 (2) ◽ pp. 403-407 ◽ Author(s): James D. Church, Edward L. Wike

Two Monte Carlo studies were done to find the Type 1 error rates for Silverstein's nonparametric pairwise multiple-comparison tests for a one- and two-way layout with ranked data. Silverstein's tests had excellent experimentwise error rates but did not do as well as Wilcoxon's tests and the stepped-down sign test when pairwise comparisons were performed after significant over-all tests. Silverstein's tests were shown to be equivalent to recently proposed tests by Levy.
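Silverstein's procedure is not available in common statistical libraries; the sketch below illustrates only the general strategy evaluated in the study, namely a significant over-all test followed by protected pairwise comparisons with experimentwise error control, using a Friedman test and Bonferroni-corrected Wilcoxon signed-rank tests on invented data.

```python
# Sketch: over-all nonparametric test followed by protected pairwise
# comparisons with an experimentwise (Bonferroni) correction.
# This illustrates the general strategy, not Silverstein's procedure itself.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# One-way repeated-measures layout: 20 subjects x 3 conditions (invented data).
conditions = {name: rng.normal(loc=mu, scale=1.0, size=20)
              for name, mu in [("A", 0.0), ("B", 0.3), ("C", 0.8)]}

# Over-all test on the ranked data.
stat, p_overall = stats.friedmanchisquare(*conditions.values())
print(f"Friedman test: chi2 = {stat:.2f}, p = {p_overall:.3f}")

# Pairwise Wilcoxon signed-rank tests only if the over-all test is significant.
if p_overall < 0.05:
    pairs = list(combinations(conditions, 2))
    alpha_per_test = 0.05 / len(pairs)   # Bonferroni-style experimentwise control
    for a, b in pairs:
        w, p = stats.wilcoxon(conditions[a], conditions[b])
        print(f"{a} vs {b}: W = {w:.1f}, p = {p:.3f} "
              f"(significant at experimentwise 0.05: {p < alpha_per_test})")
```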


2020 ◽ Author(s): Anne M. Scheel, Mitchell Schijen, Daniel Lakens

When studies with positive results that support the tested hypotheses have a higher probability of being published than studies with negative results, the literature will give a distorted view of the evidence for scientific claims. Psychological scientists have been concerned about the degree of distortion in their literature due to publication bias and inflated Type-1 error rates. Registered Reports were developed with the goal to minimise such biases: In this new publication format, peer review and the decision to publish take place before the study results are known. We compared the results in the full population of published Registered Reports in Psychology (N = 71 as of November 2018) with a random sample of hypothesis-testing studies from the standard literature (N = 152) by searching 633 journals for the phrase ‘test* the hypothes*’ (replicating a method by Fanelli, 2010). Analysing the first hypothesis reported in each paper, we found 96% positive results in standard reports, but only 44% positive results in Registered Reports. The difference remained nearly as large when direct replications were excluded from the analysis (96% vs 50% positive results). This large gap suggests that psychologists underreport negative results to an extent that threatens cumulative science. Although our study did not directly test the effectiveness of Registered Reports at reducing bias, these results show that the introduction of Registered Reports has led to a much larger proportion of negative results appearing in the published literature compared to standard reports.


2017 ◽ Author(s): Daniel Lakens

Pre-registration is a straightforward way to make science more transparent and to control Type 1 error rates. Pre-registration is often presented as beneficial for science in general, but rarely as a practice that leads to immediate individual benefits for researchers. One benefit of pre-registered studies is that they allow for non-conventional research designs that are more efficient than conventional designs. For example, by performing one-tailed tests and sequential analyses, researchers can perform well-powered studies much more efficiently. Here, I examine whether such non-conventional but more efficient designs are considered appropriate by editors under the pre-condition that the analysis plans are pre-registered, and if so, whether researchers are more willing to pre-register their analysis plan to take advantage of the efficiency benefits of non-conventional designs. Study 1 shows that the large majority of editors judged one-tailed tests and sequential analyses to be appropriate in psychology, but only when such analyses are pre-registered. In Study 2 I asked experimental psychologists to indicate their attitude towards pre-registration. Half of these researchers first read about the acceptance of one-tailed tests and sequential analyses by editors, and the efficiency gains of using these procedures. However, learning about the efficiency benefits associated with one-tailed tests and sequential analyses did not substantially influence researchers' attitudes about the benefits and costs of pre-registration, or their willingness to pre-register studies. The self-reported likelihood of pre-registering studies in the next two years, as well as the percentage of studies researchers planned to pre-register in the future, was surprisingly high. Of the respondents, 47% already had experience with pre-registration, and 94% indicated that they would consider pre-registering at least some of their research in the future. Given this already strong self-reported willingness to pre-register studies, pointing out immediate individual benefits seems unlikely to be a useful way to increase researchers' willingness to pre-register any further.
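The efficiency claim can be made concrete with a small calculation. The sketch below compares the per-group sample size required by two-tailed and one-tailed tests using statsmodels, assuming an illustrative effect size of d = 0.5, α = .05, and power = .80; sequential analyses, which the abstract also mentions, are not shown here.

```python
# Sketch: efficiency gain of a one-tailed test, for an assumed effect size
# d = 0.5, alpha = 0.05 and power = 0.80 (illustrative values only).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_two_sided = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="two-sided")
n_one_sided = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="larger")
print(f"n per group, two-tailed: {n_two_sided:.1f}")
print(f"n per group, one-tailed: {n_one_sided:.1f}")
```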

