On the Statistical Testing Methods for Single Laboratory Validation of Qualitative Microbiological Assays with a Paired Design

2020, Vol 103 (6), pp. 1667-1679
Author(s): Shizhen S Wang

Abstract
Background: There are several statistical methods for detecting a difference in detection rates between alternative and reference qualitative microbiological assays in a single laboratory validation study with a paired design.
Objective: We compared the performance of eight methods: McNemar’s test, the sign test, the Wilcoxon signed-rank test, the paired t-test, regression methods based on conditional logistic (CLOGIT), mixed effects complementary log-log (MCLOGLOG), and mixed effects logistic (MLOGIT) models, and a linear mixed effects model (LMM).
Methods: We first compared the minimum detectable difference in the proportion of detections between the alternative and reference detection methods among these statistical methods for a varied number of test portions. We then compared the power and type 1 error rates of these methods using simulated data.
Results: The MCLOGLOG and MLOGIT models had the lowest minimum detectable difference, followed by the LMM and paired t-test. The MCLOGLOG and MLOGIT models had the highest average power but were anticonservative when the correlation between the paired outcomes of the alternative and reference methods was high. The LMM and paired t-test mostly had the highest average power when the correlation was low and the second highest average power when the correlation was high. Type 1 error rates of these last two methods approached the nominal significance level when the number of test portions was moderately large (n > 20).
Highlights: The LMM and paired t-test are better choices than the other competing methods, and we provide an example using real data.
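As a rough illustration of two of the compared approaches, the sketch below applies McNemar’s test and a paired t-test to simulated paired binary detection outcomes. The sample size, detection rates, and use of SciPy and statsmodels are illustrative assumptions, not the study’s data or code.

```python
# Sketch: McNemar's test vs. a paired t-test on simulated paired binary
# detection data (hypothetical sample size and detection rates).
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(1)
n = 30                           # number of paired test portions (assumed)
ref = rng.binomial(1, 0.70, n)   # reference method detections (1 = detected)
alt = rng.binomial(1, 0.85, n)   # alternative method detections

# McNemar's test works on the 2x2 table of paired outcomes
table = np.array([
    [np.sum((ref == 1) & (alt == 1)), np.sum((ref == 1) & (alt == 0))],
    [np.sum((ref == 0) & (alt == 1)), np.sum((ref == 0) & (alt == 0))],
])
print("McNemar p-value:", mcnemar(table, exact=True).pvalue)

# The paired t-test treats the 0/1 outcomes as a difference in detection proportions
print("Paired t-test p-value:", stats.ttest_rel(alt, ref).pvalue)
```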

2017
Author(s): Marie Delacre, Daniel Lakens, Christophe Leys

When comparing two independent groups, researchers in psychology commonly use Student’s t-test. Assumptions of normality and homogeneity of variance underlie this test. More often than not, these conditions are not met, and in that case Student’s t-test can be severely biased and lead to invalid statistical inferences. Moreover, we argue that the assumption of equal variances will seldom hold in psychological research and that choosing between Student’s t-test and Welch’s t-test based on the outcome of a test of the equality of variances often fails to provide an appropriate answer. We show that Welch’s t-test provides better control of Type 1 error rates when the assumption of homogeneity of variance is not met, and loses little robustness compared to Student’s t-test when the assumptions are met. We argue that Welch’s t-test should be used as a default strategy.
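A minimal sketch of the recommendation using SciPy: setting equal_var=False in ttest_ind gives Welch’s t-test, while the default equal_var=True gives Student’s t-test. The group sizes, means, and variances below are hypothetical.

```python
# Sketch: Student's vs. Welch's t-test on two groups with unequal variances
# (hypothetical group sizes and parameters).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=1.0, size=20)   # small group, small variance
group_b = rng.normal(loc=0.5, scale=3.0, size=80)   # large group, large variance

student = stats.ttest_ind(group_a, group_b, equal_var=True)   # assumes equal variances
welch = stats.ttest_ind(group_a, group_b, equal_var=False)    # Welch's correction

print(f"Student's t-test: t = {student.statistic:.2f}, p = {student.pvalue:.3f}")
print(f"Welch's t-test:   t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```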


2021, Vol 21 (1)
Author(s): Lior Rennert, Moonseong Heo, Alain H. Litwin, Victor De Gruttola

Abstract
Background: Beginning in 2019, stepped-wedge designs (SWDs) have been used in the investigation of interventions to reduce opioid-related deaths in communities across the United States. However, these interventions compete with external factors such as newly initiated public policies limiting opioid prescriptions, media awareness campaigns, and the COVID-19 pandemic. Furthermore, control communities may prematurely adopt components of the intervention as they become available. The presence of time-varying external factors that impact study outcomes is a well-known limitation of SWDs; common approaches to adjusting for them make use of a mixed effects modeling framework. However, these models have several shortcomings when external factors differentially impact intervention and control clusters.
Methods: We discuss limitations of commonly used mixed effects models in the context of proposed SWDs to investigate interventions intended to reduce opioid-related mortality, and we propose extensions of these models to address these limitations. We conduct an extensive simulation study of anticipated data from SWD trials targeting the current opioid epidemic in order to examine the performance of these models in the presence of external factors. We consider confounding by time, premature adoption of intervention components, and time-varying effect modification, in which external factors differentially impact intervention and control clusters.
Results: In the presence of confounding by time, commonly used mixed effects models yield unbiased intervention effect estimates but can have inflated Type 1 error and undercoverage of confidence intervals. These models yield biased intervention effect estimates when premature intervention adoption or effect modification is present. In such scenarios, models incorporating fixed intervention-by-time interactions with an unstructured covariance for intervention-by-cluster-by-time random effects yield unbiased intervention effect estimates, reach nominal confidence interval coverage, and preserve Type 1 error.
Conclusions: Mixed effects models can adjust for different combinations of external factors through correct specification of fixed and random time effects. Since model choice has considerable impact on the validity of results and study power, careful consideration must be given to how these external factors impact study endpoints and to what estimands are most appropriate in the presence of such factors.
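As a schematic only, the snippet below fits a basic mixed effects model with fixed period and intervention effects and a cluster-level random intercept to simulated stepped-wedge-style data using statsmodels. The variable names, effect sizes, and the simple random-intercept structure are assumptions for illustration; they do not reproduce the richer intervention-by-cluster-by-time covariance structures the authors propose.

```python
# Sketch: a simplified mixed effects analysis of simulated stepped-wedge data
# (random-intercept model; variable names and effect sizes are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, n_periods = 8, 6
rows = []
for c in range(n_clusters):
    crossover = 1 + c % (n_periods - 1)      # period at which this cluster starts the intervention
    cluster_effect = rng.normal(0, 0.5)      # cluster-level random intercept
    for t in range(n_periods):
        treated = int(t >= crossover)
        y = 2.0 + 0.1 * t - 0.8 * treated + cluster_effect + rng.normal(0, 1.0)
        rows.append({"cluster": c, "period": t, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Fixed effects for period (categorical) and intervention, random intercept per cluster
model = smf.mixedlm("y ~ C(period) + treated", data=df, groups=df["cluster"])
result = model.fit()
print(result.summary())
```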


1986, Vol 20 (2), pp. 189-200
Author(s): Kevin D. Bird, Wayne Hall

Statistical power is neglected in much psychiatric research, with the consequence that many studies do not provide a reasonable chance of detecting differences between groups if they exist in the population. This paper attempts to improve current practice by providing an introduction to the essential quantities required for performing a power analysis (sample size, effect size, type 1 and type 2 error rates). We provide simplified tables for estimating the sample size required to detect a specified size of effect with a type 1 error rate of α and a type 2 error rate of β, and for estimating the power provided by a given sample size for detecting a specified size of effect with a type 1 error rate of α. We show how to modify these tables to perform power analyses for multiple comparisons in univariate and some multivariate designs. Power analyses for each of these types of design are illustrated by examples.
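The same quantities the tables provide can be computed in software. The sketch below solves for sample size and for power with statsmodels, using an assumed medium effect size of d = 0.5, α = 0.05, and β = 0.20 purely as illustrative values; it is not a reproduction of the paper’s tables.

```python
# Sketch: sample size and power for a two-group comparison
# (effect size, alpha, and power values are illustrative only).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect d = 0.5 with alpha = 0.05 and power = 0.80
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:.1f}")

# Power provided by n = 30 per group for the same effect size and alpha
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with n = 30 per group: {power:.2f}")
```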


2020
Author(s): Janet Aisbett, Daniel Lakens, Kristin Sainani

Magnitude-based inference (MBI) was widely adopted by sport science researchers as an alternative to null hypothesis significance tests. It has been criticized for lacking a theoretical framework, mixing Bayesian and frequentist thinking, and encouraging researchers to run small studies with high Type 1 error rates. MBI terminology describes the position of confidence intervals in relation to smallest meaningful effect sizes. We show these positions correspond to combinations of one-sided tests of hypotheses about the presence or absence of meaningful effects, and formally describe MBI as a multiple decision procedure. MBI terminology operates as if tests are conducted at multiple alpha levels. We illustrate how error rates can be controlled by limiting each one-sided hypothesis test to a single alpha level. To provide transparent error control in a Neyman-Pearson framework and encourage the use of standard statistical software, we recommend replacing MBI with one-sided tests against smallest meaningful effects, or pairs of such tests as in equivalence testing. Researchers should pre-specify their hypotheses and alpha levels, perform a priori sample size calculations, and justify all assumptions. Our recommendations show researchers what tests to use and how to design and report their statistical analyses to accord with standard frequentist practice.
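A minimal sketch of the recommended replacement: a pair of one-sided tests against a smallest meaningful effect, in the style of equivalence testing, using SciPy’s one-sided t-tests. The threshold, the simulated paired differences, and the alpha level are assumptions for illustration.

```python
# Sketch: two one-sided tests against a smallest meaningful effect
# (equivalence-style; threshold and data are hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
delta = 0.2                                      # smallest meaningful effect (assumed)
diff = rng.normal(loc=0.05, scale=0.5, size=40)  # simulated paired differences

# Test 1: is the effect greater than -delta?  Test 2: is it less than +delta?
p_lower = stats.ttest_1samp(diff, -delta, alternative="greater").pvalue
p_upper = stats.ttest_1samp(diff, delta, alternative="less").pvalue

alpha = 0.05
equivalent = max(p_lower, p_upper) < alpha   # both one-sided tests must reject
print(f"p (effect > -delta): {p_lower:.3f}, p (effect < +delta): {p_upper:.3f}")
print("Conclude no meaningful effect:", equivalent)
```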


2016, Vol 27 (7), pp. 2200-2215
Author(s): Masahiko Gosho, Kazushi Maruo, Ryota Ishii, Akihiro Hirakawa

The total score, which is calculated as the sum of scores on multiple items or questions, is repeatedly measured in longitudinal clinical studies. A mixed effects model for repeated measures method is often used to analyze these data; however, if one or more individual items are not measured, the method cannot be directly applied to the total score. We develop two simple and interpretable procedures that infer fixed effects for a longitudinal continuous composite variable. These procedures treat the items that compose the total score as multivariate longitudinal continuous data and simultaneously handle subject-level and item-level missing data. One procedure is based on a multivariate marginalized random effects model with Kronecker product covariance matrices for serial time dependence and correlation among items. The other procedure is based on a multiple imputation approach with a multivariate normal model. In terms of the type-1 error rate and the bias of the treatment effect on the total score, the marginalized random effects model and multiple imputation procedures performed better than the standard mixed effects model for repeated measures analysis with listwise deletion and single imputation for handling item-level missing data. In particular, the mixed effects model for repeated measures with listwise deletion resulted in substantial inflation of the type-1 error rate. The marginalized random effects model and multiple imputation methods provide a more efficient analysis by fully utilizing the partially available data, compared to the mixed effects model for repeated measures method with listwise deletion.
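As a rough sketch of the multiple imputation route (not the authors’ exact model), item-level scores can be imputed before summing to a total, and analyses of the total score are then run per imputation and pooled. The item names, number of imputations, and use of statsmodels’ MICEData are illustrative assumptions.

```python
# Sketch: multiple imputation of item-level missing values before computing a
# total score (item names and imputation settings are hypothetical).
import numpy as np
import pandas as pd
from statsmodels.imputation.mice import MICEData

rng = np.random.default_rng(3)
n = 100
items = pd.DataFrame({f"item{j}": rng.normal(2.0, 1.0, n) for j in range(1, 5)})

# Introduce item-level missingness at random
mask = rng.random(items.shape) < 0.10
items_missing = items.mask(mask)

# Draw several imputed datasets and compute the total score in each one
totals = []
mice = MICEData(items_missing)
for _ in range(5):                    # 5 imputations (illustrative)
    mice.update_all()                 # one cycle of chained-equation updates
    totals.append(mice.data.sum(axis=1))

# Downstream models of the total score would be fit per imputation and pooled
print("Mean total score across imputations:", np.mean([t.mean() for t in totals]))
```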

