Equivalence Testing
Recently Published Documents

TOTAL DOCUMENTS: 231 (78 in the last five years)
H-INDEX: 20 (2 in the last five years)

Author(s): Myles W. O'Brien, Derek S. Kimmerly

The number of research studies investigating whether similar or different cardiovascular responses or adaptations exist between males and females is increasing. Traditionally, difference-based statistical methods (e.g., t-test, ANOVA, etc.) have been used to compare cardiovascular function between males and females, with a P-value >0.05 taken to denote similarity between sexes. However, an absence of evidence (i.e., a large P-value) is not evidence of absence (i.e., of no sex differences). Equivalence testing determines whether two measures or groups provide statistically equivalent outcomes, in the sense that they differ by less than an ideally prespecified smallest effect size of interest. Our perspective discusses the applicability and utility of integrating equivalence testing when conducting sex comparisons in cardiovascular research. An emphasis is placed on how cardiovascular researchers may conduct equivalence testing across multiple study designs (e.g., cross-sectional comparisons, repeated-measures interventions, etc.). The strengths and weaknesses of this statistical tool are discussed. Equivalence analyses are relatively simple to conduct, may be used in conjunction with traditional hypothesis testing to interpret findings, and permit the determination of statistically equivalent responses between sexes. We recommend that cardiovascular researchers consider implementing equivalence testing to better our understanding of similar and different cardiovascular processes between sexes.
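As a concrete illustration of the procedure this abstract describes, the following minimal sketch applies the two one-sided tests (TOST) approach to a hypothetical two-group sex comparison. The data, margin, and variable names are illustrative assumptions, not the authors' analysis.

```python
# Minimal TOST sketch for a two-group comparison; data, margin,
# and alpha are illustrative assumptions, not the authors' values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
males = rng.normal(70.0, 8.0, 40)    # e.g., resting heart rate (bpm)
females = rng.normal(71.0, 8.0, 40)

delta = 5.0  # smallest effect size of interest (prespecified margin)

diff = males.mean() - females.mean()
se = np.sqrt(males.var(ddof=1) / males.size + females.var(ddof=1) / females.size)
df = males.size + females.size - 2  # simple pooled-df approximation

# Two one-sided tests: H0a: diff <= -delta and H0b: diff >= +delta
p_lower = stats.t.sf((diff + delta) / se, df)
p_upper = stats.t.cdf((diff - delta) / se, df)
p_tost = max(p_lower, p_upper)
print(f"difference = {diff:.2f} bpm, TOST p = {p_tost:.3f}")
```

Equivalence within ±delta is declared only if both one-sided nulls are rejected, i.e., only if the larger of the two p-values falls below the chosen alpha level.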


Assessment · 2021 · pp. 107319112110625
Author(s): Tom H. Rosenström, Ville Ritola, Suoma Saarni, Grigori Joffe, Jan-Henry Stenberg

Assessment of treatment response in psychotherapies can be undermined by a lack of longitudinal measurement invariance (LMI) in symptom self-report inventories, by measurement error, and/or by wrong model assumptions. To understand and compare these threats to the validity of outcome assessment in psychotherapy research, we studied LMI, sum scores, and Davidian curve item response theory models in a naturalistic guided internet psychotherapy treatment register of 2,218 generalized anxiety disorder (GAD) patients and 3,922 depressive disorder (DD) patients (aged ≥16 years). Symptoms were repeatedly assessed with the Generalized Anxiety Disorder Assessment-7 (GAD-7) or the Beck Depression Inventory. The symptom self-reports adhered to LMI under equivalence testing, suggesting that sum scores are reasonable proxies for disorder status. However, the standard LMI assumption of normally distributed latent factors did not hold and inflated treatment response estimates by 0.2 to 0.3 standard deviation units compared with sum scores. Further methodological research on non-normally distributed latent constructs holds promise for advancing LMI and mental health assessment.


2021 · Vol. 3
Author(s): Jaak Billiet, Cecil Meeusen, Koen Abts

This article examines the relationship between (sub)national identity and attitudes towards immigrants in the multinational context of Belgium. We extend our previous studies by analysing a longer time period (1995–2020) and by making a strong case for the idea that measurement invariance testing and theoretical meaningfulness are closely intertwined. To examine whether and how the relationship between (sub)national identity and perceived ethnic threat has changed over time and between regions, we first test for metric invariance of the latent concepts. Using data from the Belgian National Election Studies, we illustrate that evaluating the invariance of measurements is a necessary condition for comparative research, but also that measurement equivalence testing should be treated as an empirical guide showing researchers where substantive conclusions may need to be revisited and theoretical validity rethought. Next, we examine whether the relationship between (sub)national identity and perceptions of ethnic threat across subnational units can be attributed to different conceptions of community membership (in terms of ethnic and/or civic citizenship conceptions) in Flanders and Wallonia. We expected that a strong identification with Flanders would primarily be related to an ethnic citizenship representation and, as a result, to stronger feelings of threat towards immigrants, whereas a strong identification with Wallonia would primarily be related to a civic representation of the nation and therefore to lower feelings of threat. As a result of our thorough invariance testing strategy, the conceptualisation and measurement of (sub)national identity had to be adjusted in Wallonia, and the hypotheses had to be qualified.


Author(s): Josimara Tatiane da Silva, Juliana Cobre, Mário de Castro

2021 · Vol. 5
Author(s): Harlan Campbell, Paul Gustafson

To determine whether an effect is absent on the basis of a statistical test, the recommended frequentist tool is the equivalence test. Typically, it is expected that an appropriate equivalence margin has been specified before any data are observed. Unfortunately, this can be a difficult task. If the margin is too small, the test's power will be substantially reduced. If the margin is too large, any claim of equivalence will be meaningless. Moreover, it remains unclear how defining the margin after the fact will bias one's results. In this short article, we consider a series of hypothetical scenarios in which the margin is defined post hoc or is otherwise considered controversial. We also review a number of relevant, potentially problematic studies from the clinical trials literature, with the aim of motivating a critical discussion of what is acceptable and desirable in the reporting and interpretation of equivalence tests.
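The margin trade-off the authors describe can be made concrete with a small simulation: under a true difference of zero, the power of a TOST equivalence test rises steeply with the width of the margin. The sample size, standard deviation, and candidate margins below are assumptions for illustration.

```python
# Power of a TOST equivalence test as a function of the margin,
# with a true difference of zero; n, SD, and margins are assumptions.
import numpy as np
from scipy import stats

def tost_p(x, y, delta):
    """TOST p-value for two independent samples (pooled-df approximation)."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    df = x.size + y.size - 2
    return max(stats.t.sf((diff + delta) / se, df),
               stats.t.cdf((diff - delta) / se, df))

rng = np.random.default_rng(0)
n, sd, reps = 50, 1.0, 2000
for delta in (0.2, 0.4, 0.6, 0.8):
    hits = sum(tost_p(rng.normal(0, sd, n), rng.normal(0, sd, n), delta) < 0.05
               for _ in range(reps))
    print(f"margin = {delta:.1f} SD units: power ~ {hits / reps:.2f}")
```

A margin too narrow for the available sample size leaves the test with almost no power; a very wide margin makes the equivalence claim easy to reach but substantively empty.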


2021 · Vol. 21 (1)
Author(s): Riko Kelter

Background: Null hypothesis significance testing (NHST) is among the most frequently employed methods in the biomedical sciences. However, the problems of NHST and p-values have been discussed widely, and various Bayesian alternatives have been proposed. Some proposals focus on equivalence testing, which aims at testing an interval hypothesis instead of a precise hypothesis. An interval hypothesis includes a small range of parameter values instead of a single null value, an idea that goes back to Hodges and Lehmann. As researchers can always expect to observe some (although often negligibly small) effect size, interval hypotheses are more realistic for biomedical research. However, the selection of an equivalence region (the interval boundaries) often seems arbitrary, and several Bayesian approaches to equivalence testing coexist.

Methods: A new proposal is made for determining the equivalence region of Bayesian equivalence tests based on objective criteria such as the type I error rate and power. Existing approaches to Bayesian equivalence testing in the two-sample setting are discussed, with a focus on the Bayes factor and the region of practical equivalence (ROPE). A simulation study derives the results necessary to apply the new method in the two-sample setting, which is among the most frequently used designs in biomedical research.

Results: Bayesian Hodges-Lehmann tests for statistical equivalence differ in their sensitivity to the prior modeling, their power, and the associated type I error rates. The relationship between type I error rates, power, and sample sizes for existing Bayesian equivalence tests is identified in the two-sample setting. The results allow researchers to determine the equivalence region based on the new method by incorporating such objective criteria. Importantly, the results show not only that prior selection can influence the type I error rate and power, but also that this relationship is reversed for the Bayes factor and ROPE-based equivalence tests.

Conclusion: Based on these results, researchers can select between the existing Bayesian Hodges-Lehmann tests for statistical equivalence and determine the equivalence region based on objective criteria, thus improving the reproducibility of biomedical research.
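As a rough illustration of the ROPE approach discussed in this abstract, the sketch below checks how much posterior mass for a two-sample mean difference falls inside an assumed equivalence region. The normal approximation to the posterior, the data, and the ROPE bounds are all simplifying assumptions; the paper's Bayes factor machinery and calibration method are not reproduced here.

```python
# ROPE-style equivalence check on the posterior of a mean difference.
# The normal posterior approximation, data, and ROPE are assumptions.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 60)
y = rng.normal(0.1, 1.0, 60)

# Crude large-sample normal approximation to the posterior of the
# difference in means under vague priors (illustration only).
diff_mean = x.mean() - y.mean()
diff_sd = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
posterior = rng.normal(diff_mean, diff_sd, 100_000)

rope = (-0.2, 0.2)  # assumed equivalence region
inside = np.mean((posterior > rope[0]) & (posterior < rope[1]))
print(f"posterior mass inside ROPE: {inside:.2f}")
```

A common decision rule declares practical equivalence when the 95% highest-density interval (or, in stricter variants, 95% of the posterior mass) lies entirely inside the ROPE; the paper's point is that the placement of those boundaries can be calibrated against type I error rate and power rather than chosen arbitrarily.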


2021 · Vol. 5 (3) · pp. 1-20
Author(s): Hamza Bourbouh, Pierre-Loïc Garoche, Christophe Garion, Xavier Thirioux

Model-based design is now unavoidable when building embedded systems and, more specifically, controllers. Among the available modeling languages, the synchronous dataflow paradigm, as implemented in languages such as MATLAB Simulink or ANSYS SCADE, has become predominant in the critical embedded systems industry. Both of these frameworks are used to design the controller itself, but they also provide code generation facilities, enabling faster deployment to the target and easier V&V activities performed earlier in the design process, at the model level. Synchronous models also ease the definition of formal specifications through synchronous observers, which attach requirements to the model in the very same language, one mastered by engineers and supported by simulation and code generation tools. However, few works address the automatic synthesis of MATLAB Simulink annotations from lower-level models or code. This article presents a compilation process from Lustre models to genuine MATLAB Simulink, without relying on external C functions or MATLAB functions. The translation is based on the modular compilation of Lustre to imperative code and preserves the hierarchy of the input Lustre model within the generated Simulink one. We implemented the approach and used it to validate a compilation toolchain, mapping Simulink to Lustre and then to C, by means of equivalence testing and checking. This backward compilation from Lustre to Simulink also makes it possible to automatically produce Simulink components modeling specifications, proof arguments, or test-case coverage criteria.


Author(s): Chiara Gattoni, Barry Vincent O'Neill, Cantor Tarperi, Federico Schena, Samuele Maria Marcora

Purpose: It is well established that mental fatigue impairs performance during lab-based endurance tests lasting less than 45 min. However, the effects of mental fatigue on longer-duration endurance events and in field settings are unknown. The aim of this study was to investigate the effect of mental fatigue on performance during a half-marathon race.

Methods: Forty-six male amateur runners (mean ± SD: age 43.8 ± 8.6 years, V̇O2peak 46.0 ± 4.1 ml/kg/min) completed a half-marathon after being randomly allocated to performing a 50-min mentally fatiguing task (mental fatigue group) or reading magazines for 50 min (control group). Running speed, heart rate, and perceived effort were measured during the race.

Results: Runners in the mental fatigue group completed the half-marathon approximately 4 min slower (106.2 ± 12.4 min) than those in the control group (102.4 ± 10.2 min), but this difference was not statistically significant (Cohen's d = 0.333; p = 0.265). However, equivalence was not established [t(40.88) = 0.239, p = 0.594], and the equivalence testing analysis excluded a beneficial effect of mental fatigue on half-marathon performance.

Conclusion: Due to its posttest-only design and the achievable sample size, the study did not have enough power to establish whether the observed 4-min increase in half-marathon time reflects a statistically significant impairment. However, equivalence testing suggests that mental fatigue has no beneficial effect on half-marathon performance in male amateur runners, and a harmful effect cannot be excluded. Overall, it seems prudent for endurance athletes to avoid mentally fatiguing tasks before competitions.
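The asymmetric conclusion reported above (equivalence not established, yet a beneficial effect excluded) arises because the two one-sided tests of the equivalence procedure are evaluated separately. Below is a hedged reconstruction of the lower-bound test from the abstract's summary statistics, assuming an even 23/23 group split and an illustrative 2-min margin; the paper's actual margin and split may differ.

```python
# Hedged reconstruction of the lower equivalence-bound test from the
# abstract's summary statistics; the 23/23 split and the 2-min margin
# are assumptions, not values reported by the paper.
import numpy as np
from scipy import stats

m_fat, sd_fat, n_fat = 106.2, 12.4, 23   # mental fatigue group (min)
m_con, sd_con, n_con = 102.4, 10.2, 23   # control group (min)
delta = 2.0                              # assumed margin (min)

diff = m_fat - m_con                     # positive = fatigue group slower
se = np.sqrt(sd_fat**2 / n_fat + sd_con**2 / n_con)
# Welch-Satterthwaite degrees of freedom
df = se**4 / ((sd_fat**2 / n_fat)**2 / (n_fat - 1)
              + (sd_con**2 / n_con)**2 / (n_con - 1))

# H0: mental fatigue is beneficial by at least delta (diff <= -delta)
p_benefit = stats.t.sf((diff + delta) / se, df)
print(f"df = {df:.1f}, p (beneficial effect) = {p_benefit:.3f}")
```

Rejecting this one-sided null rules out a meaningful beneficial effect even when the full equivalence criterion fails, as reported above, which is why a harmful effect remains on the table.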


2021
Author(s): Udi Alter, Alyssa Counsell

Abstract: Psychological research is rife with studies that inappropriately conclude a lack of association or no effect between a predictor and the outcome in regression models following statistically nonsignificant results. This approach is methodologically flawed, however, because failing to reject the null hypothesis using traditional, difference-based tests does not mean the null is true (i.e., that there is no relationship). This flawed methodology leads to high rates of incorrect conclusions that flood the literature. This thesis introduces a novel, methodologically sound alternative. I demonstrate how equivalence testing can be applied to evaluate whether a predictor has a negligible effect on the outcome variable in multiple regression. I constructed a simulation study to evaluate the performance (i.e., power and error rates) of two equivalence-based tests and compared them to the common, but inappropriate, method of concluding no effect by failing to reject the null hypothesis of the traditional test. I further propose two R functions to accompany this thesis and supply researchers with open-access, easy-to-use tools that they can flexibly adopt in their own research. The use of the proposed equivalence-based methods and R functions is then illustrated using examples from the literature, and recommendations for reporting and interpreting results are discussed. My results demonstrate that using tests of equivalence instead of the traditional test is the appropriate statistical choice: tests of equivalence show high rates of correct conclusions, especially with larger sample sizes, and low rates of incorrect conclusions, whereas the traditional method demonstrates unacceptably high rates of incorrect conclusions.
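The thesis supplies R functions; as a language-agnostic illustration of the underlying idea, here is a minimal Python sketch of a TOST-style equivalence test on a single regression coefficient. The simulated data and the margin delta are assumptions, and this is not the author's implementation.

```python
# TOST-style test of a negligible regression slope; data and the
# margin delta are assumptions, not the thesis's R code.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)        # true slope is negligible

fit = sm.OLS(y, sm.add_constant(x)).fit()
b, se, df = fit.params[1], fit.bse[1], fit.df_resid

delta = 0.3  # smallest slope of interest, fixed before analysis
p_tost = max(stats.t.sf((b + delta) / se, df),
             stats.t.cdf((b - delta) / se, df))
print(f"slope = {b:.3f} (SE {se:.3f}), TOST p = {p_tost:.4f}")
```

A significant TOST result here supports concluding a negligible effect within ±delta, which is a far stronger statement than merely failing to reject the traditional null of a zero slope.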

