Practice for Conducting Equivalence Tests for Comparing Testing Processes

2021 ◽  
2020 ◽  
pp. 1-17
Author(s):  
Erin Hartman

Abstract Regression discontinuity (RD) designs are increasingly common in political science. They have many advantages, including a known and observable treatment assignment mechanism. The literature has emphasized the need for “falsification tests” and ways to assess the validity of the design. When implementing RD designs, researchers typically rely on two falsification tests, based on empirically testable implications of the identifying assumptions, to argue that the design is credible. These tests, one for continuity in the regression function for a pretreatment covariate and one for continuity in the density of the forcing variable, use a null hypothesis of no difference in the parameter of interest at the discontinuity. Common practice can incorrectly treat a failure to reject this null, that is, an absence of evidence of a flawed design, as evidence that the design is credible. The well-known equivalence testing approach addresses these problems, but how to implement equivalence tests in the RD framework is not straightforward. This paper develops two equivalence tests tailored for RD designs that allow researchers to provide statistical evidence that the design is credible. Simulation studies show that equivalence-based tests outperform the tests of difference used in current practice. The tests are applied to the close-elections RD data presented in Eggers et al. (2015b).
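To make the contrast between a test of difference and an equivalence test concrete, the sketch below computes both for a single estimated jump at the cutoff (for example, in a pretreatment covariate), given its standard error and an equivalence margin. It is a generic two one-sided tests (TOST) procedure under a large-sample normal approximation, not the RD-specific tests developed in the paper; the function names, the margin, and the numbers are illustrative assumptions, and obtaining the jump estimate and its standard error (e.g., from a local-polynomial fit) is outside its scope.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal for the large-sample approximation


def difference_p_value(estimate: float, se: float) -> float:
    """Two-sided p-value for H0: theta == 0 (the usual falsification test)."""
    if se <= 0:
        raise ValueError("se must be positive")
    return 2 * (1 - _STD_NORMAL.cdf(abs(estimate) / se))


def tost_p_value(estimate: float, se: float, margin: float) -> float:
    """TOST p-value for H0: |theta| >= margin vs H1: |theta| < margin.

    Equivalence (here: support for a credible design) is claimed at
    level alpha only when this p-value is below alpha.
    """
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    # One-sided test of H0: theta <= -margin against H1: theta > -margin.
    p_lower = 1 - _STD_NORMAL.cdf((estimate + margin) / se)
    # One-sided test of H0: theta >= margin against H1: theta < margin.
    p_upper = _STD_NORMAL.cdf((estimate - margin) / se)
    # Both one-sided nulls must be rejected, so report the larger p-value.
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Hypothetical jump estimate of 0.02 with an equivalence margin of 0.1.
    for se in (0.05, 0.04):
        print(f"se={se}: difference p={difference_p_value(0.02, se):.3f}, "
              f"TOST p={tost_p_value(0.02, se, 0.1):.3f}")
```

With the noisier estimate (se = 0.05), neither test rejects at the 5% level: the difference test finds no discontinuity, yet the TOST p-value is about 0.055, so the data do not establish equivalence either. With se = 0.04 the TOST p-value drops to about 0.023. A non-significant difference test alone cannot tell these two situations apart.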


Assessment ◽  
2020 ◽  
pp. 107319112098392
Author(s):  
Danielle Zimmerman ◽  
J. Attridge ◽  
Summer Rolin ◽  
Jeremy Davis

This study compared prorated Boston Naming Test (BNT-P; omitting the noose item) and standard administration (BNT-S) scores in physical medicine and rehabilitation patients (N = 480). The sample was 34% female and 91% White, with an average age of 46 (SD = 15) years and an average education of 14 (SD = 3) years. BNT-P was calculated by summing correct responses excluding item 48 and estimating the 60-item score by cross-multiplication and division. BNT-P and BNT-S scores were compared via concordance correlation (CC) coefficients; reflected and log-transformed data were examined with equivalence tests. BNT-P and BNT-S scores showed almost perfect agreement (CC = .99). Transformed scores demonstrated equivalence (±1.1 points). Raw and scaled score differences were 0 in 88% and 96% of cases, respectively. Race and ethnicity accounted for item 48 outcomes while controlling for age and education. Findings support the utility of prorated BNT scores in rehabilitation patients.
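The proration rule and the agreement statistic described above can be sketched as follows. The sketch assumes that proration means scaling the 59-item raw score by 60/59 without rounding (the abstract does not say whether prorated scores were rounded) and uses Lin's concordance correlation coefficient with population (1/n) moments, as in Lin's original definition; the function names are hypothetical.

```python
from typing import Sequence


def prorate_bnt(correct_without_item_48: int, items_scored: int = 59,
                full_length: int = 60) -> float:
    """Estimate a full-length score by cross-multiplication: x / 59 = y / 60."""
    if not 0 <= correct_without_item_48 <= items_scored:
        raise ValueError("raw score out of range")
    return correct_without_item_48 * full_length / items_scored


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Unlike Pearson's r, the squared mean difference penalizes systematic shifts.
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        return 1.0  # both series constant and identical: perfect agreement
    return 2 * cov_xy / denominator
```

For example, `lin_ccc([1, 2, 3], [2, 3, 4])` is 4/7 (about 0.571) even though Pearson's r for the same pairs is 1, because every second score is shifted up by one point. This is why concordance, rather than correlation, is the appropriate agreement measure for comparing BNT-P with BNT-S.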


2002 ◽  
Vol 44 (8) ◽  
pp. 1015-1027 ◽  
Author(s):  
Rafael Pflüger ◽  
Torsten Hothorn
Keyword(s):  
P Value ◽  

2021 ◽  
Author(s):  
Robin Fondberg ◽  
Johan N Lundström ◽  
Janina Seubert

Abstract Repeated exposure can change the perceptual and hedonic features of flavor. Associative learning during which a flavor’s odor component is affected by co-exposure with taste is thought to be central in this process. However, changes can also arise due to exposure to the odor in itself. The aim of this study was to dissociate effects of associative learning from effects of exposure without taste by repeatedly presenting one odor together with sucrose and a second odor alone. Sixty individuals attended two testing sessions separated by a five-day exposure phase during which the stimuli were presented as flavorants in chewing gums that were chewed three times daily. Ratings of odor sweetness, odor pleasantness, odor intensity enhancement by taste, and odor referral to the mouth were collected at both sessions. Consistent with the notion that food preferences are modulated by exposure, odor pleasantness increased between the sessions independently of whether the odor (basil or orange flower) had been presented with or without sucrose. However, we found no evidence of associative learning in any of the tasks. In addition, exploratory equivalence tests suggested that these effects were either absent or negligible in magnitude. Taken together, our results suggest that the hypothesized effects of associative learning are either smaller than previously thought or highly dependent on the experimental setting. Future studies are needed to evaluate the relative support for these explanations and, if experimental conditions can be identified that reliably produce such effects, to identify factors that regulate the formation of new odor-taste associations.


2020 ◽  
Author(s):  
Eunji Chong ◽  
Elysha Clark-Whitney ◽  
Audrey Southerland ◽  
Elizabeth Stubbs ◽  
Chanel Miller ◽  
...  

Eye contact is one of the primary means of social communication that humans use from the first months of life. Quantifying eye contact is valuable in various scenarios, as part of the analysis of social roles and communication skills and in medical screening. Estimating a subject's looking direction from video is a challenging task, but eye contact can be captured effectively by a wearable point-of-view camera, whose configuration provides a unique viewpoint. While moments of eye contact from this viewpoint can be hand coded, this process tends to be laborious and subjective. In this work, we developed the first deep neural network model to automatically detect eye contact in egocentric video with accuracy equivalent to that of human experts. We trained a deep convolutional neural network using a dataset of 4,339,879 annotated images from 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 set-aside validation subjects, on par with 10 trained human coders (mean precision 0.918, recall 0.946). This result passes class equivalence tests on Cohen’s kappa scores (equivalence boundary of 0.025, p < .005), demonstrating that the deep learning model can produce automated coding with reliability comparable to that of human coders. The presented method will be instrumental in analyzing gaze behavior in naturalistic social settings by serving as a scalable, objective, and accessible tool for clinicians and researchers.
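The metrics quoted above can be computed from paired frame-level labels as in the sketch below, which assumes binary labels (1 = eye contact, 0 = no eye contact) for each frame; the function names are hypothetical. It covers only precision, recall, and Cohen's kappa; the equivalence test on kappa scores reported in the abstract is not reproduced here.

```python
from collections import Counter
from typing import Sequence, Tuple


def precision_recall(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall of binary predictions against reference labels."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have equal length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty, equal-length label sequences")
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies;
    # Counter returns 0 for labels that rater b never used.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For the hypothetical labels `truth = [1, 1, 1, 0, 0, 0, 1, 0]` and `pred = [1, 1, 0, 0, 0, 1, 1, 0]`, precision and recall are both 0.75, while kappa is 0.5: raw agreement is 0.75, but half of that level would be expected by chance from the 50/50 label frequencies.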


Author(s):  
Katharina Graben ◽  
Bettina K. Doering ◽  
Antonia Barke

Abstract In this study, we investigated whether playing smartphone games while reading a text reduces learning performance or reading speed. We also examined whether this is affected by push notifications. Ninety-three students were randomly assigned to three learning conditions. In the gaming group (G), participants played a game app for 20 s at 2-min intervals while reading. In one subgroup, the game app sent push notifications (GN+); in the other subgroup, no notifications were sent (GN−). In the control group (C), participants did not play a game. After reading, participants took a multiple-choice quiz. We compared quiz scores and reading times between groups (G) and (C) and within the gaming group (GN+ vs. GN−), and observed no differences. Since the statistical non-significance of these tests does not entail the absence of an effect, we conducted equivalence tests, which did not demonstrate equivalence either. Although the experiment ensured high internal validity, its results remained inconclusive. Possible reasons for the similar performance across all groups include non-specific exercise effects (all participants owned a smartphone), low similarity between the tasks, low variance in participants’ ability and motivation (high achieving, low ADHD scores), or low game complexity. Future research should address these questions.
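The distinction drawn above between a non-significant difference test and a failed equivalence test can be summarized as a decision rule based on confidence intervals. The sketch below assumes a normally distributed estimate with known standard error and a hypothetical equivalence margin; the abstract does not report the margin used in the study. Under these assumptions, checking whether the 90% interval lies inside ±margin is the same as a TOST at α = 0.05. The function name and outcome labels are illustrative.

```python
from statistics import NormalDist


def classify_result(estimate: float, se: float, margin: float,
                    alpha: float = 0.05) -> str:
    """Combine a two-sided difference test with a TOST equivalence test."""
    if se <= 0 or margin <= 0:
        raise ValueError("se and margin must be positive")
    std_normal = NormalDist()
    z_diff = std_normal.inv_cdf(1 - alpha / 2)  # (1 - alpha) interval for the difference test
    z_tost = std_normal.inv_cdf(1 - alpha)      # (1 - 2*alpha) interval for TOST
    # Difference: the (1 - alpha) interval excludes zero.
    different = abs(estimate) - z_diff * se > 0
    # Equivalence: the (1 - 2*alpha) interval lies strictly inside (-margin, margin).
    equivalent = abs(estimate) + z_tost * se < margin
    if different and equivalent:
        return "different but smaller than the margin"
    if different:
        return "different"
    if equivalent:
        return "equivalent"
    return "inconclusive"
```

The outcome reported above, no difference and no demonstrated equivalence, corresponds to the "inconclusive" branch, which typically arises when the standard error is large relative to the margin; for instance, `classify_result(0.0, 1.0, 0.5)` lands there even though the point estimate is exactly zero.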


PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0247855
Author(s):  
David Zendle ◽  
Lukasz Walasek ◽  
Paul Cairns ◽  
Rachel Meyer ◽  
Aaron Drummond

Loot boxes are digital containers of randomised rewards, present in some video games and often purchasable with real-world money. Recently, concerns have been raised that loot boxes might approximate traditional gambling activities, and people with gambling problems have been shown to spend more on loot boxes than peers without gambling problems. Some argue that regulating loot boxes as gambling-like mechanics is inappropriate because other activities that also bear striking similarities to traditional forms of gambling, such as collectible card games, are not subject to such regulations. Players of collectible card games often buy sealed physical packs of cards, and these ‘booster packs’ share many formal similarities with loot boxes. However, not everything that appears similar to gambling requires regulation. Here, in a large sample of collectible card game players (n = 726), we show no statistically significant link between spending on physical booster packs in real-world stores and problem gambling (p = 0.110, η² = 0.004), and a relationship of trivial magnitude between spending on booster packs in online stores and problem gambling (p = 0.035, η² = 0.008). Follow-up equivalence tests using the TOST procedure rejected the hypothesis that either of these effects was of practical importance (η² > 0.04). Thus, although collectible card game booster packs, like loot boxes, share structural similarities with gambling, it appears that they may not be linked to problem gambling in the same way as loot boxes. We discuss potential reasons for these differences. Decisions regarding the regulation of activities that share structural features with traditional forms of gambling should be made on the basis of definitional criteria as well as whether people with gambling problems purchase such items at a higher rate than peers with no gambling problems. Our research suggests that there is currently little evidence to support the regulation of collectible card games.
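To put the practical-importance bound in a more familiar metric, η² can be converted to Cohen's f using the standard relation f = √(η² / (1 − η²)). The sketch below only performs this conversion for the values quoted above; it does not reproduce the TOST procedure itself, which for η² typically involves the F distribution. The function name is illustrative.

```python
import math


def eta_squared_to_f(eta_squared: float) -> float:
    """Convert eta-squared to Cohen's f: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0 <= eta_squared < 1:
        raise ValueError("eta_squared must be in [0, 1)")
    return math.sqrt(eta_squared / (1 - eta_squared))


if __name__ == "__main__":
    # Practical-importance bound first, then the two observed effects.
    for label, value in (("bound", 0.04), ("in-store", 0.004), ("online", 0.008)):
        print(f"{label}: eta^2 = {value}, f = {eta_squared_to_f(value):.3f}")
```

On this scale the bound η² = 0.04 corresponds to f ≈ 0.20, while the two observed effects correspond to f ≈ 0.06 (physical stores) and f ≈ 0.09 (online stores).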

