Practice for Conducting Equivalence Tests for Comparing Testing Processes

Abstract Regression discontinuity (RD) designs are increasingly common in political science. They have many advantages, including a known and observable treatment assignment mechanism. The literature has emphasized the need for “falsification tests” and ways to assess the validity of the design. When implementing RD designs, researchers typically rely on two falsification tests, based on empirically testable implications of the identifying assumptions, to argue the design is credible. These tests, one for continuity in the regression function for a pretreatment covariate, and one for continuity in the density of the forcing variable, use a null of no difference in the parameter of interest at the discontinuity. Common practice can, incorrectly, conflate a failure to reject evidence of a flawed design with evidence that the design is credible. The well-known equivalence testing approach addresses these problems, but how to implement equivalence tests in the RD framework is not straightforward. This paper develops two equivalence tests tailored for RD designs that allow researchers to provide statistical evidence that the design is credible. Simulation studies show the superior performance of equivalence-based tests over tests-of-difference, as used in current practice. The tests are applied to the close elections RD data presented in Eggers et al. (2015b).


Psychometric Equivalence of Standard and Prorated Boston Naming Test Scores

Assessment ◽

10.1177/1073191120983925 ◽

2020 ◽

pp. 107319112098392

Author(s):

Danielle Zimmerman ◽

J. Attridge ◽

Summer Rolin ◽

Jeremy Davis

Keyword(s):

Race And Ethnicity ◽

Test Scores ◽

Concordance Correlation ◽

Physical Medicine ◽

Boston Naming Test ◽

Item Score ◽

Perfect Agreement ◽

Psychometric Equivalence ◽

Equivalence Tests ◽

Naming Test

This study compared prorated Boston Naming Test (BNT-P; omitting the noose item) and standard administration (BNT-S) scores in physical medicine and rehabilitation patients (N = 480). The sample was 34% female and 91% White with average age and education of 46 (SD = 15) and 14 (SD = 3) years, respectively. BNT-P was calculated by summing correct responses excluding item 48 and estimating the 60-item score with cross multiplication and division. BNT-P and BNT-S scores were compared via concordance correlation (CC) coefficients; reflected and log transformed data were examined with equivalence tests. BNT-P and BNT-S scores showed almost perfect agreement (CC = .99). Transformed scores demonstrated equivalence (±1.1 points). Raw and scaled score differences were 0 in 88% and 96% of cases, respectively. Race and ethnicity accounted for item 48 outcomes while controlling for age and education. Findings support the utility of prorated BNT scores in rehabilitation patients.
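The prorating step described in this abstract (sum the 59 retained items, then scale to a 60-item total by cross multiplication) can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function name `prorate_bnt`, the 0/1 item encoding, and the choice to return an unrounded value are all assumptions.

```python
def prorate_bnt(item_scores, omit_item=48, total_items=60):
    """Estimate a 60-item score from the items that remain after omitting one.

    item_scores: sequence of 0/1 values, where position 0 holds item 1.
    Assumption: the result is left unrounded; the abstract does not say
    how (or whether) the study rounded prorated scores.
    """
    if len(item_scores) != total_items:
        raise ValueError(f"expected {total_items} item scores")
    # Keep every item except the omitted one (items are numbered from 1).
    kept = [s for i, s in enumerate(item_scores, start=1) if i != omit_item]
    raw = sum(kept)
    # Cross multiplication: raw / len(kept) = prorated / total_items.
    return raw * total_items / len(kept)


# Hypothetical respondent: every item correct except item 48.
scores = [1] * 60
scores[47] = 0
print(prorate_bnt(scores))
```

Because the omitted item never enters the sum, a respondent who misses only item 48 receives the same prorated score as one who answers every item correctly.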


Assessing Equivalence Tests with Respect to their Expected p-Value

Biometrical Journal ◽

10.1002/bimj.200290001 ◽

2002 ◽

Vol 44 (8) ◽

pp. 1015-1027 ◽

Cited By ~ 2

Author(s):

Rafael Pflüger ◽

Torsten Hothorn

Keyword(s):

P Value ◽

Equivalence Tests


Odor-taste interactions in food perception: Exposure protocol shows no effects of associative learning

Chemical Senses ◽

10.1093/chemse/bjab003 ◽

2021 ◽

Author(s):

Robin Fondberg ◽

Johan N Lundström ◽

Janina Seubert

Keyword(s):

Associative Learning ◽

Food Preferences ◽

Exposure Phase ◽

Experimental Setting ◽

Experimental Conditions ◽

Future Studies ◽

Food Perception ◽

Equivalence Tests ◽

Taste And Odor ◽

Chewing Gums

Abstract Repeated exposure can change the perceptual and hedonic features of flavor. Associative learning during which a flavor’s odor component is affected by co-exposure with taste is thought to be central in this process. However, changes can also arise due to exposure to the odor in itself. The aim of this study was to dissociate effects of associative learning from effects of exposure without taste by repeatedly presenting one odor together with sucrose and a second odor alone. Sixty individuals attended two testing sessions separated by a five-day exposure phase during which the stimuli were presented as flavorants in chewing gums that were chewed three times daily. Ratings of odor sweetness, odor pleasantness, odor intensity enhancement by taste, and odor referral to the mouth were collected at both sessions. Consistent with the notion that food preferences are modulated by exposure, odor pleasantness increased between the sessions independently of whether the odor (basil or orange flower) had been presented with or without sucrose. However, we found no evidence of associative learning in any of the tasks. In addition, exploratory equivalence tests suggested that these effects were either absent or insignificant in magnitude. Taken together, our results suggest that the hypothesized effects of associative learning are either smaller than previously thought or highly dependent on the experimental setting. Future studies are needed to evaluate the relative support for these explanations and, if experimental conditions can be identified that reliably produce such effects, to identify factors that regulate the formation of new odor-taste associations.


Equivalence tests for two unrelated samples

Testing Statistical Hypotheses of Equivalence and Noninferiority, Second Edition ◽

10.1201/ebk1439808184-c6 ◽

2010 ◽

pp. 119-218

Keyword(s):

Equivalence Tests


Equivalence tests for selected one-parameter problems

Testing Statistical Hypotheses of Equivalence and Noninferiority, Second Edition ◽

10.1201/ebk1439808184-c4 ◽

2010 ◽

pp. 49-70

Keyword(s):

Equivalence Tests


Detection of eye contact with deep neural networks is as accurate as human experts

10.31219/osf.io/5a6m7 ◽

2020 ◽

Author(s):

Eunji Chong ◽

Elysha Clark-Whitney ◽

Audrey Southerland ◽

Elizabeth Stubbs ◽

Chanel Miller ◽

...

Keyword(s):

Neural Network ◽

Eye Contact ◽

Autism Spectrum ◽

Point Of View ◽

Gaze Behavior ◽

Equivalence Tests ◽

Social Settings ◽

Automated Coding ◽

Primary Means ◽

And Performance

Eye contact is among the primary means of social communication that humans use from the first months of life. Quantification of eye contact is valuable in various scenarios as a part of the analysis of social roles, communication skills, and medical screening. Estimating a subject's looking direction from video is a challenging task, but eye contact can be effectively captured by a wearable point-of-view camera, which provides a unique viewpoint as a result of its configuration. While moments of eye contact from this viewpoint can be hand coded, such a process tends to be laborious and subjective. In this work, we developed the first deep neural network model to automatically detect eye contact in egocentric video with accuracy equivalent to that of human experts. We trained a deep convolutional neural network using a dataset of 4,339,879 annotated images, consisting of 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 set-aside validation subjects, and its performance is on par with that of 10 trained human coders, who have a mean precision of 0.918 and recall of 0.946. This result passes class equivalence tests in Cohen’s kappa scores (equivalence boundary of 0.025, p < .005), demonstrating that a deep learning model can produce automated coding with a level of reliability comparable to human coders. The presented method will be instrumental in analyzing gaze behavior in naturalistic social settings by serving as a scalable, objective, and accessible tool for clinicians and researchers.
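The abstract above compares model and human coding through Cohen's kappa. As a rough illustration of that agreement statistic only (not the authors' pipeline or their equivalence test), the sketch below computes kappa for two label sequences. The function name and the frame labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items on which the two raters match.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / n**2
    if p_chance == 1.0:
        # Degenerate case: both raters used a single identical label throughout.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical per-frame labels (1 = eye contact, 0 = none).
model_labels = [1, 1, 1, 0]
human_labels = [1, 1, 0, 0]
print(cohens_kappa(model_labels, human_labels))
```

An equivalence claim such as the one in the abstract would then be made about the difference between kappa values, judged against a chosen boundary, rather than about a single kappa.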


Playing smartphone games while studying: an experimental study on reading interruptions by a smartphone game

Education and Information Technologies ◽

10.1007/s10639-021-10764-0 ◽

2021 ◽

Author(s):

Katharina Graben ◽

Bettina K. Doering ◽

Antonia Barke

Keyword(s):

Reading Speed ◽

Internal Validity ◽

The Other ◽

Learning Performance ◽

Future Research ◽

Control Group ◽

Specific Exercise ◽

Equivalence Tests ◽

Learning Conditions ◽

High Internal Validity

Abstract In this study, we investigated whether the use of smartphone games while reading a text reduces learning performance or reading speed. We also examined whether this is affected by push notifications. Ninety-three students were randomly assigned to three learning conditions. In the gaming group (G), participants played a game app for 20 s at 2-min intervals while reading. In one subgroup, the game app sent push notifications (GN+); in the other subgroup, no notifications (GN−) were sent. In the control group (C), participants did not play a game. After the reading, participants took a multiple-choice quiz. We compared quiz scores and reading times of the groups (G) and (C) and within the gaming group (GN+, GN−) and observed no differences. Since the statistical non-significance of these tests does not entail the absence of an effect, we conducted equivalence tests, which did not demonstrate equivalence either. The experiment ensured high internal validity, yet remained inconclusive. Reasons for the similarity of performance in all groups could be non-specific exercise effects (all participants owned a smartphone), low similarity between the tasks, low variance of participants’ ability and motivation (high achieving, low ADHD scores) or low game complexity. Future research should address these questions.


Links between problem gambling and spending on booster packs in collectible card games: A conceptual replication of research on loot boxes

PLoS ONE ◽

10.1371/journal.pone.0247855 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0247855

Author(s):

David Zendle ◽

Lukasz Walasek ◽

Paul Cairns ◽

Rachel Meyer ◽

Aaron Drummond

Keyword(s):

Problem Gambling ◽

Real World ◽

Practical Importance ◽

Structural Features ◽

Gambling Problems ◽

Card Game ◽

Card Games ◽

Equivalence Tests ◽

Online Stores ◽

Collectible Card Games

Loot boxes are digital containers of randomised rewards present in some video games which are often purchasable for real-world money. Recently, concerns have been raised that loot boxes might approximate traditional gambling activities, and that people with gambling problems have been shown to spend more on loot boxes than peers without gambling problems. Some argue that the regulation of loot boxes as gambling-like mechanics is inappropriate because similar activities which also bear striking similarities to traditional forms of gambling, such as collectible card games, are not subject to such regulations. Players of collectible card games often buy sealed physical packs of cards, and these ‘booster packs’ share many formal similarities with loot boxes. However, not everything which appears similar to gambling requires regulation. Here, in a large sample of collectible card game players (n = 726), we show no statistically significant link between spending on physical booster packs in real-world stores and problem gambling (p = 0.110, η² = 0.004), and a relationship of trivial magnitude between spending on booster packs in online stores and problem gambling (p = 0.035, η² = 0.008). Follow-up equivalence tests using the TOST procedure rejected the hypothesis that either of these effects was of practical importance (η² > 0.04). Thus, although collectible card game booster packs, like loot boxes, share structural similarities with gambling, it appears that they may not be linked to problem gambling in the same way as loot boxes. We discuss potential reasons for these differences. Decisions regarding regulation of activities which share structural features with traditional forms of gambling should be made on the basis of definitional criteria as well as whether people with gambling problems purchase such items at a higher rate than peers with no gambling problems. Our research suggests that there is currently little evidence to support the regulation of collectible card games.
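The abstract above applies the TOST (two one-sided tests) procedure against an effect-size bound. To show the general logic only, the sketch below runs TOST on a difference between two group means with a large-sample normal approximation. This is a swapped-in simplification, not the η²-based analysis the authors report; the function name, the bounds, and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_mean_difference(x, y, low, high):
    """Large-sample TOST p-value for mean(x) - mean(y) lying inside (low, high).

    Assumption: samples are independent and large enough for a normal
    approximation; a t-based version would be preferable for small samples.
    """
    if not low < high:
        raise ValueError("low must be smaller than high")
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    std_normal = NormalDist()
    # One-sided test 1: H0 says diff <= low; small p supports diff > low.
    p_lower = 1 - std_normal.cdf((diff - low) / se)
    # One-sided test 2: H0 says diff >= high; small p supports diff < high.
    p_upper = std_normal.cdf((diff - high) / se)
    # Equivalence is claimed only if both one-sided tests reject.
    return max(p_lower, p_upper)


# Hypothetical data: two groups with identical scores, bounds of +/- 0.5.
group_a = [0, 1] * 50
group_b = [0, 1] * 50
print(tost_mean_difference(group_a, group_b, -0.5, 0.5) < 0.05)
```

Taking the larger of the two one-sided p-values means the procedure only declares equivalence when the observed difference is credibly above the lower bound and below the upper bound at the same time.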


Model equivalence tests in a parametric framework

Journal of Econometrics ◽

10.1016/j.jeconom.2013.05.007 ◽

2014 ◽

Vol 178 ◽

pp. 414-425 ◽

Cited By ~ 2

Author(s):

Pascal Lavergne

Keyword(s):

Equivalence Tests ◽

Model Equivalence
