Semantic Norm Extrapolation is a Missing Data Problem

2020
Author(s): Bryor Snefjella, Idan Blank

For close to 70 years, psychologists have studied word meaning using a simple method: participants rate words on some theoretically motivated property (e.g. pleasantness, familiarity) using a Likert scale as the measurement instrument. Such semantic judgments are used to interrogate the underlying structure of lexical semantic constructs, to select stimuli for experiments, or as covariates in models predicting brain or behaviour. Recently, there has been a surge of interest in using computational distributional semantic word representations and supervised learning to predict semantic judgments on Likert scales for words lacking empirical measurements. We call this task semantic norm extrapolation. A significant body of work has shown that methods for semantic norm extrapolation are often highly accurate. The impressive performance of models for this task may give the appearance that non-empirical, machine-learning-derived estimates of semantic norms are interchangeable with empirical measurements of semantic norms. Herein, we argue that this is not the case, and that all extant methods for semantic norm extrapolation are more problematic than the literature suggests. Naive use of extrapolated semantic norms should be expected to yield biased and anti-conservative analyses. We make this argument using a mixture of 1) the principles of analysis of partially observed data, 2) simulations, and 3) a real-data example. Achieving sound inference when using semantic norm extrapolation requires a conceptual and methodological shift from treating semantic norm extrapolation as a prediction problem to treating it as a missing data problem. This shift in perspective also lays bare problems in default analytical procedures for semantic norms and megastudy data and, surprisingly, suggests that semantic norm extrapolation, when done using recommended procedures for analysis of partially observed data, should be default methodological practice.
