Context-Based Facilitation of Semantic Access Follows Both Logarithmic and Linear Functions of Stimulus Probability
Stimuli are easier to process when the preceding context (e.g., a sentence, in the case of a word) makes them predictable. However, it remains unclear whether context-based facilitation arises from predictive preactivation of a limited set of relatively probable upcoming stimuli (with facilitation then linearly related to probability) or, instead, arises because the system maintains and updates a probability distribution across all items, as posited by accounts (e.g., surprisal theory) that assume a logarithmic relationship between predictability and processing effort. To adjudicate between these accounts, we measured the N400 component, an index of semantic access, evoked by sentence-final words of varying probability, including unpredictable words that are never generated in human production norms. Word predictability was measured using both cloze probabilities and a state-of-the-art machine learning language model (GPT-2). We reanalyzed five datasets (n = 138) to first demonstrate and then replicate that context-based facilitation on the N400 is graded and dissociates even among words with cloze probabilities at or near 0, as a function of very small differences in model-estimated predictability. Furthermore, we established that the relationship between word predictability and context-based facilitation on the N400 is neither purely linear nor purely logarithmic but instead combines both functions. We argue that such a composite function reveals properties of the mapping between words and semantic features and of how feature- and word-related information is activated during online processing. Overall, the results provide powerful evidence for the role of internal models in shaping how the brain apprehends incoming stimulus information.
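To make the contrast between the two candidate functions concrete, the sketch below (not the authors' analysis code; all variable names, coefficient values, and the simulated data are illustrative assumptions) shows how a response measure that depends on both the linear probability p and its logarithmic transform (surprisal, -log p) can be fit with an ordinary least-squares model containing both predictors:

```python
# Hedged illustration of a composite linear + logarithmic predictability model.
# The "N400-like" response y is simulated, not real data; the true coefficients
# (b0, b_lin, b_log) are arbitrary choices for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(1e-4, 1.0, size=5000)   # hypothetical word probabilities
surprisal = -np.log(p)                  # logarithmic predictor (surprisal)

# Simulated ground truth: a composite of linear and log effects plus noise.
b0, b_lin, b_log = 2.0, -3.0, 0.8
y = b0 + b_lin * p + b_log * surprisal + rng.normal(0.0, 0.3, size=p.size)

# Ordinary least squares with intercept, linear, and logarithmic terms.
X = np.column_stack([np.ones_like(p), p, surprisal])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # estimates should lie close to the simulated (b0, b_lin, b_log)
```

Because p and -log p are correlated but not collinear, both coefficients are identifiable, which is the basic logic behind testing whether facilitation follows a purely linear, purely logarithmic, or composite function.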