Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG

2021, Vol 4 (1)
Author(s): Colin W. Hoy, Sheila C. Steiner, Robert T. Knight

Abstract: Learning signals during reinforcement learning and cognitive control rely on valenced reward prediction errors (RPEs) and non-valenced salience prediction errors (PEs) driven by surprise magnitude. A core debate in reward learning focuses on whether valenced and non-valenced PEs can be isolated in the human electroencephalogram (EEG). We combine behavioral modeling and single-trial EEG regression to disentangle sequential PEs in an interval timing task dissociating outcome valence, magnitude, and probability. Multiple regression across temporal, spatial, and frequency dimensions characterized a spatio-tempo-spectral cascade from early valenced RPE value to non-valenced RPE magnitude, followed by outcome probability indexed by a late frontal positivity. Separating negative and positive outcomes revealed that the valenced RPE value effect is an artifact of overlap between two non-valenced RPE magnitude responses: a frontal theta feedback-related negativity on losses and a posterior delta reward positivity on wins. These results reconcile longstanding debates on the sequence of components representing reward and salience PEs in the human EEG.
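As a rough sketch of the single-trial regression approach described above (mass-univariate ordinary least squares over every channel-by-time point), the following Python snippet uses hypothetical data shapes and regressor names; it illustrates the general technique, not the authors' actual pipeline:

```python
# Minimal sketch of single-trial mass-univariate EEG regression
# (hypothetical data shapes; not the authors' pipeline).
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 64, 300

# Epoched EEG: trials x channels x time points
eeg = rng.standard_normal((n_trials, n_channels, n_times))

# Model-derived regressors per trial: signed RPE value, unsigned
# RPE magnitude, and outcome probability (all hypothetical here).
rpe_value = rng.standard_normal(n_trials)
rpe_magnitude = np.abs(rpe_value)
outcome_prob = rng.uniform(0.2, 0.8, n_trials)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n_trials), rpe_value, rpe_magnitude, outcome_prob])

# Solve ordinary least squares independently at every channel x time
# point, yielding one beta map per regressor.
Y = eeg.reshape(n_trials, -1)                  # trials x (channels*times)
betas, *_ = np.linalg.lstsq(X, Y, rcond=None)  # regressors x (channels*times)
beta_maps = betas.reshape(X.shape[1], n_channels, n_times)

print(beta_maps.shape)  # (4, 64, 300): intercept + 3 regressor beta maps
```

Each beta map can then be tested across participants at every channel and time point, which is what allows the temporal, spatial, and frequency dimensions to be characterized jointly.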

2020
Author(s): Colin W. Hoy, Sheila C. Steiner, Robert T. Knight

Summary: Recent developments in reinforcement learning, cognitive control, and systems neuroscience highlight the complementary roles in learning of valenced reward prediction errors (RPEs) and non-valenced salience prediction errors (PEs) driven by the magnitude of surprise. A core debate in reward learning focuses on whether valenced and non-valenced PEs can be isolated in the human electroencephalogram (EEG). Here, we combine behavioral modeling and single-trial EEG regression to reveal a sequence of valenced and non-valenced PEs in an interval timing task dissociating outcome valence, magnitude, and probability. Multiple regression across temporal, spatial, and frequency dimensions revealed a spatio-tempo-spectral cascade from valenced RPE value, represented by the feedback-related negativity event-related potential (ERP), followed by non-valenced RPE magnitude and outcome probability effects indexed by subsequent P300 and late frontal positivity ERPs. The results show that learning is supported by a sequence of multiple PEs evident in the human EEG.
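The two PE regressors the abstract distinguishes, signed RPE value and non-valenced RPE magnitude, can be illustrated with a generic delta-rule learner; this is a simple Rescorla-Wagner sketch, not the paper's fitted behavioral model:

```python
# Illustrative delta-rule model producing the two PE regressors the
# abstract distinguishes: signed RPE value and unsigned RPE magnitude.
# (A generic Rescorla-Wagner sketch, not the paper's fitted model.)
import numpy as np

def delta_rule_pes(outcomes, alpha=0.1, v0=0.0):
    """Return signed and unsigned prediction errors per trial."""
    v = v0
    signed, unsigned = [], []
    for r in outcomes:
        pe = r - v                # valenced RPE: sign carries win/loss
        v += alpha * pe           # update the reward expectation
        signed.append(pe)
        unsigned.append(abs(pe))  # non-valenced salience/surprise term
    return np.array(signed), np.array(unsigned)

rng = np.random.default_rng(1)
outcomes = rng.choice([-1.0, 1.0], size=50)  # hypothetical win/loss stream
rpe_value, rpe_magnitude = delta_rule_pes(outcomes)
```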


2018, Vol 44 (suppl_1), pp. S281-S282
Author(s): Lilian Weber, Andreea Diaconescu, Sara Tomiello, Dario Schöbi, Sandra Iglesias, ...

2015, Vol 113 (1), pp. 200-205
Author(s): Kenneth T. Kishida, Ignacio Saez, Terry Lohrenz, Mark R. Witcher, Adrian W. Laxton, ...

In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson’s disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in the human striatum encodes reward prediction errors (RPEs; the difference between actual and expected outcomes) during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson’s disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Contrary to expectations from a large body of work in model organisms, dopamine fluctuations in the striatum failed to encode RPEs. Instead, subsecond dopamine fluctuations encoded an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine actual and counterfactual information is unknown. One possibility is that this process is the normal behavior of reward-processing dopamine neurons, which previously had not been tested in animal models. Alternatively, this superposition of error terms may result from an additional, yet-to-be-identified subclass of dopamine neurons.
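A minimal sketch of the two error terms described above, with an illustrative linear combination; the weights and function name here are assumptions, since the abstract does not specify how the two signals are integrated:

```python
# Hedged sketch of the two error terms the abstract describes: an RPE
# (actual - expected) and a counterfactual PE (actual - best alternative
# outcome). The linear combination and weights are illustrative only.

def combined_error(actual, expected, counterfactual, w_rpe=1.0, w_cpe=1.0):
    rpe = actual - expected        # reward prediction error
    cpe = actual - counterfactual  # how much better/worse it could have been
    return w_rpe * rpe + w_cpe * cpe

# Example: the bet returned $5, expectation was $2, but the unchosen
# option would have returned $10 -> positive RPE, negative counterfactual
# term, and a negative combined signal.
print(combined_error(actual=5.0, expected=2.0, counterfactual=10.0))  # -2.0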


2019
Author(s): Valérian Chambon, Héloïse Théro, Marie Vidal, Henri Vandendriessche, Patrick Haggard, ...

Abstract: Positivity bias refers to learning more from positive than negative events. This learning asymmetry could either reflect a preference for positive events in general, or be the upshot of a more general, and perhaps ubiquitous, “choice-confirmation” bias, whereby agents preferentially integrate information that confirms their previous decision. We systematically compared these two theories with three experiments mixing free- and forced-choice conditions, featuring factual and counterfactual learning, and varying action requirements across “go” and “no-go” trials. Computational analyses of learning rates showed clear and robust evidence in favour of the “choice-confirmation” theory: participants amplified positive prediction errors in free-choice conditions while remaining valence-neutral in forced-choice conditions. We suggest that a choice-confirmation bias is adaptive to the extent that it reinforces actions that are most likely to meet an individual’s needs, i.e., freely chosen actions. In contrast, outcomes from unchosen actions are more likely to be treated impartially, i.e., to be assigned no special value in self-determined decisions.
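One way to picture the “choice-confirmation” account is a Q-learning update whose learning rate depends on both prediction-error valence and whether the choice was free; the parameter values below are illustrative, not the fitted estimates:

```python
# Sketch of a valence- and choice-dependent learning-rate model in the
# spirit of the "choice-confirmation" account: confirmatory prediction
# errors (positive PEs after free choices) get a larger learning rate.
# Parameter values are illustrative, not the paper's fitted estimates.

def update_q(q, reward, free_choice, alpha_conf=0.4, alpha_disconf=0.1):
    pe = reward - q
    if free_choice:
        # amplify positive (choice-confirming) PEs on free choices
        alpha = alpha_conf if pe > 0 else alpha_disconf
    else:
        # forced choices: valence-neutral updating
        alpha = (alpha_conf + alpha_disconf) / 2
    return q + alpha * pe

q = 0.0
for reward, free in [(1, True), (0, True), (1, False), (0, False)]:
    q = update_q(q, reward, free)
    print(f"reward={reward}, free={free} -> Q={q:.3f}")
```

Under this parameterization, the same outcome moves Q more when it confirms a freely chosen action, which is exactly the asymmetry the learning-rate analyses tested.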


2020
Author(s): Austin J. Gallyer, Kreshnik Burani, Elizabeth M. Mulligan, Nicholas Santopetro, Sean P. Dougherty, ...

Abstract: A recent study by Tsypes, Owens, and Gibb (2019) found that children with recent suicidal ideation had blunted neural reward processing, as measured by the reward positivity (RewP), compared to matched controls, and that this difference was driven by reduced neural responses to monetary loss rather than blunted neural responses to monetary reward. Here, we aimed to conceptually replicate and extend these findings in two large samples of children and adolescents (n = 275 and n = 235). Our conceptual replication found no evidence that children and adolescents with suicidal ideation have abnormal reward or loss processing. We extended these findings in a longitudinal sample of children and adolescents with two time points and found no evidence that reward- or loss-related ERPs predict changes in suicidal ideation. The results highlight the need for greater statistical power and for continued research examining the neural underpinnings of suicidal thoughts and behaviors.
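For context, the RewP is conventionally quantified as a gain-minus-loss difference wave; a minimal sketch under assumed window and sampling parameters (not the replication's exact settings) might look like this:

```python
# Minimal sketch of quantifying the RewP as a gain-minus-loss difference
# wave at a single frontocentral channel (e.g., FCz). The sampling rate,
# epoch limits, and scoring window are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
fs = 500                              # Hz, hypothetical
times = np.arange(-0.2, 0.8, 1 / fs)  # epoch: -200 to 800 ms

gain_epochs = rng.standard_normal((120, times.size))  # trials x time
loss_epochs = rng.standard_normal((110, times.size))

# Condition averages, then the difference wave.
diff_wave = gain_epochs.mean(axis=0) - loss_epochs.mean(axis=0)

# Mean amplitude in a typical RewP window (~250-350 ms post-feedback).
win = (times >= 0.25) & (times <= 0.35)
rewp = diff_wave[win].mean()
print(f"RewP mean amplitude: {rewp:.3f} (arbitrary units)")
```

Separating the gain and loss averages rather than relying only on the difference wave is what lets a study attribute a blunted RewP to the loss response specifically, as in the original Tsypes et al. finding.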


2020, Vol 34 (4), pp. 255-267
Author(s): Carter J. Funkhouser, Randy P. Auerbach, Autumn Kujawa, Sylvia A. Morelli, K. Luan Phan, ...

Abstract: Abnormal social or reward processing is associated with several mental disorders. Although most studies examining reward processing have focused on monetary rewards, recent research has also tested neural reactivity to social rewards (e.g., positive social feedback). However, the majority of these studies include only two feedback valences (e.g., acceptance, rejection). Yet social evaluation is rarely binary (positive vs. negative), and people often give “on the fence” or neutral evaluations of others. This type of social feedback may be ambiguous, and its processing may be impacted by factors such as psychopathology, self-esteem, and prior experiences of rejection. Thus, the present study probed the reward positivity (RewP), P300, and late positive potential (LPP) following acceptance, rejection, and “on the fence” (between acceptance and rejection) feedback in undergraduate students (n = 45). Results indicated that the RewP showed more positive amplitudes following acceptance compared to both rejection and “on the fence” feedback, and the RewP was larger (i.e., more positive) following rejection relative to “on the fence” feedback. In contrast, the P300 did not differ between rejection and “on the fence” feedback, and both were reduced compared to acceptance. The LPP was blunted in response to rejection relative to acceptance and “on the fence” feedback (which did not differ from each other). Exploratory analyses demonstrated that greater self-reported rejection sensitivity was associated with a reduced LPP to acceptance. Taken together, these findings suggest that the neural systems underlying the RewP, P300, and LPP may evaluate “on the fence” social feedback differently, and that individuals high in rejection sensitivity may exhibit reduced attention toward and elaborative processing of social acceptance.
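A compact sketch of scoring all three components as mean amplitudes in conventional time windows across the three feedback types; the windows and data here are illustrative assumptions, not the study's exact parameters:

```python
# Illustrative scoring of RewP, P300, and LPP as mean amplitudes in
# conventional post-feedback windows, for three feedback valences.
# Windows, sampling rate, and data are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)
fs = 250
times = np.arange(-0.2, 1.0, 1 / fs)

conditions = ["acceptance", "rejection", "on_the_fence"]
erps = {c: rng.standard_normal(times.size) for c in conditions}  # condition-average waveforms

windows = {           # seconds post-feedback (assumed, not the paper's exact values)
    "RewP": (0.25, 0.35),
    "P300": (0.30, 0.50),
    "LPP": (0.50, 0.90),
}

for comp, (t0, t1) in windows.items():
    mask = (times >= t0) & (times <= t1)
    scores = {c: round(erps[c][mask].mean(), 3) for c in conditions}
    print(comp, scores)
```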


2021
Author(s): Tobias F. Marton, Brian Roach, Clay B. Holroyd, Judith M. Ford, John McQuaid, ...

Background: Deficits in the way the brain processes rewards may contribute to negative symptoms in schizophrenia. Synchronization of alpha-band neural oscillations is a dominant EEG signal when people are awake but at rest. In contrast, alpha desynchronization to salient events is thought to direct information-processing resources away from the internal state toward salient stimuli in the external environment. Here, we hypothesized that alpha event-related desynchronization (ERD) during reward processing is altered in schizophrenia, leading to a smaller difference in alpha ERD magnitude between winning and losing outcomes.

Methods: EEG was recorded while participants (patients with schizophrenia (SZ), n = 54; healthy controls (HC), n = 54) completed a casino-style slot machine gambling task. Total power, a measure of neural oscillation magnitude, was measured in the alpha frequency range (8-14 Hz), time-locked to reward delivery, extracted via principal components analysis, and then compared between groups and between equiprobable win and near-miss loss outcomes. Associations between alpha power and negative symptoms and trait rumination were examined.

Results: A significant Group × Reward Outcome interaction (p = .018) was explained by differences within the HC group, driven by significant posterior-occipital alpha desynchronization to wins relative to near-miss losses (p < .001). In contrast, SZ did not modulate alpha power to wins vs. near-miss losses (p > .1), nor did alpha power relate to negative symptoms (p > .1). However, across all participants, less alpha ERD to reward outcomes was related to more trait rumination, for both wins (p = .005) and near-miss losses (p = .002), with no group differences in the slopes of these relationships.

Conclusion: These findings suggest that event-related modulation of alpha power is altered in schizophrenia during reward outcome processing, even when reward attainment places minimal demands on higher-order cognitive processes during slot machine play. In addition, high trait rumination is associated with less event-related desynchronization to reward feedback, suggesting that rumination covaries with reduced external attentional allocation to reward processing, regardless of reward outcome valence and group membership.
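A minimal sketch of the alpha ERD measure described above: band-pass filter, single-trial (total) power via the Hilbert envelope, then percent change from a pre-feedback baseline. The filter order, baseline window, and data are assumptions, and the PCA step from the Methods is omitted:

```python
# Sketch of alpha-band event-related desynchronization (ERD): total
# power from a Hilbert envelope, expressed as percent change from a
# pre-feedback baseline. All settings here are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(4)
fs = 500
times = np.arange(-0.5, 1.0, 1 / fs)
epochs = rng.standard_normal((80, times.size))  # trials x time, one posterior channel

# Band-pass 8-14 Hz (alpha), then Hilbert envelope -> power per trial.
b, a = butter(4, [8 / (fs / 2), 14 / (fs / 2)], btype="band")
alpha = filtfilt(b, a, epochs, axis=1)
power = np.abs(hilbert(alpha, axis=1)) ** 2

# Averaging single-trial power keeps non-phase-locked activity,
# which is what "total power" refers to.
avg_power = power.mean(axis=0)

# ERD as percent change from the -500 to -200 ms baseline;
# negative values indicate desynchronization.
base = avg_power[(times >= -0.5) & (times <= -0.2)].mean()
erd = 100 * (avg_power - base) / base
print(erd[(times >= 0.3) & (times <= 0.6)].mean())
```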


2014, Vol 26 (3), pp. 635-644
Author(s): Olav E. Krigolson, Cameron D. Hassall, Todd C. Handy

Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors (discrepancies between the predicted and the actual reward). A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833–1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129–141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769–776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709, 2002]. Here, we used the event-related potential (ERP) technique to demonstrate not only that rewards elicit a neural response akin to a prediction error but also that, with learning, this signal rapidly diminishes and propagates to the time of choice presentation. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with a timing and topography similar to the feedback error-related negativity, whose amplitude increased with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations underlying human learning and decision-making follow reinforcement learning principles.
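The learning dynamic the abstract describes, a feedback-time prediction error that shrinks as a choice-time value signal grows, falls out of a simple delta-rule model; this sketch uses made-up parameters and is not the authors' implemented model:

```python
# Illustrative temporal-difference-style model of the effect described
# above: with learning, the prediction error at reward delivery shrinks
# while the value signal at choice presentation grows. Parameters are
# made up; this is not the authors' computational model.
alpha, n_trials = 0.2, 40
v_choice = 0.0                      # learned value of the choice cue
pe_at_feedback, pe_at_choice = [], []

for t in range(n_trials):
    # Choice presentation: signal is the learned value vs. a baseline of 0.
    pe_at_choice.append(v_choice - 0.0)
    reward = 1.0                    # deterministic reward for the good option
    pe = reward - v_choice          # feedback-time prediction error
    v_choice += alpha * pe
    pe_at_feedback.append(pe)

print("feedback PE, early vs late:", pe_at_feedback[0], round(pe_at_feedback[-1], 3))
print("choice signal, early vs late:", pe_at_choice[0], round(pe_at_choice[-1], 3))
```

As trials accumulate, the feedback-time error decays toward zero while the choice-time signal approaches the reward value, mirroring the reported decrease in the feedback error-related negativity and increase in the choice-locked reward positivity.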

