reward prediction
Recently Published Documents


TOTAL DOCUMENTS: 364 (five years: 126)

H-INDEX: 52 (five years: 8)

2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Chikara Ishii ◽  
Jun’ichi Katayama

Abstract: In action monitoring, i.e., evaluating the outcome of our behavior, a reward prediction error signal is calculated as the difference between actual and predicted outcomes and is used to adjust future behavior. Previous studies demonstrate that this signal, which is reflected by an event-related brain potential called the feedback-related negativity (FRN), occurs in response not only to one's own outcomes but also to those of others. However, it is still unknown whether predictions of different actors' performance interact with each other. Thus, we investigated how predictions derived from one's own and another's performance history affect each other by independently manipulating the task difficulty for participants themselves and for their partners. Pairs of participants performed a time estimation task, randomly switching the roles of actor and observer from trial to trial. Results show that the history of the other's performance did not modulate the amplitude of the FRN for the evaluation of one's own outcomes. In contrast, the amplitude of the observer FRN for the other's outcomes differed according to the frequency of one's own action outcomes. In conclusion, the monitoring system tracks the histories of one's own and observed outcomes separately and treats information related to one's own action outcomes as more important.
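
The signal at the heart of this abstract has a simple formal core. A minimal Python sketch of a delta-rule learner, in which the reward prediction error is actual minus predicted outcome; the function names and parameters are illustrative, not taken from the study:

```python
# Illustrative sketch (not from the study): a delta-rule learner in which the
# reward prediction error (RPE) is actual minus predicted outcome, and the
# prediction is nudged toward the outcome to adjust future behavior.

def delta_rule(prediction, outcome, alpha=0.1):
    """Return the RPE and the updated prediction."""
    rpe = outcome - prediction      # actual minus predicted outcome
    prediction += alpha * rpe       # adjust expectation for future trials
    return rpe, prediction

prediction = 0.5                    # initial expectation (e.g., P(success))
for outcome in (1.0, 0.0, 1.0, 1.0):
    rpe, prediction = delta_rule(prediction, outcome)
    print(f"RPE = {rpe:+.3f}, prediction -> {prediction:.3f}")
```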


2022 ◽  
Vol 119 (2) ◽  
pp. e2113311119
Author(s):  
Stefania Sarno ◽  
Manuel Beirán ◽  
Joan Falcó-Roget ◽  
Gabriel Diaz-deLeon ◽  
Román Rossi-Pool ◽  
...  

Little is known about how dopamine (DA) neuron firing rates behave in cognitively demanding decision-making tasks. Here, we investigated midbrain DA activity in monkeys performing a discrimination task in which the animal had to use working memory (WM) to report which of two sequentially applied vibrotactile stimuli had the higher frequency. We found that perception was altered by an internal bias, likely generated by deterioration of the representation of the first frequency during the WM period. This bias strongly modulated the phasic DA response during the two stimulation periods, confirming that DA reward prediction errors reflected stimulus perception. In contrast, tonic DA activity during WM was not affected by the bias and did not encode the stored frequency. Notably, both delay-period activity and phasic responses before the second stimulus correlated negatively with the animals' reaction times after the trial start cue, and thus represented motivated behavior on a trial-by-trial basis. During WM, this motivation signal underwent a ramp-like increase. At the same time, motivation correlated positively with accuracy, especially in difficult trials, probably by decreasing the effect of the bias. Overall, our results indicate that DA activity, in addition to encoding reward prediction errors, could at the same time be involved in motivation and WM. In particular, the ramping activity during the delay period suggests a possible DA role in stabilizing sustained cortical activity, hypothetically by increasing the gain communicated to prefrontal neurons in a motivation-dependent way.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Lorenz Deserno ◽  
Rani Moran ◽  
Jochen Michely ◽  
Ying Lee ◽  
Peter Dayan ◽  
...  

Dopamine is implicated in representing model-free (MF) reward prediction errors, as well as in influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test the hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced the guidance of MF credit assignment by MB inference, without directly affecting MF and MB influences. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our finding that dopamine boosts the MB guidance of MF learning highlights a novel DA influence on MB-MF cooperative interactions.
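
The study's actual task and computational model are not reproduced in the abstract. As rough orientation, a common way to formalize joint MB and MF influences on choice is a weighted mixture of their action values; the sketch below illustrates that idea only, with all names and parameters assumed:

```python
import numpy as np

# Illustrative sketch of an MB-MF mixture: net action values combine
# model-based (MB) and model-free (MF) estimates with a weight w, and a
# softmax turns them into a choice. The paper's model is richer than this.

def mixed_values(q_mf, q_mb, w):
    """Weighted combination of MB and MF action values."""
    return w * q_mb + (1.0 - w) * q_mf

def softmax_choice(q_net, beta, rng):
    """Sample an action with softmax probabilities at inverse temperature beta."""
    p = np.exp(beta * (q_net - q_net.max()))
    p /= p.sum()
    return int(rng.choice(len(q_net), p=p))

rng = np.random.default_rng(0)
q_mf = np.array([0.2, 0.6])   # values learned from direct reward feedback
q_mb = np.array([0.7, 0.3])   # values inferred from a model of the task
action = softmax_choice(mixed_values(q_mf, q_mb, w=0.5), beta=3.0, rng=rng)
print("chosen action:", action)
```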


2021 ◽  
Author(s):  
◽  
Marie Vanden Broeke

Diminished motivation is a core feature of schizophrenia that has been linked to impaired functional outcomes. A mechanism thought to contribute to diminished motivation is impaired anticipatory pleasure, which is associated with disrupted reward prediction and reduced engagement in reward-seeking behaviours. To investigate the role of the dopamine D₁ receptor in anticipatory pleasure, D₁ mutant and wild-type (WT) rats were tested in five experiments. Reward prediction was examined using an anticipatory locomotion experiment and a successive negative contrast experiment. D₁ mutant rats showed impaired anticipatory responses to expected reward. However, as the WT rats did not show the expected response to an alteration in reward expectation, the role of the D₁ receptor could not be assessed in that paradigm. Together, these findings suggest that the D₁ receptor may be involved in aspects of reward prediction. Reward-seeking behaviour was examined using a social approach experiment, a scent-marking experiment, and a separation-induced vocalization experiment. D₁ mutant rats showed an impaired ability to engage in social and sexual reward-seeking behaviours, but a relatively normal ability to engage in maternal reward-seeking behaviours. Together, these findings indicate that the D₁ receptor is involved in certain aspects of reward-seeking behaviours. In conclusion, there is compelling evidence that D₁ receptor dysfunction is a likely contributor to diminished motivation in schizophrenia.


2021 ◽  
Author(s):  
Karel Kieslich ◽  
Vincent Valton ◽  
Jonathan Paul Roiser

To develop effective treatments for anhedonia, we need to understand its underlying neurobiological mechanisms. Anhedonia is conceptually strongly linked to reward processing, which involves a variety of cognitive and neural operations. This article reviews the evidence for impairments in hedonic response (pleasure), reward valuation, and reward learning based on outcomes (commonly conceptualised in terms of "reward prediction error"). Synthesizing behavioural and neuroimaging findings, we examine case-control studies of patients with depression and schizophrenia, including those focusing specifically on anhedonia. Overall, there is reliable evidence that depression and schizophrenia are associated with disrupted reward processing. In contrast to the historical definition of anhedonia, there is surprisingly limited evidence for impairment in the ability to experience pleasure in depression and schizophrenia. There is some evidence that reward learning and reward prediction error signals are impaired in depression and schizophrenia, but the literature is inconsistent. The strongest evidence is for impairments in the representation of reward value and in how this representation is used to guide action. Future studies would benefit from focusing on impairments in reward processing specifically in anhedonic samples, including transdiagnostically, and from using designs that separate different components of reward processing and formulate them in computational terms, moving beyond cross-sectional designs to allow an assessment of causality.


2021 ◽  
Vol 15 ◽  
Author(s):  
Arthur Prével ◽  
Ruth M. Krebs

In a new environment, humans and animals can detect and learn that cues predict meaningful outcomes, and they can use this information to adapt their responses. This process is termed Pavlovian conditioning. Pavlovian conditioning is also observed for stimuli that predict outcome-associated cues; this second type of learning is termed higher-order Pavlovian conditioning. In this review, we focus on higher-order conditioning studies with simultaneous and backward conditioned stimuli. We examine how the results of these experiments pose a challenge to models of Pavlovian conditioning such as Temporal Difference (TD) models, in which learning is driven mainly by reward prediction errors. Contrasting with this view, the results suggest that humans and animals can form complex representations of the (temporal) structure of the task and use this information to guide behavior, which seems consistent with model-based reinforcement learning. Future investigations involving these procedures could yield important new insights into the mechanisms that underlie Pavlovian conditioning.
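
For concreteness, the kind of TD model the review contrasts with can be sketched in a few lines: value estimates for time steps within a trial are updated solely from reward prediction errors, so value gradually propagates back from the reward toward the cue. All parameters below are illustrative, not taken from the review:

```python
import numpy as np

# Minimal TD(0) sketch of within-trial value learning: each time step's value
# is updated from the reward prediction error delta = r + gamma*V(s') - V(s).

n_steps = 10                 # time steps per trial; cue at t=0, reward at t=9
gamma, alpha = 0.95, 0.1     # discount factor and learning rate
V = np.zeros(n_steps + 1)    # V[n_steps] is a terminal state with value 0

for trial in range(200):
    for t in range(n_steps):
        r = 1.0 if t == n_steps - 1 else 0.0    # reward at end of trial
        delta = r + gamma * V[t + 1] - V[t]     # TD reward prediction error
        V[t] += alpha * delta

print(np.round(V[:n_steps], 2))  # value has propagated back toward cue onset
```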


2021 ◽  
pp. 1-31
Author(s):  
Germain Lefebvre ◽  
Christopher Summerfield ◽  
Rafal Bogacz

Abstract: Reinforcement learning involves updating estimates of the value of states and actions on the basis of experience. Previous work has shown that, in humans, reinforcement learning exhibits a confirmatory bias: when the value of a chosen option is being updated, estimates are revised more radically following positive than negative reward prediction errors, but the converse is observed when updating the value of the unchosen option. Here, we simulate performance on a multi-armed bandit task to examine the consequences of a confirmatory bias for reward harvesting. We report a paradoxical finding: confirmatory biases allow the agent to maximize reward relative to an unbiased updating rule. This principle holds over a wide range of experimental settings and is most influential when decisions are corrupted by noise. We show that this occurs because, on average, confirmatory biases lead to overestimating the value of more valuable bandits and underestimating the value of less valuable bandits, rendering decisions more robust in the face of noise. Our results show how apparently suboptimal learning rules can in fact be reward-maximizing if decisions are made with finite computational precision.
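
The biased updating rule described in the abstract can be sketched directly. In the illustrative two-armed bandit below (parameters assumed; full feedback on both options), the chosen option is updated more strongly after positive prediction errors and the unchosen option more strongly after negative ones:

```python
import numpy as np

# Illustrative simulation of a confirmatory-bias learner on a two-armed
# bandit. Learning rates and the choice rule are assumptions for the sketch.

rng = np.random.default_rng(1)
p_reward = np.array([0.7, 0.3])        # true reward probabilities
Q = np.zeros(2)                        # value estimates
alpha_conf, alpha_disc = 0.3, 0.1      # confirmatory vs. disconfirmatory rates

for t in range(1000):
    # epsilon-greedy choice with a little noise
    choice = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(Q))
    rewards = (rng.random(2) < p_reward).astype(float)
    for a in (0, 1):
        delta = rewards[a] - Q[a]
        if a == choice:                # chosen: bigger step for positive RPEs
            lr = alpha_conf if delta > 0 else alpha_disc
        else:                          # unchosen: bigger step for negative RPEs
            lr = alpha_disc if delta > 0 else alpha_conf
        Q[a] += lr * delta

print(np.round(Q, 2))  # the better arm's value is inflated, the worse deflated
```

The pushed-apart value estimates are what make choices more robust to decision noise, in line with the paper's paradoxical result.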


2021 ◽  
Author(s):  
Anthony M.V. Jakob ◽  
John G Mikhael ◽  
Allison E Hamilos ◽  
John A Assad ◽  
Samuel J Gershman

The role of dopamine (DA) as a reward prediction error signal in reinforcement learning tasks has been well established over the past decades. Recent work has shown that the reward prediction error interpretation can also account for the effects of DA on interval timing by controlling the speed of subjective time. According to this theory, the timing of the DA signal relative to reward delivery dictates whether subjective time speeds up or slows down: early DA signals speed up subjective time and late signals slow it down. To test this bidirectional prediction, we reanalyzed measurements of dopaminergic neurons in the substantia nigra pars compacta of mice performing a self-timed movement task. Using the slope of ramping dopamine activity as a read-out of the speed of subjective time, we found that trial-by-trial changes in the slope could be predicted from the timing of dopamine activity on the previous trial. This result provides a key piece of evidence supporting a unified computational theory of reinforcement learning and interval timing.
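
As a toy illustration of the bidirectional prediction only (this is not the authors' model), one can imagine a subjective clock whose speed on the next trial is nudged by the timing of the DA signal relative to reward; the update rule and step size below are assumptions:

```python
# Toy sketch of the bidirectional prediction: a DA signal arriving before
# reward speeds up the subjective clock on the next trial; a signal arriving
# after reward slows it down. All quantities are illustrative.

def update_clock_speed(speed, da_time, reward_time, eta=0.1):
    """Nudge subjective clock speed based on DA timing relative to reward."""
    if da_time < reward_time:          # early DA -> subjective time speeds up
        return speed * (1.0 + eta)
    return speed * (1.0 - eta)         # late DA -> subjective time slows down

speed = 1.0
for da_t, rew_t in [(0.8, 1.0), (1.2, 1.0), (0.9, 1.0)]:
    speed = update_clock_speed(speed, da_t, rew_t)
    print(f"clock speed = {speed:.3f}")
```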


Author(s):  
Mitsuo Kawato ◽  
Aurelio Cortese

Abstract: In several papers published in Biological Cybernetics in the 1980s and 1990s, Kawato and colleagues proposed computational models explaining how internal models are acquired in the cerebellum. These models were later supported by neurophysiological experiments in monkeys and neuroimaging experiments in humans. These early studies influenced neuroscience from basic sensorimotor control to higher cognitive functions. One of the most perplexing enigmas related to internal models is understanding the neural mechanisms that enable animals to learn high-dimensional problems from so few trials. Consciousness and metacognition (the ability to monitor one's own thoughts) may be part of the solution to this enigma. Based on literature reviews of the past 20 years, here we propose a computational neuroscience model of metacognition. The model comprises a modular, hierarchical reinforcement-learning architecture of parallel and layered generative-inverse model pairs. In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" (CRMN) orchestrates conscious involvement of generative-inverse model pairs in perception and action. Based on mismatches between the computations of generative and inverse models, as well as on reward prediction errors, the CRMN computes a "responsibility signal" that gates the selection and learning of pairs in perception, action, and reinforcement learning. A high responsibility signal is given to pairs that best capture the external world, that are competent in movements (small mismatch), and that are capable of reinforcement learning (small reward prediction error). The CRMN selects pairs with higher responsibility signals as objects of metacognition, and consciousness is determined by the entropy of responsibility signals across all pairs. This model could lead to a new generation of AI that exhibits metacognition, consciousness, dimension reduction, selection of modules and corresponding representations, and learning from small samples. It may also lead to a new scientific paradigm that enables the causal study of consciousness by combining the CRMN and decoded neurofeedback.
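
The responsibility computation described here resembles the soft competition among modules used in earlier modular architectures such as MOSAIC. A minimal, heavily simplified sketch in which each generative-inverse pair is scored by its mismatch and reward prediction error, responsibilities are a softmax over those scores, and their entropy is the scalar summary; all names and formulas below are assumptions for illustration:

```python
import numpy as np

# Illustrative sketch of a "responsibility signal": pairs with small mismatch
# and small reward prediction error receive high responsibility, and the
# entropy of the responsibilities summarizes how concentrated selection is.

def responsibilities(mismatch, rpe, beta=5.0):
    """Softmax responsibilities over model pairs; small errors score high."""
    score = -(mismatch + np.abs(rpe))
    r = np.exp(beta * (score - score.max()))
    return r / r.sum()

mismatch = np.array([0.1, 0.8, 0.5])   # generative/inverse disagreement per pair
rpe = np.array([0.05, 0.4, 0.3])       # reward prediction errors per pair
r = responsibilities(mismatch, rpe)
entropy = float(-(r * np.log(r)).sum())
print(np.round(r, 3), round(entropy, 3))  # pair 0 dominates; low entropy
```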

