Neuronal Reward and Decision Signals: From Theories to Data

2015 ◽  
Vol 95 (3) ◽  
pp. 853-951 ◽  
Author(s):  
Wolfram Schultz

Rewards are crucial objects that induce learning, approach behavior, choices, and emotions. Whereas emotions are difficult to investigate in animals, the learning function is mediated by neuronal reward prediction error signals which implement basic constructs of reinforcement learning theory. These signals are found in dopamine neurons, which emit a global reward signal to striatum and frontal cortex, and in specific neurons in striatum, amygdala, and frontal cortex projecting to select neuronal populations. The approach and choice functions involve subjective value, which is objectively assessed by behavioral choices eliciting internal, subjective reward preferences. Utility is the formal mathematical characterization of subjective value and a prime decision variable in economic choice theory. It is coded as utility prediction error by phasic dopamine responses. Utility can incorporate various influences, including risk, delay, effort, and social interaction. Appropriate for formal decision mechanisms, rewards are coded as object value, action value, difference value, and chosen value by specific neurons. Although all reward, reinforcement, and decision variables are theoretical constructs, their neuronal signals constitute measurable physical implementations and as such confirm the validity of these concepts. The neuronal reward signals provide guidance for behavior while constraining the free will to act.

eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Armin Lak ◽  
William R Stauffer ◽  
Wolfram Schultz

Economic theories posit reward probability as one of the factors defining reward value. Individuals learn the value of cues that predict probabilistic rewards from experienced reward frequencies. Building on the notion that responses of dopamine neurons increase with reward probability and expected value, we asked how dopamine neurons in monkeys acquire this value signal that may represent an economic decision variable. We found in a Pavlovian learning task that reward probability-dependent value signals arose from experienced reward frequencies. We then assessed neuronal response acquisition during choices among probabilistic rewards. Here, dopamine responses became sensitive to the value of both chosen and unchosen options. Both experiments also showed novelty responses of dopamine neurons that decreased as learning advanced. These results show that dopamine neurons acquire predictive value signals from the frequency of experienced rewards. This flexible and fast signal reflects a specific decision variable and could update neuronal decision mechanisms.
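The idea that a cue's value is acquired from experienced reward frequencies can be illustrated with a minimal sketch of a Rescorla-Wagner style update, in which the value estimate moves toward each outcome by a fraction of the prediction error. This is an illustrative assumption, not the analysis used in the study; the function name `learn_cue_value`, the learning rate, and the trial count are hypothetical choices.

```python
import random


def learn_cue_value(reward_probability, n_trials=2000, learning_rate=0.02, seed=0):
    """Estimate a cue's value from experienced binary rewards.

    Hypothetical sketch: value <- value + learning_rate * (reward - value).
    With a small learning rate, the estimate approaches the reward frequency.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same sequence
    value = 0.0
    for _ in range(n_trials):
        # Reward of 1.0 is delivered with the given probability, otherwise 0.0.
        reward = 1.0 if rng.random() < reward_probability else 0.0
        prediction_error = reward - value  # positive if better than expected
        value += learning_rate * prediction_error
    return value


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, learn_cue_value(p))
```

In this sketch the learned value fluctuates around the true reward probability, and the prediction error shrinks on average as the estimate improves.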


2017 ◽  
Vol 114 (52) ◽  
pp. E11303-E11312 ◽  
Author(s):  
Scott A. Schelp ◽  
Katherine J. Pultorak ◽  
Dylan R. Rakowski ◽  
Devan M. Gomez ◽  
Gregory Krzystyniak ◽  
...  

The mesolimbic dopamine system is strongly implicated in motivational processes. Currently accepted theories suggest that transient mesolimbic dopamine release events energize reward seeking and encode reward value. During the pursuit of reward, critical associations are formed between the reward and cues that predict its availability. Conditioned by these experiences, dopamine neurons begin to fire upon the earliest presentation of a cue, and again at the receipt of reward. The resulting dopamine concentration scales proportionally to the value of the reward. In this study, we used a behavioral economics approach to quantify how transient dopamine release events scale with price and causally alter price sensitivity. We presented sucrose to rats across a range of prices and modeled the resulting demand curves to estimate price sensitivity. Using fast-scan cyclic voltammetry, we determined that the concentration of accumbal dopamine time-locked to cue presentation decreased with price. These data confirm and extend the notion that dopamine release events originating in the ventral tegmental area encode subjective value. Using optogenetics to augment dopamine concentration, we found that enhancing dopamine release at cue made demand more sensitive to price and decreased dopamine concentration at reward delivery. From these observations, we infer that value is decreased because of a negative reward prediction error (i.e., the animal receives less than expected). Conversely, enhancing dopamine at reward made demand less sensitive to price. We attribute this finding to a positive reward prediction error, whereby the animal perceives they received a better value than anticipated.
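As a rough illustration of how a demand curve can express price sensitivity, the sketch below uses the exponential demand equation common in behavioral economics, log10 Q = log10 Q0 + k (exp(-alpha * Q0 * C) - 1). The abstract does not state which demand model was fitted, so the equation, the function name `exponential_demand`, and all parameter values here are assumptions for illustration only.

```python
import math


def exponential_demand(price, q0, alpha, k=2.0):
    """Predicted consumption at a given price under an exponential demand model.

    Assumed form: log10(Q) = log10(q0) + k * (exp(-alpha * q0 * price) - 1).
    q0    : consumption at zero price
    alpha : price sensitivity (larger alpha means demand falls faster with price)
    k     : range of consumption in log10 units (hypothetical default)
    """
    exponent = k * (math.exp(-alpha * q0 * price) - 1.0)
    return q0 * 10.0 ** exponent


if __name__ == "__main__":
    prices = [0, 1, 2, 4, 8, 16]
    for alpha in (0.001, 0.005):  # hypothetical low and high price sensitivity
        curve = [exponential_demand(p, q0=100.0, alpha=alpha) for p in prices]
        print(alpha, curve)
```

Under this assumed model, a manipulation that makes demand "more sensitive to price" corresponds to a larger `alpha`, which lowers predicted consumption at every nonzero price.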


2020 ◽  
Author(s):  
Pramod Kaushik ◽  
Jérémie Naudé ◽  
Surampudi Bapi Raju ◽  
Frédéric Alexandre

Classical Conditioning is a fundamental learning mechanism where the Ventral Striatum is generally thought to be the source of inhibition to Ventral Tegmental Area (VTA) Dopamine neurons when a reward is expected. However, recent evidence points to a new candidate, VTA GABA, which encodes expectation for computing the reward prediction error in the VTA. In this system-level computational model, the VTA GABA signal is hypothesised to be a combination of magnitude and timing computed in the Pedunculopontine and Ventral Striatum respectively. This dissociation enables the model to explain recent results wherein Ventral Striatum lesions affected the temporal expectation of the reward but the magnitude of the reward was intact. This model also exhibits other features in classical conditioning, namely progressively decreasing firing for early rewards closer to the actual reward, twin peaks of VTA dopamine during training, and cancellation of US dopamine after training.


2018 ◽  
Author(s):  
Kremer Yves ◽  
Flakowski Jérôme ◽  
Rohner Clément ◽  
Lüscher Christian

Dopamine (DA) neurons of the ventral tegmental area (VTA) track external cues and rewards to generate a reward prediction error (RPE) signal during Pavlovian conditioning. Here we explored how RPE is implemented for a self-paced, operant task in freely moving mice. The animal could trigger a reward-predicting cue by remaining in a specific location of an operant box for a brief time before moving to a spout for reward collection. In vivo single-unit recordings revealed phasic responses to the cue and reward in correct trials, whereas in failed trials the activity paused, reflecting positive and negative error signals of a reward prediction. In addition, a majority of VTA DA neurons also encoded parameters of the goal-directed action (e.g. movement velocity, acceleration, distance to goal and licking) by changes in tonic firing rate. Such multiplexing of individual neurons was only apparent while the mouse was engaged in the task. We conclude that a multiplexed internal representation during the task modulates VTA DA neuron activity, indicating a multimodal prediction error that shapes behavioral adaptation of a self-paced goal-directed action.


2009 ◽  
Vol 102 (6) ◽  
pp. 3384-3391 ◽  
Author(s):  
Vivian V. Valentin ◽  
John P. O'Doherty

Prediction error signals have been reported in human imaging studies in target areas of dopamine neurons such as ventral and dorsal striatum during learning with many different types of reinforcers. However, a key question that has yet to be addressed is whether prediction error signals recruit distinct or overlapping regions of striatum and elsewhere during learning with different types of reward. To address this, we scanned 17 healthy subjects with functional magnetic resonance imaging while they chose actions to obtain either a pleasant juice reward (1 ml apple juice), or a monetary gain (5 cents) and applied a computational reinforcement learning model to subjects' behavioral and imaging data. Evidence for an overlapping prediction error signal during learning with juice and money rewards was found in a region of dorsal striatum (caudate nucleus), while prediction error signals in a subregion of ventral striatum were significantly stronger during learning with money but not juice reward. These results provide evidence for partially overlapping reward prediction signals for different types of appetitive reinforcers within the striatum, a finding with important implications for understanding the nature of associative encoding in the striatum as a function of reinforcer type.


2017 ◽  
Vol 114 (48) ◽  
pp. 12696-12701 ◽  
Author(s):  
Mel W. Khaw ◽  
Paul W. Glimcher ◽  
Kenway Louie

The notion of subjective value is central to choice theories in ecology, economics, and psychology, serving as an integrated decision variable by which options are compared. Subjective value is often assumed to be an absolute quantity, determined in a static manner by the properties of an individual option. Recent neurobiological studies, however, have shown that neural value coding dynamically adapts to the statistics of the recent reward environment, introducing an intrinsic temporal context dependence into the neural representation of value. Whether valuation exhibits this kind of dynamic adaptation at the behavioral level is unknown. Here, we show that the valuation process in human subjects adapts to the history of previous values, with current valuations varying inversely with the average value of recently observed items. The dynamics of this adaptive valuation are captured by divisive normalization, linking these temporal context effects to spatial context effects in decision making as well as spatial and temporal context effects in perception. These findings suggest that adaptation is a universal feature of neural information processing and offer a unifying explanation for contextual phenomena in fields ranging from visual psychophysics to economic choice.
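The inverse dependence of current valuation on recent values can be sketched with a simple divisive normalization rule, in which an item's value is divided by a constant plus the mean of recently observed values. This is a minimal illustration under assumed parameters, not the fitted model from the study; the function name `normalized_value` and the semisaturation constant `sigma` are hypothetical.

```python
def normalized_value(current, recent_values, sigma=1.0):
    """Divisively normalized value of the current item.

    Assumed form: current / (sigma + mean(recent_values)).
    A richer recent context (higher mean) yields a lower normalized value.
    """
    # With no history, the context term is zero and only sigma scales the value.
    context = sum(recent_values) / len(recent_values) if recent_values else 0.0
    return current / (sigma + context)


if __name__ == "__main__":
    item = 10.0
    print(normalized_value(item, [2.0, 2.0, 2.0]))  # after low-value items
    print(normalized_value(item, [8.0, 8.0, 8.0]))  # after high-value items
```

In this sketch the same item is valued lower after a run of high-value items than after a run of low-value items, which is the direction of the temporal context effect described above.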


Brain ◽  
2017 ◽  
Vol 140 (9) ◽  
pp. 2460-2474 ◽  
Author(s):  
Junchao Tong ◽  
Gausiha Rathitharan ◽  
Jeffrey H Meyer ◽  
Yoshiaki Furukawa ◽  
Lee-Cyn Ang ◽  
...  

See Jellinger (doi:10.1093/awx190) for a scientific commentary on this article. The monoamine oxidase enzymes (B and A subtypes, encoded by MAOB and MAOA, respectively) are drug targets in the treatment of Parkinson’s disease. Inhibitors of MAOB are used clinically in Parkinson’s disease for symptomatic purposes whereas the potential disease-modifying effect of monoamine oxidase inhibitors is debated. As astroglial cells express high levels of MAOB, the enzyme has been proposed as a brain imaging marker of astrogliosis, a cellular process possibly involved in Parkinson’s disease pathogenesis as elevation of MAOB in astrocytes might be harmful. Since brain monoamine oxidase status in Parkinson’s disease is uncertain, our objective was to measure, by quantitative immunoblotting in autopsied brain homogenates, protein levels of both monoamine oxidases in three different degenerative parkinsonian disorders: Parkinson’s disease (n = 11), multiple system atrophy (n = 11), and progressive supranuclear palsy (n = 16) and in matched controls (n = 16). We hypothesized that if MAOB is ‘substantially’ localized to astroglial cells, MAOB levels should be generally associated with standard astroglial protein measures (e.g. glial fibrillary acidic protein). MAOB levels were increased in degenerating putamen (+83%) and substantia nigra (+10%, non-significant) in multiple system atrophy; in caudate (+26%), putamen (+27%), frontal cortex (+31%) and substantia nigra (+23%) of progressive supranuclear palsy; and in frontal cortex (+33%), but not in substantia nigra of Parkinson’s disease, a region in which we previously reported no increase in astrocyte protein markers. Although the magnitude of MAOB increase was smaller than that of standard astrocytic markers, significant positive correlations were observed amongst the astrocyte proteins and MAOB. Despite suggestions that MAOA (versus MAOB) is primarily responsible for metabolism of dopamine in dopamine neurons, there was no loss of the enzyme in the parkinsonian substantia nigra; instead, increased nigral levels of a MAOA fragment and ‘turnover’ of the enzyme were observed in the conditions. Our findings provide support that MAOB might serve as a biochemical imaging marker, albeit not entirely specific, for astrocyte activation in human brain. The observation that MAOB protein concentration is generally increased in degenerating brain areas in multiple system atrophy (especially putamen) and in progressive supranuclear palsy, but not in the nigra in Parkinson’s disease, also distinguishes astrocyte behaviour in Parkinson’s disease from that in the two ‘Parkinson-plus’ conditions. The question remains whether suppression of either MAOB in astrocytes or MAOA in dopamine neurons might influence progression of the parkinsonian disorders.

