average reward
Recently Published Documents


TOTAL DOCUMENTS

192
(FIVE YEARS 28)

H-INDEX

18
(FIVE YEARS 1)

Author(s):  
Lieke Hofmans ◽  
Andrew Westbrook ◽  
Ruben van den Bosch ◽  
Jan Booij ◽  
Robbert-Jan Verkes ◽  
...  

2021 ◽  
pp. 1-10
Author(s):  
Akshay Nair ◽  
Ritwik K. Niyogi ◽  
Fei Shang ◽  
Sarah J. Tabrizi ◽  
Geraint Rees ◽  
...  

Abstract
Background: Apathy, a disabling and poorly understood neuropsychiatric symptom, is characterised by impaired self-initiated behaviour. It has been hypothesised that the opportunity cost of time (OCT) may be a key computational variable linking self-initiated behaviour with motivational status. OCT represents the amount of reward foregone per second if no action is taken. Using a novel behavioural task and computational modelling, we investigated the relationship between OCT, self-initiation and apathy. We predicted that higher OCT would engender shorter action latencies, and that individuals with greater sensitivity to OCT would have higher behavioural apathy.
Methods: We modulated the OCT in a novel task called the ‘Fisherman Game’. Participants freely chose when to self-initiate actions, either to collect rewards or, on occasion, to complete non-rewarding actions. We measured the relationship between action latencies, OCT and apathy for each participant across two independent non-clinical studies, one under laboratory conditions (n = 21) and one online (n = 90). ‘Average-reward’ reinforcement learning was used to model our data. We replicated our findings across both studies.
Results: We show that the latency of self-initiation is driven by changes in the OCT. Furthermore, we demonstrate, for the first time, that in younger adults, participants with higher apathy showed greater sensitivity to changes in OCT. Our model shows that apathetic individuals experienced the greatest change in subjective OCT during our task as a consequence of being more sensitive to rewards.
Conclusions: Our results suggest that OCT is an important variable for determining free-operant action initiation and understanding apathy.
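The prediction that higher OCT should shorten action latencies can be illustrated with the trade-off commonly used in average-reward models of response vigour. This is a minimal sketch, not the authors' Fisherman Game model: it assumes OCT is estimated as total reward divided by elapsed time, and that choosing a latency tau incurs a hypothetical effort cost `vigour_cost / tau` plus a time cost `oct_rate * tau`. All function names and numbers are illustrative.

```python
import math


def opportunity_cost_of_time(total_reward: float, total_time_s: float) -> float:
    """Average reward rate: the reward foregone per second of inaction."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return total_reward / total_time_s


def optimal_latency(vigour_cost: float, oct_rate: float) -> float:
    """Latency minimising vigour_cost / tau + oct_rate * tau (an assumed cost model).

    Setting the derivative -vigour_cost / tau**2 + oct_rate to zero
    gives tau* = sqrt(vigour_cost / oct_rate).
    """
    if vigour_cost <= 0 or oct_rate <= 0:
        raise ValueError("vigour_cost and oct_rate must be positive")
    return math.sqrt(vigour_cost / oct_rate)


# Hypothetical numbers: the same vigour cost in a poor and a rich environment.
low = optimal_latency(vigour_cost=2.0, oct_rate=opportunity_cost_of_time(10.0, 20.0))   # OCT = 0.5/s
high = optimal_latency(vigour_cost=2.0, oct_rate=opportunity_cost_of_time(40.0, 20.0))  # OCT = 2.0/s
print(f"latency at low OCT: {low:.2f} s, at high OCT: {high:.2f} s")
```

Under this assumed cost model, quadrupling the OCT halves the optimal latency, which matches the direction of the prediction stated in the abstract.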


Author(s):  
Hilary J. Don ◽  
Tyler Davis ◽  
Kimberly L. Ray ◽  
Megan C McMahon ◽  
Astin C. Cornwall ◽  
...  

2021 ◽  
Author(s):  
Maximilian Puelma Touzel ◽  
Paul Cisek ◽  
Guillaume Lajoie

The value we place on our time impacts what we decide to do with it. Value it too little, and we obsess over all details. Value it too much, and we rush carelessly to move on. How to strike this often context-specific balance is a challenging decision-making problem. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the stationary opportunity cost of time. However, environmental context and the cost of deliberation therein often vary in time and are hard to infer and predict. Here, we define a non-stationary opportunity cost of deliberation arising from performance variation on multiple timescales. Estimated from reward history, this cost readily adapts to reward-relevant changes in context and suggests a generalization of average-reward reinforcement learning (AR-RL) to account for non-stationary contextual factors. We use this deliberation cost in a simple decision-making heuristic called Performance-Gated Deliberation, which approximates AR-RL and is consistent with empirical results in both cognitive and systems decision-making neuroscience. We propose that deliberation cost is implemented directly as urgency, a previously characterized neural signal effectively controlling the speed of the decision-making process. We use behaviour and neural recordings from non-human primates in a non-stationary random walk prediction task to support our results. We make readily testable predictions for both neural activity and behaviour and discuss how this proposal can facilitate future work in cognitive and systems neuroscience of reward-driven behaviour.
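Why a reward-history estimate on a short timescale adapts to context while a stationary average reward lags can be illustrated with exponential moving averages at two timescales. This is a hypothetical sketch, not Performance-Gated Deliberation and not the authors' cost definition: the learning rates and the reward sequence are made up. The slowly updated average stands in for the stationary opportunity cost of time in standard AR-RL; the fast one tracks the current context.

```python
def ema_trace(rewards, alpha, init=0.0):
    """Exponential moving average of a reward sequence; larger alpha means a shorter timescale."""
    estimate = init
    trace = []
    for r in rewards:
        estimate += alpha * (r - estimate)  # delta-rule step toward the latest reward
        trace.append(estimate)
    return trace


# Hypothetical reward history: a rich context (reward 1 per trial) switches to a poor one (reward 0).
rewards = [1.0] * 50 + [0.0] * 10

fast = ema_trace(rewards, alpha=0.3)   # short timescale: follows the current context
slow = ema_trace(rewards, alpha=0.02)  # long timescale: close to a stationary average reward

print(f"10 trials after the switch: fast={fast[-1]:.3f}, slow={slow[-1]:.3f}")
```

Ten trials after the switch, the fast estimate has almost reached zero while the slow one still mostly reflects the earlier rich context; that lag is what a non-stationary cost estimate is meant to avoid.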


2021 ◽  
Author(s):  
Sean Devine ◽  
Cassandra Neumann ◽  
A. Ross Otto ◽  
Florian Bolenz ◽  
Andrea M.F. Reiter ◽  
...  

Previous work suggests that lifespan developmental differences in cognitive control reflect maturational and aging-related changes in prefrontal cortex functioning. However, complementary explanations exist: it could be that children and older adults differ from younger adults in how they balance the effort of engaging in control against its potential benefits. Here we test whether the degree of cognitive effort expenditure depends on the opportunity cost of time (average reward rate per unit time): if the average reward rate is high, participants should withhold cognitive effort, whereas if it is low, they should invest more. In Experiment 1, we examine this hypothesis in children, adolescents, younger adults, and older adults by applying a reward rate manipulation in two cognitive control tasks: a modified Eriksen flanker task and a task-switching paradigm. We found that young adults and adolescents reflexively withheld effort when the opportunity cost of time was high, whereas older adults and, to a lesser degree, children invested more resources to accumulate reward as quickly as possible. We tentatively interpret these results in terms of age- and task-specific differences in the processing of the opportunity cost of time. We qualify our findings in a second experiment in younger adults, in which we address an alternative explanation of our results and show that the observed age differences in effort expenditure may not result from differences in task difficulty. To conclude, we think that our results present an interesting first step toward relating opportunity costs to motivational processes across the lifespan. We frame the implications of further work in this area within a recent developmental model of resource-rationality, which points to developmental sweet spots in cognitive control.
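The opportunity-cost logic behind this hypothesis reduces to one comparison: effort that takes an extra Δt seconds is worthwhile only if its expected gain exceeds ρ·Δt, the reward foregone at average reward rate ρ. The sketch below is an illustrative assumption, not the task model used in the study; the function names, timings, and reward values are hypothetical.

```python
def reward_rate(rewards, durations_s):
    """Average reward rate (opportunity cost of time): total reward / total time."""
    total_time = sum(durations_s)
    if total_time <= 0:
        raise ValueError("durations must sum to a positive value")
    return sum(rewards) / total_time


def worth_investing_effort(extra_reward: float, extra_time_s: float, rho: float) -> bool:
    """Invest effort only if its expected gain beats the reward foregone during the extra time."""
    return extra_reward > rho * extra_time_s


# Hypothetical: careful responding takes 0.4 s longer and gains 0.2 expected reward units.
rich = reward_rate(rewards=[1.0, 1.0, 1.0, 1.0], durations_s=[1.0, 1.0, 1.0, 1.0])  # 1.0 per s
poor = reward_rate(rewards=[0.2, 0.2, 0.2, 0.2], durations_s=[1.0, 1.0, 1.0, 1.0])  # about 0.2 per s

print(worth_investing_effort(0.2, 0.4, rich))  # False: 0.2 does not exceed 1.0 * 0.4
print(worth_investing_effort(0.2, 0.4, poor))  # True: 0.2 exceeds 0.2 * 0.4
```

With the same effort payoff, the rule withholds effort when the reward rate is high and invests it when the rate is low, which is the pattern the hypothesis predicts.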


2021 ◽  
Author(s):  
Milena Pothast ◽  
Stephan Koenig ◽  
Harald Lachnit ◽  
Wolfgang Einhäuser

Binocular rivalry occurs when the eyes are presented with two dissimilar images and visual awareness fluctuates between them. Previous findings suggest that perceptual dominance of a rewarded stimulus may increase relative to an unrewarded stimulus, implying a direct effect of reward on visual representations. Here, we asked how uncertainty about reward occurrence and average reward expectancy affect dominance in binocular rivalry. In three experiments, participants learnt to associate drifting gratings of distinct colors with different levels of uncertainty and expectancy. Uncertainty was manipulated by rewarding each correct trial either with 100% probability (no uncertainty) or with 50% probability (high uncertainty). The amount of reward was either identical per rewarded trial, yielding a lower expectancy in uncertain trials (Experiments 1 and 2), or reward expectancy was matched across uncertainty levels by doubling the reward per rewarded trial for uncertain trials (Experiment 3). In Experiment 2, an additional low-reward condition with no uncertainty was included. Using a no-report paradigm, we measured the perceptual dominance of these gratings relative to a grating that was unassociated with reward, before and after associations had been acquired. When the rewarded stimulus feature (color) was task relevant, dominance durations increased for all rewarded gratings after acquisition. In an early phase after rivalry onset, we found increased perceptual dominance for cues associated with uncertain reward compared to cues associated with certain reward. This confirms an effect of reward on perceptual dominance and suggests that the reward uncertainty associated with a stimulus has a direct bearing on its visual representation.
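The reward schedules amount to simple expected-value arithmetic. Taking the certain-condition reward as one hypothetical unit, paying the same amount with 50% probability halves the expectancy (Experiments 1 and 2), while doubling the amount restores it (Experiment 3); both uncertain schedules have nonzero variance, whereas the certain one has none. The low-reward condition of Experiment 2 is left out because its amount is not stated.

```python
from fractions import Fraction


def reward_stats(p, amount):
    """Expected value and variance of a reward paid with probability p (otherwise 0)."""
    p = Fraction(p)
    amount = Fraction(amount)
    mean = p * amount
    variance = p * (1 - p) * amount ** 2  # Bernoulli variance scaled by amount squared
    return mean, variance


unit = 1  # hypothetical reward per rewarded trial in the certain condition
conditions = {
    "certain (p = 1)": reward_stats(1, unit),
    "uncertain, same amount (Experiments 1 and 2)": reward_stats(Fraction(1, 2), unit),
    "uncertain, doubled amount (Experiment 3)": reward_stats(Fraction(1, 2), 2 * unit),
}
for name, (mean, var) in conditions.items():
    print(f"{name}: expectancy = {mean}, variance = {var}")
```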


2021 ◽  
Author(s):  
David Jaures FOTSA MBOGNE ◽  
Armand Fonkou ◽  
Wolfgang Nzie ◽  
Adolfo Crespo Marquez

Abstract
This work is concerned with the problem of optimizing maintenance policies in terms of economic rewards and availability. We consider a system with multiple states in terms of health (good, degraded and failed states) and maintenance action (running, or stopped for maintenance). The level of maintenance (perfect or imperfect) is also taken into account. We propose a semi-Markovian model highlighting the effects of dwell times and transitions on economic rewards. We determine an optimal policy conditional on the current state, in terms of eight decision parameters related to the time intervals between two preventive maintenance actions and to the level of maintenance. We show through a sensitivity analysis that the decision parameters have nonlocal effects that imply a multi-objective function. Hence, we propose a compromise by optimizing the asymptotic average reward.
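For a semi-Markov model, the asymptotic average reward follows from a standard renewal-reward argument: with π the stationary distribution of the embedded jump chain, r_i the expected reward earned per visit to state i, and τ_i the mean dwell time in state i, the long-run reward per unit time is g = Σ_i π_i r_i / Σ_i π_i τ_i. The sketch below computes g for a hypothetical three-state system (good, degraded, failed) with made-up transition probabilities, dwell times, and rewards. The paper's model also tracks the maintenance action and level and optimizes eight decision parameters, which this example does not attempt.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stationary_distribution(p):
    """Stationary distribution of the embedded jump chain: pi P = pi, sum(pi) = 1."""
    n = len(p)
    # Rows of (P^T - I); the last equation is replaced by the normalisation constraint.
    a = [[p[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


def semi_markov_average_reward(p, mean_dwell, reward_per_visit):
    """Long-run reward per unit time: sum(pi_i * r_i) / sum(pi_i * tau_i)."""
    pi = stationary_distribution(p)
    return (sum(w * r for w, r in zip(pi, reward_per_visit))
            / sum(w * t for w, t in zip(pi, mean_dwell)))


# Hypothetical system: 0 = good, 1 = degraded, 2 = failed.
P = [
    [0.0, 1.0, 0.0],  # good: eventually degrades
    [0.7, 0.0, 0.3],  # degraded: preventive maintenance restores "good" (0.7) or the unit fails (0.3)
    [1.0, 0.0, 0.0],  # failed: corrective repair restores "good"
]
mean_dwell_h = [100.0, 50.0, 20.0]
reward_per_visit = [1000.0, 200.0, -500.0]  # net reward per visit, repair costs included
g = semi_markov_average_reward(P, mean_dwell_h, reward_per_visit)
print(f"long-run average reward: {g:.3f} per hour")
```

Because the embedded chain here is periodic, the sketch solves the stationary equations directly instead of iterating P, which would not converge.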


2021 ◽  
Author(s):  
Gary A Kane ◽  
Morgan H James ◽  
Amitai Shenhav ◽  
Nathaniel D Daw ◽  
Jonathan D Cohen ◽  
...  

In patch foraging tasks, animals must decide whether to remain with a depleting resource or to leave it in search of a potentially better source of reward. In such tasks, animals consistently follow the general predictions of optimal foraging theory (the Marginal Value Theorem; MVT): to leave a patch when the reward rate in the current patch depletes to the average reward rate across patches. Prior studies implicate an important role for the anterior cingulate cortex (ACC) in foraging decisions based on MVT: within single trials, ACC activity increases immediately preceding foraging decisions, and across trials, these dynamics are modulated as the value of staying in the patch depletes to the average reward rate. Here, we test whether these activity patterns reflect dynamic encoding of decision variables and whether these signals are directly involved in decision-making or serve a more general function such as monitoring task performance or allocating cognitive control. We developed a leaky accumulator model based on the MVT that generates estimates of decision variables within and across trials, and tested model predictions against ACC activity recorded from rats performing a patch foraging task. Model-predicted changes in MVT decision variables closely matched rat ACC activity. Next, we pharmacologically inactivated ACC to test the contribution of these signals to decision-making. Despite ACC inactivation, rats still followed the MVT decision rule, suggesting that the foraging decision variables represented in the ACC are used for a more general function such as regulating cognitive control or motivation.
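The MVT leaving rule can be sketched numerically. This example assumes an exponentially depleting patch (intake rate r0·exp(−decay·t)) and a fixed travel time between patches, both hypothetical, and finds by bisection the residence time at which the instantaneous in-patch reward rate falls to the overall average reward rate, harvest / (travel time + residence time). It is not the leaky accumulator model from the study.

```python
import math


def patch_harvest(t, r0, decay):
    """Cumulative reward after t seconds in a patch whose intake rate is r0 * exp(-decay * t)."""
    return (r0 / decay) * (1.0 - math.exp(-decay * t))


def mvt_leave_time(r0, decay, travel_time, tol=1e-9):
    """Residence time at which the in-patch rate equals the overall average reward rate."""
    if travel_time <= 0:
        raise ValueError("travel_time must be positive")

    def gap(t):
        instantaneous = r0 * math.exp(-decay * t)
        average = patch_harvest(t, r0, decay) / (travel_time + t)
        return instantaneous - average

    lo, hi = 0.0, 1.0
    while gap(hi) > 0:  # widen the bracket until the in-patch rate drops below the average
        hi *= 2.0
    while hi - lo > tol:  # bisection on the sign of the gap
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


short = mvt_leave_time(r0=1.0, decay=0.1, travel_time=5.0)
long_ = mvt_leave_time(r0=1.0, decay=0.1, travel_time=20.0)
print(f"leave after {short:.1f} s (short travel) vs {long_:.1f} s (long travel)")
```

A longer travel time lowers the average reward rate, so the rule predicts staying longer in each patch, which is one of the classic MVT predictions.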

