Deliberation gated by opportunity cost adapts to context with urgency

2021
Author(s): Maximilian Puelma Touzel, Paul Cisek, Guillaume Lajoie

The value we place on our time impacts what we decide to do with it. Value it too little, and we obsess over all details. Value it too much, and we rush carelessly to move on. How to strike this often context-specific balance is a challenging decision-making problem. Average reward, putatively encoded by tonic dopamine, serves in existing reinforcement learning theory as the stationary opportunity cost of time. However, environmental context and the cost of deliberation therein often vary in time and are hard to infer and predict. Here, we define a non-stationary opportunity cost of deliberation arising from performance variation on multiple timescales. Estimated from reward history, this cost readily adapts to reward-relevant changes in context and suggests a generalization of average-reward reinforcement learning (AR-RL) to account for non-stationary contextual factors. We use this deliberation cost in a simple decision-making heuristic called Performance-Gated Deliberation, which approximates AR-RL and is consistent with empirical results in both cognitive and systems decision-making neuroscience. We propose that deliberation cost is implemented directly as urgency, a previously characterized neural signal that effectively controls the speed of the decision-making process. We use behaviour and neural recordings from non-human primates in a non-stationary random walk prediction task to support our results. We make readily testable predictions for both neural activity and behaviour and discuss how this proposal can facilitate future work in cognitive and systems neuroscience of reward-driven behaviour.
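
A minimal sketch of the general idea, under a hypothetical formulation that is not the paper's Performance-Gated Deliberation rule: the opportunity cost is an exponentially weighted average of recent per-trial reward (the decay `alpha` sets its timescale), and deliberation ends when a confidence signal meets a decision bound that collapses in proportion to that cost, i.e. an urgency signal. The function names `reward_rate` and `commit_step`, the confidence trace and all parameter values are illustrative assumptions.

```python
def reward_rate(rewards, alpha):
    """Exponentially weighted running average of per-trial reward.

    A larger `alpha` forgets faster (roughly 1/alpha trials of memory), so the
    estimate tracks changes in context; running several alphas side by side
    would give estimates on multiple timescales.
    """
    estimate = 0.0
    for r in rewards:
        # Delta-rule step toward the latest outcome.
        estimate += alpha * (r - estimate)
    return estimate


def commit_step(confidence, cost_per_step):
    """Return the first step t at which confidence[t] meets a decision bound
    that starts at 1.0 and falls by `cost_per_step` per step (urgency).

    If the bound is never met, commit at the last available step.
    """
    for t, p in enumerate(confidence):
        if p >= 1.0 - cost_per_step * t:
            return t
    return len(confidence) - 1


if __name__ == "__main__":
    confidence = [0.5 + 0.04 * t for t in range(11)]  # slowly rising belief
    rich = reward_rate([1, 1, 1, 1, 1, 1], alpha=0.5)  # good recent context
    poor = reward_rate([0, 0, 0, 0, 0, 0], alpha=0.5)  # barren recent context
    print(commit_step(confidence, 0.1 * rich))  # 4: commits early
    print(commit_step(confidence, 0.1 * poor))  # 10: deliberates to the deadline
```

In this toy version, a run of good outcomes raises the estimated cost of time, so the same evidence stream is cut short earlier; after poor outcomes the bound never collapses and deliberation runs to the deadline.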

2021, Vol 2
Author(s): Zekun Cao, Jeronimo Grandi, Regis Kopper

Dynamic field of view (FOV) restrictors have been successfully used to reduce visually induced motion sickness (VIMS) during continuous viewpoint motion control (virtual travel) in virtual reality (VR). This benefit, however, comes at the cost of losing peripheral awareness during provocative motion. Likewise, visual references that are stable in relation to the physical environment, called rest frames (RFs), have been shown to reduce discomfort during virtual travel tasks in VR. We propose a new RF-based design called Granulated Rest Frames (GRFs) with a soft-edged circular cutout in the center that leverages the rest frames’ benefits without completely blocking the user’s peripheral view. The GRF design is application-agnostic and does not rely on context-specific RFs, such as commonly used cockpits. We report on a within-subjects experiment with 20 participants. The results suggest that, by strategically applying GRFs during a visual search session in VR, we can achieve better item searching efficiency compared to a restricted FOV. The effect of GRFs on reducing VIMS remains to be determined by future work.
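
As a rough illustration of a soft-edged circular cutout, the sketch below computes the opacity of an overlay at a point in normalized screen coordinates: fully transparent inside an inner radius, fully drawn beyond an outer radius, with a smoothstep ramp between them. The function `rest_frame_opacity`, its radii and the smoothstep falloff are assumptions for illustration, not values from the study, and the granulated texture of the actual GRF design is not modeled.

```python
import math


def smoothstep(edge0, edge1, x):
    """Cubic Hermite ramp from 0 (x <= edge0) to 1 (x >= edge1)."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def rest_frame_opacity(x, y, inner=0.4, outer=0.6):
    """Opacity of a hypothetical rest-frame overlay at normalized screen
    coordinates (x, y), with (0, 0) at the center of view.

    Inside `inner` the view is unobstructed; beyond `outer` the rest frame is
    drawn at full opacity; the ramp in between gives the cutout a soft edge
    instead of a hard boundary.
    """
    r = math.hypot(x, y)
    return smoothstep(inner, outer, r)


if __name__ == "__main__":
    for r in (0.0, 0.45, 0.5, 0.55, 1.0):
        print(f"r={r:.2f} opacity={rest_frame_opacity(r, 0.0):.3f}")
```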


Author(s): Hossein Esfandiari, MohammadTaghi HajiAghayi, Brendan Lucier, Michael Mitzenmacher

We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
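
For orientation, the classic offline policy that these online variants build on (Weitzman's index rule) can be sketched as follows: each box gets a reservation value sigma solving E[max(V - sigma, 0)] = c, boxes are opened in decreasing order of sigma, and search stops once the best prize found is at least the next sigma. This is the textbook offline rule, not the online algorithms of the paper; the discrete value distributions, the bisection solver and the outside option of 0 are assumptions of the sketch.

```python
def expected_excess(values, probs, sigma):
    """E[max(V - sigma, 0)] for a discrete value distribution."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(values, probs, cost, iters=100):
    """Solve expected_excess(sigma) = cost by bisection.

    expected_excess is non-increasing in sigma, equals 0 at max(values), and
    is at least `cost` at min(values) - cost, so the root lies between them.
    """
    lo, hi = min(values) - cost, max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expected_excess(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def weitzman_search(boxes, realized):
    """Run Weitzman's rule on `boxes` = [(values, probs, cost), ...].

    `realized[i]` is the prize that opening box i would reveal. Returns the
    prize kept and the total opening cost paid (outside option assumed 0).
    """
    sigmas = [reservation_value(*box) for box in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, paid = 0.0, 0.0
    for i in order:
        if sigmas[i] <= best:
            break  # no unopened box is worth its cost any more
        paid += boxes[i][2]
        best = max(best, realized[i])
    return best, paid


if __name__ == "__main__":
    risky = ([0.0, 10.0], [0.5, 0.5], 1.0)  # sigma = 8
    safe = ([5.0], [1.0], 0.0)              # sigma = 5
    print(weitzman_search([risky, safe], realized=[0.0, 5.0]))   # (5.0, 1.0)
    print(weitzman_search([risky, safe], realized=[10.0, 5.0]))  # (10.0, 1.0)
```

The example shows the key property of the index: the risky box is opened first because its upside justifies its cost, and the safe box is skipped whenever the risky one already paid off.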


2020
Author(s): Akshay Nair, Ritwik K. Niyogi, Fei Shang, Sarah J. Tabrizi, Geraint Rees, ...

Background: Apathy, a disabling and poorly understood neuropsychiatric symptom, is characterised by impaired self-initiated behaviour. Although the computational mechanisms that determine self-initiation are poorly understood, it has been hypothesised that the opportunity cost of time (OCT) may be a key variable linking self-initiated behaviour with motivational status. Using a novel behavioural task and computational modelling, we investigated the relationship between OCT, self-initiation and apathy. OCT represents the amount of reward that is forgone per second if no action is taken. We predicted that higher OCT would engender shorter action latencies, and that individuals with greater sensitivity to OCT would have higher behavioural apathy. Methods: We modulated the OCT in a novel task called the ‘Fisherman Game’, in which participants freely chose when to self-initiate actions to either collect rewards or, on occasion, complete non-rewarding actions. We measured the relationship between action latencies, OCT and apathy for each participant across two independent non-clinical studies, one under laboratory conditions (n=21) and one online (n=90). ‘Average-reward’ reinforcement learning was used to model our data. We replicated our findings across both studies. Results: We show that the latency of self-initiation is driven by changes in the OCT. Furthermore, we demonstrate, for the first time, that participants with higher apathy showed greater sensitivity to changes in OCT in younger adults. Our model shows that apathetic individuals experienced the greatest change in subjective OCT during our task as a consequence of being more sensitive to rewards. Conclusions: Our results suggest that OCT is an important variable for determining free-operant action initiation and understanding apathy.
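
The definition of OCT as reward forgone per second of inaction reduces to simple arithmetic. The sketch below computes it for a block of rewards, scales it by a hypothetical reward-sensitivity parameter to mimic a subjective OCT, and shows that the change in subjective OCT between a low-reward and a high-reward block grows with that sensitivity, the pattern the model attributes to apathetic participants. The function names, sensitivity values and reward numbers are illustrative assumptions, not the authors' fitted model.

```python
def oct_rate(total_reward, seconds):
    """Opportunity cost of time: reward forgone per second of inaction."""
    return total_reward / seconds


def subjective_oct(total_reward, seconds, sensitivity):
    """Hypothetical subjective OCT: objective OCT scaled by reward sensitivity."""
    return sensitivity * oct_rate(total_reward, seconds)


def forgone_reward(oct_value, latency_s):
    """Reward given up by waiting `latency_s` seconds before acting."""
    return oct_value * latency_s


if __name__ == "__main__":
    low, high = (10.0, 60.0), (30.0, 60.0)  # (reward, seconds) per block
    for sensitivity in (1.0, 2.0):
        change = subjective_oct(*high, sensitivity) - subjective_oct(*low, sensitivity)
        print(f"sensitivity={sensitivity}: OCT change = {change:.3f} per second")
    # Waiting 2 s in the high block forgoes 0.5 * 2 = 1.0 units of reward.
    print(forgone_reward(oct_rate(*high), 2.0))
```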


2018
Author(s): Nura Sidarus, Stefano Palminteri, Valérian Chambon

Value-based decision-making involves trading off the cost associated with an action against its expected reward. Research has shown that both physical and mental effort constitute such subjective costs, biasing choices away from effortful actions, and discounting the value of obtained rewards. Facing conflicts between competing action alternatives is considered aversive, as recruiting cognitive control to overcome conflict is effortful. Yet, it remains unclear whether conflict is also perceived as a cost in value-based decisions. The present study investigated this question by embedding irrelevant distractors (flanker arrows) within a reversal-learning task, with intermixed free and instructed trials. Results showed that participants learned to adapt their choices to maximize rewards, but were nevertheless biased to follow the suggestions of irrelevant distractors. Thus, the perceived cost of being in conflict with an external suggestion could sometimes trump internal value representations. By adapting computational models of reinforcement learning, we assessed the influence of conflict at both the decision and learning stages. Modelling the decision showed that conflict was avoided when evidence for either action alternative was weak, demonstrating that the cost of conflict was traded off against expected rewards. During the learning phase, we found that learning rates were reduced in instructed, relative to free, choices. Learning rates were further reduced by conflict between an instruction and subjective action values, whereas learning was not robustly influenced by conflict between one’s actions and external distractors. Our results show that the subjective cost of conflict factors into value-based decision-making, and highlight that different types of conflict may have different effects on learning about action outcomes.
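
A minimal sketch of how conflict can enter both stages of a reinforcement-learning model, under assumptions made here for illustration rather than the authors' fitted models: at decision time, a softmax over two Q-values receives an additive bias toward the action the distractor points to; at learning time, the delta-rule learning rate is lower on instructed trials and lower still when the instruction conflicts with the currently preferred action. All parameter values are hypothetical. Because the bias acts inside a logistic function, it shifts choices most when the two Q-values are close, echoing the finding that conflict was avoided mainly when evidence was weak.

```python
import math


def p_choose_first(q, distractor, bias=1.0):
    """P(choose action 0) from a softmax over two Q-values, with an additive
    bias toward the action the (irrelevant) distractor points to."""
    v0 = q[0] + (bias if distractor == 0 else 0.0)
    v1 = q[1] + (bias if distractor == 1 else 0.0)
    return 1.0 / (1.0 + math.exp(-(v0 - v1)))


def learning_rate(free, instruction_conflict,
                  alpha_free=0.5, alpha_instructed=0.3, conflict_scale=0.5):
    """Hypothetical learning rates: lower for instructed than free choices,
    and lower still when the instruction conflicts with the preferred action."""
    if free:
        return alpha_free
    return alpha_instructed * (conflict_scale if instruction_conflict else 1.0)


def update(q, action, reward, alpha):
    """Delta-rule update of the chosen action's value, in place."""
    q[action] += alpha * (reward - q[action])


if __name__ == "__main__":
    for q in ([0.0, 0.0], [3.0, 0.0]):
        shift = p_choose_first(q, distractor=0) - p_choose_first(q, distractor=0, bias=0.0)
        print(q, f"distractor shift = {shift:.3f}")
```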


2014, Vol 34 (4), pp. 1212-1223
Author(s): J. E. S. Choi, P. A. Vaswani, R. Shadmehr

2015, Vol 105 (5), pp. 267-272
Author(s): David Laibson

Present-biased preferences engender a demand for commitment. Commitment is a problematic prediction, since we see so little of it. I quantitatively explore the reasons for the “missing” commitment. Extending the procrastination model in Carroll et al. (2009), I show how equilibrium commitment is related to (i) the standard deviation of the opportunity cost of time, (ii) the cost of delay, (iii) the degree of partial naivete, and (iv) the direct cost of commitment. The calibrated model demonstrates that the perceived benefits of commitment are often overwhelmed by the costs of commitment. Demand for commitment is a special case rather than the general case.
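
A toy two-day version of the mechanism, not Laibson's calibrated model (which extends Carroll et al. (2009) with a variable opportunity cost of time and partial naivete): a quasi-hyperbolic (beta-delta) agent must do a task on day 1 at cost `c1` or on day 2 at a slightly higher cost `c2`. Seen from day 0, both costs lie in the future, so day 1 looks better; on day 1 the cost is immediate, so the agent delays. A sophisticated day-0 self therefore values commitment, but only while the direct cost of commitment stays below that benefit. The function names and numbers are illustrative assumptions.

```python
def value(costs_by_day, today, beta, delta=1.0):
    """Quasi-hyperbolic value, seen from `today`, of paying costs_by_day[d]
    on day d. Today's payoff has weight 1; day d > today has weight
    beta * delta ** (d - today). Past days are ignored."""
    total = 0.0
    for day, cost in costs_by_day.items():
        if day < today:
            continue
        weight = 1.0 if day == today else beta * delta ** (day - today)
        total -= weight * cost
    return total


def day1_self_delays(c1, c2, beta, delta=1.0):
    """On day 1, does the agent prefer paying c2 tomorrow to c1 now?"""
    return value({2: c2}, 1, beta, delta) > value({1: c1}, 1, beta, delta)


def wants_commitment(c1, c2, beta, commit_cost, delta=1.0):
    """Sophisticated day-0 self: pay `commit_cost` now to force the task onto
    day 1, versus the outcome it correctly predicts without commitment."""
    if day1_self_delays(c1, c2, beta, delta):
        without = value({2: c2}, 0, beta, delta)
    else:
        without = value({1: c1}, 0, beta, delta)
    with_commitment = value({0: commit_cost, 1: c1}, 0, beta, delta)
    return with_commitment > without


if __name__ == "__main__":
    c1, c2, beta = 1.0, 1.1, 0.7
    print(day1_self_delays(c1, c2, beta))        # True: present bias bites on day 1
    print(wants_commitment(c1, c2, beta, 0.05))  # True: benefit 0.07 > cost 0.05
    print(wants_commitment(c1, c2, beta, 0.10))  # False: commitment too costly
```

With beta = 1 (no present bias) the day-1 self no longer delays and commitment has no value, which matches the intuition that commitment demand depends on present bias and on its direct cost.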


2021
Author(s): Sean Devine, Cassandra Neumann, A. Ross Otto, Florian Bolenz, Andrea M.F. Reiter, ...

Previous work suggests that lifespan developmental differences in cognitive control reflect maturational and aging-related changes in prefrontal cortex functioning. However, complementary explanations exist: it could be that children and older adults differ from younger adults in how they balance the effort of engaging in control against its potential benefits. Here we test whether the degree of cognitive effort expenditure depends on the opportunity cost of time (average reward rate per unit time): if the average reward rate is high, participants should withhold cognitive effort, whereas if it is low, they should invest more. In Experiment 1, we examine this hypothesis in children, adolescents, younger, and older adults, by applying a reward rate manipulation in two cognitive control tasks: a modified Eriksen flanker task and a task-switching paradigm. We found that young adults and adolescents reflexively withheld effort when the opportunity cost of time was high, whereas older adults and, to a lesser degree, children invested more resources to accumulate reward as quickly as possible. We tentatively interpret these results in terms of age- and task-specific differences in the processing of the opportunity cost of time. We qualify our findings in a second experiment in younger adults in which we address an alternative explanation of our results and show that the observed age differences in effort expenditure may not result from differences in task difficulty. To conclude, we think that our results present an interesting first step towards relating opportunity costs to motivational processes across the lifespan. We frame the implications of further work in this area within a recent developmental model of resource-rationality, which points to developmental sweet spots in cognitive control.
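
The hypothesis that a high average reward rate should make participants withhold effort can be illustrated with a small optimization, under assumptions made purely for this sketch: accuracy rises from chance with time spent on a trial along an exponential curve, and each second spent costs the opportunity cost of time (OCT, the reward rate). The time that maximizes expected reward minus opportunity cost shrinks as the OCT grows. The accuracy curve, its time constant `tau` and the grid search are hypothetical choices, not the authors' model.

```python
import math


def accuracy(t, tau=0.5):
    """Hypothetical accuracy after t seconds of effort: 0.5 (chance) at t = 0,
    approaching 1.0 with time constant `tau`."""
    return 1.0 - 0.5 * math.exp(-t / tau)


def best_effort_time(oct_rate, reward=1.0, tau=0.5, t_max=5.0, step=0.001):
    """Grid-search the time t maximizing reward * accuracy(t) - oct_rate * t."""
    n = int(round(t_max / step))
    times = (i * step for i in range(n + 1))
    return max(times, key=lambda t: reward * accuracy(t, tau) - oct_rate * t)


if __name__ == "__main__":
    for rate in (0.2, 0.8, 1.5):
        print(f"OCT={rate}: invest {best_effort_time(rate):.3f} s")
```

For this particular curve the optimum also has a closed form, t* = tau * ln(reward / (2 * tau * OCT)) when that is positive and 0 otherwise, which the grid search reproduces to within one step.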


Author(s): Clement Leung, Nikki Lijing Kuang, Vienne W. K. Sung

Organizations need to constantly learn, develop, and evaluate new strategies and policies for their effective operation. Unsupervised reinforcement learning is becoming a highly useful tool, since rewards and punishments in different forms are pervasive and present in a wide variety of decision-making scenarios. By observing the outcome of a sufficient number of repeated trials, one would gradually learn the value and usefulness of a particular policy or strategy. However, in a given environment, the outcomes resulting from different trials are subject to external chance influence and variations. In learning about the usefulness of a given policy, significant costs are involved in systematically undertaking the sequential trials; therefore, in most learning episodes, one would wish to keep the cost within bounds by adopting learning-efficient stopping rules. In this chapter, we explain the deployment of different learning strategies in given environments for reinforcement learning policy evaluation and review, and we present suggestions for their practical use and applications.
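
One simple example of a stopping rule for costly sequential policy evaluation, chosen here for illustration rather than taken from the chapter: treat each trial's outcome as a success or failure, keep a Beta posterior over the policy's success rate, and stop as soon as the posterior standard deviation falls below a tolerance `eps` or a trial budget is exhausted. The function `evaluate_policy`, the uniform Beta(1, 1) prior, `eps` and `max_trials` are assumptions.

```python
import math
import random
from typing import Iterable, Tuple


def evaluate_policy(outcomes: Iterable[bool], eps: float = 0.05,
                    max_trials: int = 1000) -> Tuple[int, float]:
    """Run trials until the Beta posterior sd of the success rate is <= eps,
    or until `max_trials` trials have been paid for.

    Returns (number of trials used, posterior mean success rate).
    """
    a, b = 1.0, 1.0  # uniform Beta(1, 1) prior
    n = 0
    for success in outcomes:
        if n >= max_trials:
            break  # budget exhausted
        n += 1
        if success:
            a += 1.0
        else:
            b += 1.0
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
        if sd <= eps:
            break  # estimate is precise enough; stop paying for trials
    return n, a / (a + b)


if __name__ == "__main__":
    rng = random.Random(0)
    trials = (rng.random() < 0.7 for _ in range(10_000))  # simulated policy
    print(evaluate_policy(trials))
```

With `eps = 0.05` and a success rate near 0.7, the rule typically stops after roughly 80 trials instead of consuming the whole stream; tightening `eps` trades more trial cost for a sharper estimate.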


2017
Author(s): A. Ross Otto, Nathaniel D. Daw

A spate of recent work demonstrates that humans seek to avoid the expenditure of cognitive effort, much like physical effort or economic resources. Less is clear, however, about the circumstances dictating how and when people decide to expend cognitive effort. Here we adopt a popular theory of opportunity costs and response vigor to elucidate this question. This account, grounded in reinforcement learning, formalizes a trade-off between two costs: the harder work assumed necessary to emit faster actions and the opportunity cost inherent in acting more slowly (i.e., the resulting delay to the next reward and all subsequent rewards). Recent work reveals that the opportunity cost of time (operationalized as the average reward rate per unit time and theorized to be signaled by tonic dopamine levels) modulates the speed with which a person responds in simple discrimination tasks. We extend this framework to cognitive effort in a diverse range of cognitive tasks, for which 1) the amount of cognitive effort demanded from the task varies from trial to trial and 2) the putative expenditure of cognitive effort holds measurable consequences in terms of accuracy and response time. In the domains of cognitive control, perceptual decision-making, and task-switching, we found that subjects tuned their level of effort exertion in accordance with the experienced average reward rate: when the opportunity cost of time was high, subjects made more errors and responded more quickly, which we interpret as a withdrawal of cognitive effort. That is, expenditure of cognitive effort appeared to be modulated by the opportunity cost of time. Further, and consistent with our account, the strength of this modulation was predicted by individual differences in efficacy of cognitive control. Taken together, our results elucidate the circumstances dictating how and when people expend cognitive effort.
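
The trade-off described here is commonly written in the form of the response-vigor model of Niv and colleagues (2007). In a simplified version, assuming a vigor cost that falls hyperbolically with latency (C_v / tau) and an opportunity cost that grows linearly with it (rho * tau, where rho is the average reward rate), the optimal latency follows in closed form:

```latex
\begin{aligned}
\text{cost}(\tau) &= \frac{C_v}{\tau} + \rho\,\tau,\\
\frac{d\,\text{cost}}{d\tau} &= -\frac{C_v}{\tau^{2}} + \rho = 0
\quad\Longrightarrow\quad \tau^{*} = \sqrt{\frac{C_v}{\rho}}.
\end{aligned}
```

Since the second derivative 2 C_v / tau^3 is positive, this is a minimum. Because tau* falls as rho rises (quadrupling rho halves tau*), a higher opportunity cost of time predicts faster, less effortful responding; mapping the motor vigor cost C_v onto a cost of cognitive effort is the extension the abstract describes, and its exact form is not specified here.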


2021, pp. 1-10
Author(s): Akshay Nair, Ritwik K. Niyogi, Fei Shang, Sarah J. Tabrizi, Geraint Rees, ...

Background: Apathy, a disabling and poorly understood neuropsychiatric symptom, is characterised by impaired self-initiated behaviour. It has been hypothesised that the opportunity cost of time (OCT) may be a key computational variable linking self-initiated behaviour with motivational status. OCT represents the amount of reward that is forgone per second if no action is taken. Using a novel behavioural task and computational modelling, we investigated the relationship between OCT, self-initiation and apathy. We predicted that higher OCT would engender shorter action latencies, and that individuals with greater sensitivity to OCT would have higher behavioural apathy. Methods: We modulated the OCT in a novel task called the ‘Fisherman Game’, in which participants freely chose when to self-initiate actions to either collect rewards or, on occasion, complete non-rewarding actions. We measured the relationship between action latencies, OCT and apathy for each participant across two independent non-clinical studies, one under laboratory conditions (n = 21) and one online (n = 90). ‘Average-reward’ reinforcement learning was used to model our data. We replicated our findings across both studies. Results: We show that the latency of self-initiation is driven by changes in the OCT. Furthermore, we demonstrate, for the first time, that participants with higher apathy showed greater sensitivity to changes in OCT in younger adults. Our model shows that apathetic individuals experienced the greatest change in subjective OCT during our task as a consequence of being more sensitive to rewards. Conclusions: Our results suggest that OCT is an important variable for determining free-operant action initiation and understanding apathy.

