When will's wont wants wanting

2021 ◽  
Vol 44 ◽  
Author(s):  
Peter Dayan

We use neural reinforcement learning concepts, including Pavlovian versus instrumental control, liking versus wanting, model-based versus model-free control, online versus offline learning and planning, and internal versus external actions and control, to reflect on putative conflicts between short-term temptations and long-term goals.

Author(s):  
Xiaomei Wang ◽  
Kit-Hang Lee ◽  
Denny K. C. Fu ◽  
Ziyang Dong ◽  
Kui Wang ◽  
...  

Author(s):  
Liting Sun ◽  
Cheng Peng ◽  
Wei Zhan ◽  
Masayoshi Tomizuka

Safety and efficiency are two key elements of planning and control in autonomous driving. Theoretically, model-based optimization methods, such as Model Predictive Control (MPC), can provide such optimal driving policies. Their computational complexity, however, grows exponentially with the horizon length and the number of surrounding vehicles, which makes them impractical for real-time implementation, particularly when nonlinear models are considered. To enable a fast and approximately optimal driving policy, we propose a safe imitation framework with two hierarchical layers. The first layer, the policy layer, is a neural network that imitates a long-term expert driving policy via imitation learning. The second layer, the execution layer, is a short-term model-based optimal controller that tracks and further fine-tunes the reference trajectories proposed by the policy layer, with guaranteed short-term collision avoidance. Moreover, to reduce the distribution mismatch between the training set and the real world, Dataset Aggregation is used so that the performance of the policy layer improves from iteration to iteration. Several highway driving scenarios are demonstrated in simulation, and the results show that the proposed framework achieves performance similar to that of sophisticated long-term optimization approaches, but with significantly improved computational efficiency.
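
The two-layer idea can be sketched concretely. The Python below is a minimal illustration, not the authors' implementation: a learned policy proposes a short reference trajectory, a short-horizon execution layer pushes waypoints away from obstacles to keep a minimum separation, and a DAgger-style loop queries an expert on the states actually visited. All function names, shapes, and parameters are assumptions for illustration.

```python
# Minimal sketch of a two-layer safe imitation loop (illustrative only).
import numpy as np

def policy_layer(state, weights, horizon=5):
    """Imitation-learned policy: maps the current state to a short
    reference trajectory (a linear map stands in for a network)."""
    return np.tanh(weights @ state).reshape(horizon, -1)

def execution_layer(reference, obstacles, safe_dist=2.0):
    """Short-term tracker: follows the reference but projects each
    waypoint out of any obstacle's unsafe zone."""
    refined = reference.copy()
    for t, wp in enumerate(refined):
        for obs in obstacles:
            d = np.linalg.norm(wp - obs)
            if d < safe_dist:  # push the waypoint to the safety radius
                refined[t] = obs + (wp - obs) * safe_dist / max(d, 1e-6)
    return refined

def dagger_round(dataset, visited_states, expert, weights, obstacles):
    """One DAgger iteration: roll out the learned policy, execute the
    safety-filtered trajectory, and aggregate expert labels on the
    visited states into the training set."""
    for s in visited_states:
        ref = policy_layer(s, weights)
        _ = execution_layer(ref, obstacles)   # executed (safe) trajectory
        dataset.append((s, expert(s)))        # expert label for retraining
    return dataset
```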


2014 ◽  
Vol 369 (1655) ◽  
pp. 20130478 ◽  
Author(s):  
Nathaniel D. Daw ◽  
Peter Dayan

Despite many debates in the first half of the twentieth century, it is now largely a truism that humans and other animals build models of their environments and use them for prediction and control. However, model-based (MB) reasoning presents severe computational challenges. Alternative, computationally simpler, model-free (MF) schemes have been suggested in the reinforcement learning literature, and have afforded influential accounts of behavioural and neural data. Here, we study the realization of MB calculations, and the ways that this might be woven together with MF values and evaluation methods. There are as yet mostly only hints in the literature as to the resulting tapestry, so we offer more preview than review.
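
To make the contrast concrete, here is a minimal sketch (my illustration, not the authors') of the two value computations in a small discrete world: the MF system caches values with a temporal-difference update, while the MB system evaluates an action by averaging over a learned transition and reward model. All names and parameters are assumptions.

```python
# Model-free cached update versus model-based one-step lookahead (sketch).
import numpy as np

def mf_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Model-free: nudge the cached value Q[s, a] toward the sampled return."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

def mb_value(s, a, T, R, Q, gamma=0.95):
    """Model-based: evaluate (s, a) by averaging over the learned
    transition model T[s, a, s2] and reward model R[s, a, s2]."""
    return sum(T[s, a, s2] * (R[s, a, s2] + gamma * Q[s2].max())
               for s2 in range(T.shape[2]))
```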


2021 ◽  
Author(s):  
Maaike M.H. van Swieten ◽  
Rafal Bogacz ◽  
Sanjay G. Manohar

Human decisions can be reflexive or planned, governed respectively by model-free and model-based learning systems. These two systems might differ in their responsiveness to our needs. Hunger drives us to seek food rewards specifically, but here we ask whether it might have more general effects on these two decision systems. On one hand, the model-based system is often considered flexible and context-sensitive, and might therefore be modulated by metabolic needs. On the other hand, the model-free system’s primitive reinforcement mechanisms may have closer ties to biological drives. Here, we tested participants on a well-established two-stage sequential decision-making task that dissociates the contributions of model-based and model-free control. Hunger enhanced overall performance by increasing model-free control, without affecting model-based control. These results demonstrate a generalised effect of hunger on decision-making that enhances reliance on primitive reinforcement learning, which in some situations translates into adaptive benefits.

Significance statement: The prevalence of obesity and eating disorders is steadily increasing. To counteract problems related to eating, people need to make rational decisions. However, appetite may switch us to a different decision mode, making it harder to achieve long-term goals. Here we show that planned and reinforcement-driven actions are differentially sensitive to hunger. Hunger specifically affected reinforcement-driven actions, and did not affect the planning of actions. Our data show that people behave differently when they are hungry. We also provide a computational model of how these behavioural changes might arise.
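
For context, analyses of this task typically fit a hybrid learner in which first-stage choice values blend a model-based lookahead with a model-free cache; when the two components carry separate weights, "more model-free control" corresponds to a larger model-free weight. The sketch below is an illustrative Python rendering of that hybrid rule under assumed variable names, not the study's fitted model.

```python
# Hybrid first-stage valuation for a two-stage task (illustrative sketch).
import numpy as np

def first_stage_values(Q_mf1, Q2, T, b_mb, b_mf):
    """Q_mf1: cached model-free values of the first-stage actions.
    Q2:    values of the second-stage states (max over their actions).
    T:     T[a, s2] = P(second-stage state s2 | first-stage action a).
    b_mb, b_mf: weights on the model-based and model-free components."""
    Q_mb1 = T @ Q2                        # model-based one-step lookahead
    return b_mb * Q_mb1 + b_mf * Q_mf1    # blended decision values

def softmax_choice(q, rng=np.random):
    """Sample a first-stage action from the blended values."""
    p = np.exp(q - q.max())
    p /= p.sum()
    return rng.choice(len(q), p=p)
```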


Author(s):  
A. Ross Otto ◽  
Candace M. Raio ◽  
Elizabeth A. Phelps ◽  
Nathaniel Daw

2017 ◽  
Vol 28 (9) ◽  
pp. 1321-1333 ◽  
Author(s):  
Wouter Kool ◽  
Samuel J. Gershman ◽  
Fiery A. Cushman

Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system’s task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
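
One way to read the arbitration proposal is as a simple expected-value comparison. The sketch below is my formalisation under assumed names, not the authors' exact model: model-based control is engaged only when its accuracy advantage, scaled by the stake, outweighs its extra cost, so amplifying rewards matters only when the two controllers' accuracies differ.

```python
# Cost-benefit arbitration between controllers (illustrative sketch).
def choose_controller(acc_mb, acc_mf, stake, cost_mb, cost_mf=0.0):
    """Return 'MB' if model-based control's expected payoff, net of its
    cost, beats the model-free alternative; otherwise return 'MF'."""
    net_mb = stake * acc_mb - cost_mb
    net_mf = stake * acc_mf - cost_mf
    return 'MB' if net_mb > net_mf else 'MF'

# With equal accuracies the stake cancels out, mirroring the finding that
# reward amplification only matters when model-based control is more accurate.
print(choose_controller(acc_mb=0.9, acc_mf=0.6, stake=5.0, cost_mb=1.0))  # 'MB'
print(choose_controller(acc_mb=0.7, acc_mf=0.7, stake=5.0, cost_mb=1.0))  # 'MF'
```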


2021 ◽  
Vol 17 (1) ◽  
pp. e1008552 ◽  
Author(s):  
Rani Moran ◽  
Mehdi Keramati ◽  
Raymond J. Dolan

Dual-system reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective, planning, model-based (MB) system. This architecture raises the question of the degree to which, when devising a plan, an MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated, self-reflective MB planner anticipates the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with an MB system taking account of its MF propensities. Thus, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains, including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.
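
A self-reflective planner of this kind can be caricatured in a few lines. The following is illustrative only, with assumed names, and is not the task's generative or fitted model: when assigning a reward to a bandit, the planner weights the bandits by its own estimate of where its model-free habits will pull it, so the reward lands where habit and plan coincide.

```python
# A planner that anticipates its own model-free propensities (sketch).
import numpy as np

def mf_choice_probs(Q_mf, beta=3.0):
    """Softmax over cached model-free values: the planner's estimate of
    which bandit its habits will later favour."""
    p = np.exp(beta * (Q_mf - Q_mf.max()))
    return p / p.sum()

def assign_reward(Q_mf):
    """Place the reward on the bandit the model-free system is most
    likely to choose, so planned and habitual actions agree."""
    return int(np.argmax(mf_choice_probs(Q_mf)))
```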


Author(s):  
Andreas Heinz

While dopaminergic neurotransmission has largely been implicated in reinforcement learning and in model-based versus model-free decision making, serotonergic neurotransmission has been implicated in encoding aversive outcomes. Accordingly, serotonin dysfunction has been observed in disorders characterized by negative affect, including depression, anxiety, and addiction. Serotonin dysfunction in these mental disorders is described, and its association with negative affect is discussed.

