Model-based planners reflect on their model-free propensities

2021 ◽  
Vol 17 (1) ◽  
pp. e1008552
Author(s):  
Rani Moran ◽  
Mehdi Keramati ◽  
Raymond J. Dolan

Dual-system reinforcement learning theory proposes that behaviour is under the tutelage of a retrospective, value-caching, model-free (MF) system and a prospective-planning, model-based (MB) system. This architecture raises the question of the degree to which, when devising a plan, an MB controller takes account of influences from its MF counterpart. We present evidence that such a sophisticated, self-reflective MB planner incorporates an anticipation of the influences its own MF proclivities exert on the execution of its planned future actions. Using a novel bandit task, wherein subjects were periodically allowed to design their environment, we show that reward assignments were constructed in a manner consistent with an MB system taking account of its MF propensities. Thus, in the task, participants assigned higher rewards to bandits that were momentarily associated with stronger MF tendencies. Our findings have implications for a range of decision-making domains including drug abuse, pre-commitment, and the tension between short- and long-term decision horizons in economics.
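The reward-design logic described here can be sketched in a few lines: a hybrid controller mixes MB and MF action values, and a self-reflective planner, anticipating that execution will be biased toward bandits with strong MF tendencies, allocates more of its reward budget to them. This is an illustrative toy, not the authors' computational model; the softmax allocation rule and the mixture weight are assumptions.

```python
import numpy as np

def hybrid_action_values(q_mf, q_mb, w):
    """Weighted mixture of model-based and model-free action values."""
    return w * q_mb + (1.0 - w) * q_mf

def self_reflective_reward_assignment(q_mf, budget):
    """A self-reflective MB planner anticipates that execution will be
    biased toward actions with strong MF tendencies, so it allocates
    the reward budget in proportion to softmax(MF values).
    (Hypothetical allocation rule, for illustration only.)"""
    prefs = np.exp(q_mf - q_mf.max())   # numerically stable softmax weights
    return budget * prefs / prefs.sum()

q_mf = np.array([0.2, 0.8, 0.5])        # momentary MF propensities per bandit
rewards = self_reflective_reward_assignment(q_mf, budget=10.0)
# the bandit with the strongest MF tendency receives the largest share
```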

2021 ◽  
Author(s):  
Maaike M.H. van Swieten ◽  
Rafal Bogacz ◽  
Sanjay G. Manohar

Abstract
Human decisions can be reflexive or planned, being governed respectively by model-free and model-based learning systems. These two systems might differ in their responsiveness to our needs. Hunger drives us to specifically seek food rewards, but here we ask whether it might have more general effects on these two decision systems. On the one hand, the model-based system is often considered flexible and context-sensitive, and might therefore be modulated by metabolic needs. On the other hand, the model-free system’s primitive reinforcement mechanisms may have closer ties to biological drives. Here, we tested participants on a well-established two-stage sequential decision-making task that dissociates the contributions of model-based and model-free control. Hunger enhanced overall performance by increasing model-free control, without affecting model-based control. These results demonstrate a generalised effect of hunger on decision-making that enhances reliance on primitive reinforcement learning, which in some situations translates into adaptive benefits.
Significance statement
The prevalence of obesity and eating disorders is steadily increasing. To counteract problems related to eating, people need to make rational decisions. However, appetite may switch us to a different decision mode, making it harder to achieve long-term goals. Here we show that planned and reinforcement-driven actions are differentially sensitive to hunger. Hunger specifically affected reinforcement-driven actions and did not affect the planning of actions. Our data show that people behave differently when they are hungry. We also provide a computational model of how the behavioural changes might arise.
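The two-stage task referred to above dissociates the two controllers because MB values are computed from a known transition structure while MF values are cached from directly experienced rewards. Below is a minimal simulation of a hybrid learner in the spirit of that task; the transition probability, learning rate, mixture weight `w`, and deterministic second-stage payoffs are simplifying assumptions, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two first-stage actions; action a leads "commonly" (p = 0.7) to
# second-stage state a, and rarely to the other state. For simplicity,
# state 0 always pays 1 and state 1 always pays 0.
P_COMMON = 0.7
alpha, w = 0.3, 0.5          # learning rate and MB/MF mixture weight

q_mf = np.zeros(2)           # cached (model-free) first-stage values
q_stage2 = np.zeros(2)       # learned second-stage state values

def first_stage_values():
    # Model-based values: expectation over the known transition structure.
    q_mb = np.array([
        P_COMMON * q_stage2[0] + (1 - P_COMMON) * q_stage2[1],
        P_COMMON * q_stage2[1] + (1 - P_COMMON) * q_stage2[0],
    ])
    return w * q_mb + (1 - w) * q_mf

for _ in range(200):
    # Gumbel-max sampling is equivalent to a softmax choice rule.
    a = int(np.argmax(first_stage_values() + rng.gumbel(size=2)))
    s2 = a if rng.random() < P_COMMON else 1 - a
    r = 1.0 if s2 == 0 else 0.0
    q_stage2[s2] += alpha * (r - q_stage2[s2])
    q_mf[a] += alpha * (r - q_mf[a])   # TD-style cached update
```

The mixture weight `w` is the quantity such studies estimate per participant: shifts toward the MF term correspond to greater model-free control.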


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Florent Wyckmans ◽  
A. Ross Otto ◽  
Miriam Sebold ◽  
Nathaniel Daw ◽  
Antoine Bechara ◽  
...  

Abstract
Compulsive behaviors (e.g., addiction) can be viewed as an aberrant decision process in which inflexible reactions automatically evoked by stimuli (habits) take control over decision making, to the detriment of a more flexible, goal-oriented behavioral learning system. These behaviors are thought to arise from learning algorithms known as “model-based” and “model-free” reinforcement learning. Gambling disorder, a form of addiction without the confound of the neurotoxic effects of drugs, is associated with impaired goal-directed control, but the way in which problem gamblers (PG) orchestrate model-based and model-free strategies has not been evaluated. Forty-nine PG and 33 healthy control participants (CP) completed a two-step sequential choice task for which model-based and model-free learning have distinct and identifiable trial-by-trial learning signatures. The influence of common psychopathological comorbidities on these two forms of learning was investigated. PG showed impaired model-based learning, particularly after unrewarded outcomes. In addition, PG exhibited faster reaction times than CP following unrewarded decisions. Troubled mood, higher impulsivity (i.e., positive and negative urgency), and current and chronic stress reported via questionnaires did not account for these results. These findings demonstrate specific reinforcement learning and decision-making deficits in behavioral addiction that advance our understanding and may be important dimensions for designing effective interventions.
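The "trial-by-trial learning signatures" mentioned here are conventionally read off a stay-probability analysis: MF control predicts a main effect of previous reward on repeating a choice, whereas MB control predicts a reward-by-transition interaction (staying after rewarded common and unrewarded rare trials). A toy sketch with hypothetical trial records:

```python
import pandas as pd

# Hypothetical trial records from a two-step task: whether the previous
# trial was rewarded, whether its transition was common, and whether the
# participant repeated ("stayed with") the same first-stage choice.
trials = pd.DataFrame({
    "rewarded": [1, 1, 0, 0, 1, 1, 0, 0],
    "common":   [1, 0, 1, 0, 1, 0, 1, 0],
    "stayed":   [1, 0, 0, 1, 1, 1, 0, 0],
})

# Stay probability per (reward, transition) cell; the pattern across the
# four cells is what dissociates MF and MB contributions.
stay_prob = trials.groupby(["rewarded", "common"])["stayed"].mean()
```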


2021 ◽  
Vol 44 ◽  
Author(s):  
Peter Dayan

Abstract
We use neural reinforcement learning concepts including Pavlovian versus instrumental control, liking versus wanting, model-based versus model-free control, online versus offline learning and planning, and internal versus external actions and control to reflect on putative conflicts between short-term temptations and long-term goals.


Author(s):  
Andreas Heinz

While dopaminergic neurotransmission has largely been implicated in reinforcement learning and model-based versus model-free decision making, serotonergic neurotransmission has been implicated in encoding aversive outcomes. Accordingly, serotonin dysfunction has been observed in disorders characterized by negative affect including depression, anxiety and addiction. Serotonin dysfunction in these mental disorders is described and its association with negative affect is discussed.


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been applied successfully to practical decision-making problems such as Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state. PPOMM adds information about the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. The method uses two components to optimize the policy: the PPO error and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict information about the next state. When evaluated across 49 Atari games in the Arcade Learning Environment (ALE), PPOMM outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that PPOMM performs better than, or on par with, the original algorithm in 33 games.
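The two-component objective can be sketched as a clipped PPO surrogate plus a latent transition-model prediction error. The trade-off coefficient `beta` and the exact form of the model term are illustrative assumptions, not the published PPOMM loss.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss (to minimize)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def transition_model_loss(pred_next_latent, true_next_latent):
    """Prediction error of a learned latent transition model."""
    return np.mean((pred_next_latent - true_next_latent) ** 2)

def ppomm_style_loss(ratio, advantage, pred_next, true_next, beta=0.1):
    """Two-component objective: the PPO error plus a model-based error
    that injects next-state information. beta is a hypothetical
    trade-off weight, not a value from the paper."""
    return (ppo_clip_loss(ratio, advantage)
            + beta * transition_model_loss(pred_next, true_next))
```

Minimizing the second term trains the latent transition model; its predictions then shape the policy update alongside the model-free surrogate.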


2021 ◽  
Vol 8 ◽  
Author(s):  
Huan Zhao ◽  
Junhua Zhao ◽  
Ting Shu ◽  
Zibin Pan

Buildings account for a large proportion of total energy consumption in many countries, and almost half of that consumption is caused by Heating, Ventilation, and Air-Conditioning (HVAC) systems. Model predictive control of HVAC is a complex task due to the dynamic properties of the system and environment, such as temperature and electricity price. Deep reinforcement learning (DRL) is a model-free method that utilizes a “trial and error” mechanism to learn the optimal policy. However, learning efficiency and learning cost are the main obstacles to applying DRL in practice. To overcome this problem, a hybrid-model-based DRL method is proposed for the HVAC control problem. First, a specific MDP is defined that accounts for energy cost, temperature violation, and action violation. Then the hybrid-model-based DRL method is proposed, which utilizes both a knowledge-driven model and a data-driven model throughout the learning process. Finally, a protection mechanism and reward-adjustment methods are used to further reduce the learning cost. The proposed method is tested in a simulation environment using Australian Energy Market Operator (AEMO) electricity price data and New South Wales temperature data. Simulation results show that 1) the DRL method reduces the energy cost while maintaining satisfactory temperatures compared to the short-term MPC method, and 2) the proposed method improves learning efficiency and reduces the learning cost during learning compared to the model-free method.
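A reward combining the three terms of the MDP described here (energy cost, temperature violation, action violation) might look as follows; the comfort band, action limit, and penalty weights are illustrative assumptions, not values from the paper.

```python
def hvac_reward(energy_kwh, price, temp, action_delta,
                temp_band=(20.0, 24.0), max_delta=1.0,
                w_temp=10.0, w_action=5.0):
    """Negative cost of one control step: electricity cost plus penalties
    for leaving the comfort band and for over-large control actions.
    All weights and bounds are hypothetical."""
    cost = energy_kwh * price
    lo, hi = temp_band
    temp_violation = max(lo - temp, 0.0) + max(temp - hi, 0.0)
    action_violation = max(abs(action_delta) - max_delta, 0.0)
    return -(cost + w_temp * temp_violation + w_action * action_violation)
```

A protection mechanism in this setting would typically clamp or reject actions whose `action_violation` is nonzero before they reach the plant.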


2019 ◽  
Author(s):  
Allison Letkiewicz ◽  
Amy L. Cochran ◽  
Josh M. Cisler

Trauma and trauma-related disorders are characterized by altered learning styles. Two learning processes that have been delineated using computational modeling are model-free and model-based reinforcement learning (RL), characterized by trial-and-error and goal-driven, rule-based learning, respectively. Prior research suggests that model-free RL is disrupted among individuals with a history of assaultive trauma and may contribute to altered fear responding. Currently, it is unclear whether model-based RL, which involves building abstract and nuanced representations of stimulus-outcome relationships to prospectively predict action-related outcomes, is also impaired among individuals who have experienced trauma. The present study sought to test the hypothesis of impaired model-based RL among adolescent females exposed to assaultive trauma. Participants (n=60) completed a three-arm bandit RL task during fMRI acquisition. Two computational models compared the degree to which each participant’s task behavior fit the use of a model-free versus model-based RL strategy. Overall, a greater portion of participants’ behavior was better captured by the model-based than by the model-free RL model. Although assaultive trauma did not predict learning strategy use, greater sexual abuse severity predicted less use of model-based relative to model-free RL. Additionally, severe sexual abuse predicted less left frontoparietal network encoding of model-based RL updates, which was not accounted for by PTSD. Given the significant impact that sexual trauma has on mental health and other aspects of functioning, it is plausible that altered model-based RL is an important route through which clinical impairment emerges.
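Comparing how well each strategy captures a participant's choices typically means fitting each candidate model by maximum likelihood and comparing fits. A minimal sketch of the model-free side on a three-arm bandit; the softmax observation model and the parameters are standard assumptions, not the study's exact models.

```python
import numpy as np

def softmax(q, beta):
    """Choice probabilities under an inverse-temperature beta."""
    z = beta * (q - q.max())            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def neg_log_likelihood(choices, rewards, alpha, beta, n_arms=3):
    """Negative log-likelihood of a model-free Q-learner for one
    participant's choice/reward sequence on an n-armed bandit.
    Minimizing this over (alpha, beta) fits the model."""
    q = np.zeros(n_arms)
    nll = 0.0
    for a, r in zip(choices, rewards):
        p = softmax(q, beta)
        nll -= np.log(p[a])
        q[a] += alpha * (r - q[a])      # delta-rule value update
    return nll
```

The model-based fit would replace the cached `q` with values computed from a learned task structure; the model with the lower (penalized) negative log-likelihood better captures that participant's behavior.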

