Value-free reinforcement learning: Policy optimization as a minimal model of operant behavior

Reinforcement learning is a powerful framework for modelling the cognitive and neural substrates of learning and decision making. Contemporary research in cognitive neuroscience and neuroeconomics typically uses value-based reinforcement-learning models, which assume that decision-makers choose by comparing learned values for different actions. However, another possibility is suggested by a simpler family of models, called policy-gradient reinforcement learning. Policy-gradient models learn by optimizing a behavioral policy directly, without the intermediate step of value-learning. Here we review recent behavioral and neural findings that are more parsimoniously explained by policy-gradient models than by value-based models. We conclude that, despite the ubiquity of 'value' in reinforcement-learning models of decision making, policy-gradient models provide a lightweight and compelling alternative model of operant behavior.
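To make the contrast with value-based models concrete, the following is a minimal sketch of a policy-gradient learner (a REINFORCE-style update without a baseline) on a two-armed bandit. It is not taken from the reviewed paper: the function names, the learning rate `lr`, the reward probabilities, and the trial count are all illustrative assumptions. The point it shows is that the learner adjusts action preferences directly from reward and never stores an estimate of each action's value.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into choice probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_policy_gradient(reward_probs, n_trials=2000, lr=0.1, seed=0):
    """Hypothetical helper: learn a policy on a Bernoulli bandit."""
    rng = random.Random(seed)
    # Policy parameters (preferences), not value estimates.
    prefs = [0.0] * len(reward_probs)
    for _ in range(n_trials):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Gradient of log pi(action) with respect to preference k
        # is 1[k == action] - probs[k] for a softmax policy.
        for k in range(len(prefs)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * reward * grad
    return softmax(prefs)


# Assumed task: arm 1 pays off more often than arm 0.
final_policy = run_policy_gradient([0.2, 0.8])
```

Under these assumptions the learned policy should come to favor the richer arm, even though no quantity in the code represents how much each arm is worth.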

The role of reinforcement learning models to assess decision-making in the Iowa Gambling Task under the influence of alcohol

Frontiers in Computational Neuroscience ◽

10.3389/conf.fncom.2012.55.00057 ◽

2012 ◽

Vol 6 ◽

Author(s):

Smolka Michael

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Iowa Gambling Task ◽

Gambling Task ◽

Learning Models ◽

Reinforcement Learning Models

Understanding Human Decision Making in an Interactive Landslide Simulator Tool via Reinforcement Learning

Frontiers in Psychology ◽

10.3389/fpsyg.2020.499422 ◽

2021 ◽

Vol 11 ◽

Author(s):

Pratik Chaturvedi ◽

Varun Dutt

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Computational Models ◽

Random Model ◽

Learning Models ◽

Human Decision ◽

Ev Model ◽

Reinforcement Learning Models

Prior research has used an Interactive Landslide Simulator (ILS) tool to investigate human decision making against landslide risks. It has been found that repeated feedback in the ILS tool about damages due to landslides improves human decisions against landslide risks. However, little is known about how theories of learning from feedback (e.g., reinforcement learning) would account for human decisions in the ILS tool. The primary goal of this paper is to account for human decisions in the ILS tool via computational models based upon reinforcement learning and to explore the model mechanisms involved when people make decisions in the ILS tool. Four different reinforcement-learning models were developed and evaluated on their ability to capture human decisions in an experiment involving two conditions in the ILS tool. The parameters of an Expectancy-Valence (EV) model, two Prospect-Valence-Learning models (PVL and PVL-2), a combination EV-PU model, and a random model were calibrated to human decisions in the ILS tool across the two conditions. Later, the models with their calibrated parameters were generalized to data collected in an experiment involving a new condition in ILS. When generalized to this new condition, the PVL-2 model, with parameters calibrated in each of the two damage-feedback conditions, outperformed all other RL models (including the random model). We highlight the implications of our results for decision making against landslide risks.
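As an illustration of the kind of model being calibrated, here is a minimal sketch of the Expectancy-Valence model in its commonly cited form: a weighted utility of gains and losses, a delta-rule update of the chosen option's expectancy, and a softmax choice rule whose sensitivity changes with trial number. The parameter names (`w`, `a`, `c`), the example payoffs, and the function names are assumptions for illustration; the version fitted to ILS data in the paper may differ in its details.

```python
import math


def ev_utility(win, loss, w):
    """Weighted utility; w (0..1) is the attention weight on losses."""
    return (1.0 - w) * win - w * abs(loss)


def ev_update(expectancies, chosen, utility, a):
    """Delta-rule update of the chosen option's expectancy, rate a."""
    updated = list(expectancies)
    updated[chosen] += a * (utility - updated[chosen])
    return updated


def choice_probs(expectancies, trial, c):
    """Softmax choice rule with trial-dependent sensitivity (trial/10)**c."""
    theta = (trial / 10.0) ** c
    scaled = [theta * e for e in expectancies]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# One hypothetical trial: option 0 is chosen and yields a win and a loss.
utility = ev_utility(win=100, loss=-50, w=0.5)
ev = ev_update([0.0, 0.0], chosen=0, utility=utility, a=0.2)
probs = choice_probs(ev, trial=10, c=1.0)
```

Calibrating such a model means searching for the values of `w`, `a`, and `c` that make the predicted choice probabilities best match the observed human choices.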

Supplemental Material for Reconciling Reinforcement Learning Models With Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

Psychological Review ◽

10.1037/0033-295x.114.3.784.supp ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Reinforcement Learning ◽

Problem Gambling ◽

Learning Models ◽

Behavioral Extinction ◽

Reinforcement Learning Models

Bayes factors for reinforcement-learning models of the Iowa gambling task.

Decision ◽

10.1037/dec0000040 ◽

2016 ◽

Vol 3 (2) ◽

pp. 115-131 ◽

Cited By ~ 14

Author(s):

Helen Steingroever ◽

Ruud Wetzels ◽

Eric-Jan Wagenmakers

Keyword(s):

Reinforcement Learning ◽

Iowa Gambling Task ◽

Bayes Factors ◽

Gambling Task ◽

Learning Models ◽

Reinforcement Learning Models

Effects of Working Memory Capacity on the Speed and Accuracy of Learning in Reinforcement Learning Models

PsycEXTRA Dataset ◽

10.1037/e528942014-552 ◽

2014 ◽

Author(s):

Adnane Ez-Zizi ◽

Simon Farrell ◽

David Leslie

Keyword(s):

Working Memory ◽

Reinforcement Learning ◽

Working Memory Capacity ◽

Memory Capacity ◽

Learning Models ◽

Reinforcement Learning Models ◽

Speed And Accuracy

Supplemental Material for Reinforcement Learning Models of Risky Choice and the Promotion of Risk-Taking by Losses Disguised as Wins in Rats

Journal of Experimental Psychology Animal Learning and Cognition ◽

10.1037/xan0000141.supp ◽

2017 ◽

Keyword(s):

Reinforcement Learning ◽

Risk Taking ◽

Risky Choice ◽

Learning Models ◽

Losses Disguised As Wins ◽

Reinforcement Learning Models

Individual differences in experienced and observational decision-making illuminate interactions between reinforcement learning and declarative memory

Scientific Reports ◽

10.1038/s41598-021-85322-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Batel Yifrah ◽

Ayelet Ramaty ◽

Genela Morris ◽

Avi Mendelsohn

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Declarative Memory ◽

Contextual Information ◽

Memory Performance ◽

Relevant Information ◽

Subjective Memory ◽

Types Of Information ◽

Reinforcement Learning Models ◽

Implicit And Explicit

Decision making can be shaped both by trial-and-error experiences and by memory of unique contextual information. Moreover, these types of information can be acquired either by means of active experience or by observing others behave in similar situations. The interactions between reinforcement learning parameters that inform decision updating and memory formation of declarative information in experienced and observational learning settings are, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded similar outcomes to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we discerned participants who learned similarly from experience and observation from those who assigned different weights to learning signals from these two sources. Participants who assigned different weights to their own experience versus those of others displayed enhanced memory performance as well as subjective memory strength for episodes involving significant reward prospects. Conversely, memory performance of participants who did not prioritize their own experience over others did not seem to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on the means by which individuals weigh relevant information conveyed via experience and observation.
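The distinction between the alternative models can be sketched as follows, assuming a simple delta-rule learner. One variant uses a single learning rate for all outcomes; the other uses separate rates for experienced and observed outcomes. The names `alpha_self` and `alpha_obs`, the trial sequence, and the starting value are hypothetical and are not taken from the study, whose fitted models may be specified differently.

```python
def update_value(value, outcome, source, alpha_self, alpha_obs):
    """Delta-rule update whose learning rate depends on the outcome's source."""
    alpha = alpha_self if source == "self" else alpha_obs
    return value + alpha * (outcome - value)


def run(alpha_self, alpha_obs, trials, start=0.5):
    """Apply the update over a sequence of (source, outcome) trials."""
    value = start
    for source, outcome in trials:
        value = update_value(value, outcome, source, alpha_self, alpha_obs)
    return value


# Hypothetical sequence mixing experienced and observed outcomes.
trials = [("self", 1.0), ("observed", 0.0), ("self", 1.0), ("observed", 1.0)]

shared = run(0.3, 0.3, trials)    # same weight for both sources
weighted = run(0.3, 0.1, trials)  # observation weighted less than experience
```

Comparing how well each variant fits a participant's choices is one way to classify that participant as weighting the two sources equally or differently.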
