Rational Inattention and Tonic Dopamine

2020 ◽  
Author(s):  
John G. Mikhael ◽  
Lucy Lai ◽  
Samuel J. Gershman

Abstract
Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA—the average reward theory and the Bayesian theory in which DA controls precision—have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of ‘rational inattention,’ which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock—thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.

Author Summary
The roles of tonic dopamine (DA) have been the subject of much speculation, partly due to the variety of processes it has been implicated in. For instance, tonic DA modulates how we learn new information, but also affects how previously learned information is used. DA affects the speed of our internal timing mechanism, but also modulates the degree to which our temporal estimates are influenced by context. DA improves performance in some tasks, but seems only to affect confidence in others. Are there common principles that govern the role of DA across these domains? In this work, we introduce the concept of ‘rational inattention,’ originally borrowed from economics, to the DA literature. We show how the rational inattention account of DA unites two influential theories that are seemingly at odds: the average reward theory and the Bayesian theory of tonic DA. We then show how this framework reconciles the diverse roles of DA, which cannot be addressed by either theory alone.
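The core trade-off in the rational inattention account, and the Bayesian estimation step it feeds into, can be sketched numerically. The sketch below is illustrative only: the saturating benefit function, the linear attention cost, the Gaussian prior, and all parameter values are assumptions, not the authors' model. It shows (i) that the precision maximizing net payoff rises monotonically with average reward, so a signal reporting average reward can later stand in for precision, and (ii) that a higher likelihood precision reduces the pull of the prior on the posterior estimate, i.e., a weaker central tendency effect.

```python
import numpy as np

def optimal_precision(avg_reward, cost_weight=1.0,
                      precisions=np.linspace(0.01, 10.0, 1000)):
    """Return the precision that maximizes expected net payoff.

    Illustrative assumptions (not from the paper): the performance benefit
    saturates with precision as 1 - exp(-precision), and the attentional
    cost is linear in precision.
    """
    net_payoff = avg_reward * (1.0 - np.exp(-precisions)) - cost_weight * precisions
    return precisions[np.argmax(net_payoff)]

def bayesian_estimate(observation, prior_mean, likelihood_precision, prior_precision=1.0):
    """Posterior mean for a Gaussian prior and Gaussian likelihood.

    A higher likelihood precision (here standing in for a high-DA state)
    weights the observation more and the prior less, which corresponds to
    a weaker central tendency effect in temporal reproduction.
    """
    w = likelihood_precision / (likelihood_precision + prior_precision)
    return w * observation + (1.0 - w) * prior_mean

# (i) Optimal precision is monotonically non-decreasing in average reward.
for r in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print("avg reward", r, "-> precision", round(optimal_precision(r), 2))

# (ii) With precision recovered from the (DA-reported) average reward, a
# reproduced interval is pulled less toward the prior mean when reward
# availability, and hence precision, is high.
for r in [1.0, 5.0]:
    p = optimal_precision(r)
    print("avg reward", r, "-> estimate of a 2.0 s sample:",
          round(bayesian_estimate(2.0, prior_mean=3.0, likelihood_precision=p), 2))
```

Under these assumed functional forms the optimum is precision = ln(average reward / cost weight) whenever that quantity is positive, which makes the monotonic link between the DA-reported average reward and precision explicit.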


2019 ◽  
Vol 38 (2) ◽  
pp. 155-190
Author(s):  
Markus Bader ◽  
Yvonne Portele

Abstract
Three experiments investigated the interpretation and production of pronouns in German. The first two experiments probed the preferred interpretation of a pronoun in contexts containing two potential antecedents by having participants complete a sentence fragment starting either with a personal pronoun or a d-pronoun. We systematically varied three properties of the potential antecedents: syntactic function, linear position, and topicality. The results confirm a subject preference for personal pronouns. The preferred interpretation of d-pronouns cannot be captured by any of the three factors alone. Although a d-pronoun preferentially refers to the non-topic in many cases, this preference can be overridden by the other two factors, linear position and syntactic function. In order to test whether interpretive preferences follow from production biases as proposed by the Bayesian theory of Kehler et al. (2008), a third experiment had participants freely produce a continuation sentence for the contexts of the first two experiments. The results show that personal pronouns are used more often to refer to a subject than to an object, recapitulating the subject preference found for interpretation and thereby confirming the account of Kehler et al. (2008). The interpretation results for the d-pronoun likewise follow from the corresponding production data.
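For readers unfamiliar with the Bayesian account being tested, here is a minimal sketch of its core rule in the spirit of Kehler et al. (2008); the antecedent labels and probability values are hypothetical, chosen only to show how a production bias toward subjects yields a subject preference in interpretation.

```python
def interpretation_probs(production_bias, next_mention_prior):
    """P(antecedent | pronoun) from P(pronoun | antecedent) and P(antecedent).

    production_bias: dict mapping antecedent -> probability of choosing this
                     pronoun form when referring to that antecedent
    next_mention_prior: dict mapping antecedent -> probability of mentioning
                        that antecedent next at all
    """
    joint = {a: production_bias[a] * next_mention_prior[a] for a in production_bias}
    total = sum(joint.values())
    return {a: p / total for a, p in joint.items()}

# Hypothetical numbers: personal pronouns are produced far more often for
# subject antecedents, so interpretation inherits a subject preference even
# with only a mild prior toward re-mentioning the subject.
print(interpretation_probs(
    production_bias={"subject": 0.7, "object": 0.3},
    next_mention_prior={"subject": 0.55, "object": 0.45},
))
```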


Author(s):  
Yoshihiro Ichikawa ◽  
Keiki Takadama

This paper proposes a reinforcement learning agent that estimates internal rewards from external rewards in order to avoid conflict in the multi-step dilemma problem. Intensive simulation results show that the agent avoids local convergence and acquires a behavior policy that reaches a higher reward by updating the Q-value with the value obtained by subtracting the average reward from the external reward.
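A rough sketch of this kind of update, in which the internal reward is the external reward minus a running average reward, is given below; the tabular setup, learning rates, and the specific average-reward estimator are assumptions made for illustration, not the authors' exact algorithm.

```python
from collections import defaultdict

def make_agent(alpha=0.1, beta=0.01, gamma=0.95):
    """Tabular Q-learning where the TD target uses r - avg_reward as the internal reward."""
    q = defaultdict(float)   # Q-values keyed by (state, action)
    avg_reward = 0.0         # running estimate of the average external reward

    def update(state, action, external_reward, next_state, next_actions):
        nonlocal avg_reward
        internal_reward = external_reward - avg_reward       # reward relative to the average
        best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
        td_error = internal_reward + gamma * best_next - q[(state, action)]
        q[(state, action)] += alpha * td_error
        avg_reward += beta * (external_reward - avg_reward)   # slowly track the average
        return q[(state, action)]

    return update, q

# Example: a single update on a toy transition.
update, q = make_agent()
update(state="s0", action="a0", external_reward=1.0, next_state="s1", next_actions=["a0", "a1"])
print(dict(q))
```

Subtracting the average reward means that only better-than-average external rewards raise a Q-value, which is the mechanism the abstract credits with escaping local convergence in the dilemma setting.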

