Temporal and state abstractions for efficient learning, transfer and composition in humans

bioRxiv ◽

10.1101/2020.02.20.958587 ◽

2020 ◽

Author(s):

Liyu Xia ◽

Anne G. E. Collins

Keyword(s):

Reinforcement Learning ◽

Quantitative Model ◽

Sequential Decision ◽

Temporal Abstraction ◽

Transfer Effects ◽

Levels Of Abstraction ◽

One Step ◽

State Abstraction ◽

Efficient Learning ◽

Reinforcement Learning Models

Abstract: Humans use prior knowledge to efficiently solve novel tasks, but how they structure past knowledge to enable such fast generalization is not well understood. We recently proposed that hierarchical state abstraction enabled generalization of simple one-step rules, by inferring context clusters for each rule. However, humans’ daily tasks are often temporally extended, and necessitate more complex multi-step, hierarchically structured strategies. The options framework in hierarchical reinforcement learning provides a theoretical framework for representing such transferable strategies. Options are abstract multi-step policies, assembled from simpler one-step actions or other options, that can represent meaningful reusable strategies as temporal abstractions. We developed a novel sequential decision making protocol to test if humans learn and transfer multi-step options. In a series of four experiments, we found transfer effects at multiple hierarchical levels of abstraction that could not be explained by flat reinforcement learning models or hierarchical models lacking temporal abstraction. We extended the options framework to develop a quantitative model that blends temporal and state abstractions. Our model captures the transfer effects observed in human participants. Our results provide evidence that humans create and compose hierarchical options, and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning.
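
To make the notion of an option concrete, here is a minimal Python sketch of options as multi-step policies that can be assembled from one-step actions or from other options. It is an illustration only, not the authors' model: the `Option` class, the `step` dynamics on a line of ten states, and the names `go_to_5` and `reach_end` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

State = int
Action = str  # label of a primitive one-step action


@dataclass
class Option:
    """A temporally extended action: a policy plus a termination condition."""
    name: str
    policy: Callable[[State], Union[Action, "Option"]]  # may return a sub-option
    terminates: Callable[[State], bool]


def step(state: State, action: Action) -> State:
    """Hypothetical toy dynamics on a line of states 0..9."""
    if action == "right":
        return min(state + 1, 9)
    if action == "left":
        return max(state - 1, 0)
    raise ValueError(action)


def run(option: Option, state: State, trace: List[Action]) -> State:
    """Execute an option until it terminates, recursing into sub-options."""
    while not option.terminates(state):
        choice = option.policy(state)
        if isinstance(choice, Option):
            state = run(choice, state, trace)   # nested, reusable strategy
        else:
            state = step(state, choice)         # primitive one-step action
            trace.append(choice)
    return state


# Low-level option built from primitive actions: walk to state 5.
go_to_5 = Option(
    name="go_to_5",
    policy=lambda s: "right" if s < 5 else "left",
    terminates=lambda s: s == 5,
)

# Higher-level option that reuses go_to_5 and then continues to state 9.
reach_end = Option(
    name="reach_end",
    policy=lambda s: go_to_5 if s < 5 else "right",
    terminates=lambda s: s == 9,
)

trace: List[Action] = []
final_state = run(reach_end, 0, trace)
```

Because `go_to_5` is a self-contained object, it could be reused unchanged inside other higher-level options, which is the kind of transfer the abstract describes.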

Hierarchical Reinforcement Learning

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch122 ◽

2011 ◽

pp. 825-830

Author(s):

Carlos Diuk ◽

Michael Littman

Keyword(s):

Reinforcement Learning ◽

Learning Problems ◽

Underlying Structure ◽

Sequential Decision ◽

State Spaces ◽

Hierarchical Reinforcement Learning ◽

Markov Decision ◽

Finite Set ◽

State Abstraction ◽

Main Ideas

Reinforcement learning (RL) deals with the problem of an agent that has to learn how to behave to maximize its utility by its interactions with an environment (Sutton & Barto, 1998; Kaelbling, Littman & Moore, 1996). Reinforcement learning problems are usually formalized as Markov Decision Processes (MDP), which consist of a finite set of states and a finite number of possible actions that the agent can perform. At any given point in time, the agent is in a certain state and picks an action. It can then observe the new state this action leads to, and receives a reward signal. The goal of the agent is to maximize its long-term reward. In this standard formalization, no particular structure or relationship between states is assumed. However, learning in environments with extremely large state spaces is infeasible without some form of generalization. Exploiting the underlying structure of a problem can effect generalization and has long been recognized as an important aspect in representing sequential decision tasks (Boutilier et al., 1999). Hierarchical Reinforcement Learning is the subfield of RL that deals with the discovery and/or exploitation of this underlying structure. Two main ideas come into play in hierarchical RL. The first one is to break a task into a hierarchy of smaller subtasks, each of which can be learned faster and easier than the whole problem. Subtasks can also be performed multiple times in the course of achieving the larger task, reusing accumulated knowledge and skills. The second idea is to use state abstraction within subtasks: not every task needs to be concerned with every aspect of the state space, so some states can actually be abstracted away and treated as the same for the purpose of the given subtask.
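
The agent-environment loop and the idea of state abstraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the encyclopedia entry: the corridor MDP, the task-irrelevant colour feature, and the names `env_step`, `phi`, `q_learning` and `greedy_policy` are hypothetical. Tabular Q-learning is used here only as one familiar learning rule.

```python
import random
from collections import defaultdict

N_POS = 5                          # corridor positions 0..4; position 4 is the goal
ACTIONS = ("left", "right")
COLORS = ("red", "green", "blue")  # task-irrelevant part of the state


def env_step(state, action):
    """Hypothetical toy MDP: returns (next_state, reward, done)."""
    pos, _ = state
    pos = min(pos + 1, N_POS - 1) if action == "right" else max(pos - 1, 0)
    next_state = (pos, random.choice(COLORS))  # colour changes at random
    done = pos == N_POS - 1
    return next_state, (1.0 if done else 0.0), done


def phi(state):
    """State abstraction: drop the colour and keep only the position."""
    pos, _ = state
    return pos


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Tabular Q-learning over abstract states phi(s)."""
    random.seed(seed)
    q = defaultdict(float)  # keyed by (abstract_state, action)
    for _ in range(episodes):
        state = (0, random.choice(COLORS))
        for _ in range(max_steps):
            action = random.choice(ACTIONS)  # uniform exploration (off-policy)
            next_state, reward, done = env_step(state, action)
            best_next = 0.0 if done else max(
                q[(phi(next_state), a)] for a in ACTIONS
            )
            key = (phi(state), action)
            q[key] += alpha * (reward + gamma * best_next - q[key])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action for each non-terminal abstract state."""
    return {pos: max(ACTIONS, key=lambda a: q[(pos, a)]) for pos in range(N_POS - 1)}


q_table = q_learning()
policy = greedy_policy(q_table)
```

With the abstraction, the table holds at most 8 entries (4 non-terminal positions times 2 actions) instead of 24 for the full position-colour state space, since states that differ only in colour are treated as the same.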

Skill-based curiosity for intrinsically motivated reinforcement learning

Machine Learning ◽

10.1007/s10994-019-05845-8 ◽

2019 ◽

Vol 109 (3) ◽

pp. 493-512 ◽

Cited By ~ 2

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Skill Learning ◽

High Dimensional ◽

Sequential Decision ◽

Learning Methods ◽

Reward Function ◽

Intrinsic Reward ◽

Reinforcement Learning Models ◽

Data Efficiency

Abstract: Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function, called curiosity, to explore its environment in the quest for new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need to directly predict the future, and can perform in sequential decision scenarios. We formulate the curiosity as the ability of the agent to predict its own knowledge about the task. We base the prediction on the idea of skill learning to incentivize the discovery of new skills, and guide exploration towards promising solutions. To further improve data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We present a variety of sparse reward tasks in MiniGrid, MuJoCo, and Atari games. We compare the performance of an augmented agent that uses our curiosity reward to state-of-the-art learners. Experimental evaluation exhibits higher performance compared to reinforcement learning models that only learn by maximizing extrinsic rewards.

Planning with Abstract Learned Models While Learning Transferable Subtasks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6555 ◽

2020 ◽

Vol 34 (06) ◽

pp. 9992-10000

Author(s):

John Winder ◽

Stephanie Milani ◽

Matthew Landen ◽

Erebus Oh ◽

Shane Parr ◽

...

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Decision Process ◽

Hierarchical Models ◽

Formal Structure ◽

Levels Of Abstraction ◽

Hierarchical Reinforcement Learning ◽

Markov Decision ◽

Multiple Levels ◽

Efficient Learning

We introduce an algorithm for model-based hierarchical reinforcement learning to acquire self-contained transition and reward models suitable for probabilistic planning at multiple levels of abstraction. We call this framework Planning with Abstract Learned Models (PALM). By representing subtasks symbolically using a new formal structure, the lifted abstract Markov decision process (L-AMDP), PALM learns models that are independent and modular. Through our experiments, we show how PALM integrates planning and execution, facilitating a rapid and efficient learning of abstract, hierarchical models. We also demonstrate the increased potential for learned models to be transferred to new and related tasks.

Supplemental Material for Reconciling Reinforcement Learning Models With Behavioral Extinction and Renewal: Implications for Addiction, Relapse, and Problem Gambling

Psychological Review ◽

10.1037/0033-295x.114.3.784.supp ◽

2007 ◽

Cited By ~ 1

Keyword(s):

Reinforcement Learning ◽

Problem Gambling ◽

Learning Models ◽

Behavioral Extinction ◽

Reinforcement Learning Models

Bayes factors for reinforcement-learning models of the Iowa gambling task.

Decision ◽

10.1037/dec0000040 ◽

2016 ◽

Vol 3 (2) ◽

pp. 115-131 ◽

Cited By ~ 14

Author(s):

Helen Steingroever ◽

Ruud Wetzels ◽

Eric-Jan Wagenmakers

Keyword(s):

Reinforcement Learning ◽

Iowa Gambling Task ◽

Bayes Factors ◽

Gambling Task ◽

Learning Models ◽

Reinforcement Learning Models

Effects of Working Memory Capacity on the Speed and Accuracy of Learning in Reinforcement Learning Models

PsycEXTRA Dataset ◽

10.1037/e528942014-552 ◽

2014 ◽

Author(s):

Adnane Ez-Zizi ◽

Simon Farrell ◽

David Leslie

Keyword(s):

Working Memory ◽

Reinforcement Learning ◽

Working Memory Capacity ◽

Memory Capacity ◽

Learning Models ◽

Reinforcement Learning Models ◽

Speed And Accuracy

Supplemental Material for Reinforcement Learning Models of Risky Choice and the Promotion of Risk-Taking by Losses Disguised as Wins in Rats

Journal of Experimental Psychology Animal Learning and Cognition ◽

10.1037/xan0000141.supp ◽

2017 ◽

Keyword(s):

Reinforcement Learning ◽

Risk Taking ◽

Risky Choice ◽

Learning Models ◽

Losses Disguised As Wins ◽

Reinforcement Learning Models

Individual differences in experienced and observational decision-making illuminate interactions between reinforcement learning and declarative memory

Scientific Reports ◽

10.1038/s41598-021-85322-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Batel Yifrah ◽

Ayelet Ramaty ◽

Genela Morris ◽

Avi Mendelsohn

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Declarative Memory ◽

Contextual Information ◽

Memory Performance ◽

Relevant Information ◽

Subjective Memory ◽

Types Of Information ◽

Reinforcement Learning Models ◽

Implicit And Explicit

Abstract: Decision making can be shaped both by trial-and-error experiences and by memory of unique contextual information. Moreover, these types of information can be acquired either by means of active experience or by observing others behave in similar situations. The interactions between reinforcement learning parameters that inform decision updating and memory formation of declarative information in experienced and observational learning settings are, however, unknown. In the current study, participants took part in a probabilistic decision-making task involving situations that either yielded similar outcomes to those of an observed player or opposed them. By fitting alternative reinforcement learning models to each subject, we discerned participants who learned similarly from experience and observation from those who assigned different weights to learning signals from these two sources. Participants who assigned different weights to their own experience versus those of others displayed enhanced memory performance as well as subjective memory strength for episodes involving significant reward prospects. Conversely, memory performance of participants who did not prioritize their own experience over others did not seem to be influenced by reinforcement learning parameters. These findings demonstrate that interactions between implicit and explicit learning systems depend on the means by which individuals weigh relevant information conveyed via experience and observation.

Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽

10.1007/s41109-021-00365-8 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Morales ◽

Rajmonda Sulo Caceres ◽

Tina Eliassi-Rad

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Sequential Decision ◽

Network Discovery ◽

Learning Tasks ◽

Partially Observed ◽

Decision Making Problem ◽

Resource Collection ◽

Improved Performance ◽

Discovery Algorithms

Abstract: Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

Abstract: Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
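
For orientation, the classical counterpart of the finite-horizon dynamic programming mentioned above is backward induction on an ordinary MDP. The Python sketch below shows only that classical case and is not the paper's qMDP algorithm. The two-state example and the names `backward_induction`, `transition` and `reward` are hypothetical.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Finite-horizon dynamic programming for a classical MDP.

    Returns (value, policy), where value[s] is the optimal expected return
    with `horizon` steps to go and policy[t][s] is the optimal action at step t.
    """
    value = {s: 0.0 for s in states}  # nothing left to earn with 0 steps to go
    policy = []
    for _ in range(horizon):
        new_value, decision = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Immediate reward plus expected value of the successor state.
                q = reward(s, a) + sum(
                    p * value[s2] for s2, p in transition(s, a).items()
                )
                if q > best_q:
                    best_a, best_q = a, q
            new_value[s], decision[s] = best_q, best_a
        value = new_value
        policy.insert(0, decision)  # earlier time steps go to the front
    return value, policy


# Hypothetical two-state example: "work" moves to "high", "wait" stays put.
STATES = ("low", "high")
ACTIONS = ("wait", "work")
REWARDS = {
    ("low", "wait"): 0.0, ("low", "work"): -1.0,
    ("high", "wait"): 2.0, ("high", "work"): 1.0,
}


def transition(s, a):
    return {"high": 1.0} if a == "work" else {s: 1.0}


def reward(s, a):
    return REWARDS[(s, a)]


value, policy = backward_induction(STATES, ACTIONS, transition, reward, horizon=3)
```

In this toy example the optimal policy is time dependent: from `low` it pays to `work` while at least two steps remain, but on the final step the cost of working can no longer be recovered, so the agent waits.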
