One-shot learning and behavioral eligibility traces in sequential decision making

In many daily tasks, we make multiple decisions before reaching a goal. To learn such sequences of decisions, a mechanism that links earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms that solve this credit assignment problem: in classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility traces make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility traces across multiple sensory modalities.
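
The contrast between the two algorithm classes can be illustrated with a minimal sketch. It is not the authors' model: it assumes a hypothetical linear chain of states with a single reward at the end, uses state values rather than action values for brevity, and the function name and parameter values are illustrative only. With `lam=0.0` the update reduces to one-step temporal-difference learning; with `lam > 0` an accumulating eligibility trace spreads the reward back along the visited states.

```python
def values_after_one_episode(n_states=5, alpha=0.5, gamma=0.9, lam=0.0):
    """Walk once through a chain s0 -> s1 -> ... -> goal (reward 1 at the goal).

    Returns the learned values of the non-terminal states after that
    single episode. lam is the eligibility-trace decay (0 disables traces).
    """
    values = [0.0] * (n_states + 1)  # last entry is the terminal state (value 0)
    trace = [0.0] * (n_states + 1)   # eligibility trace per state
    for s in range(n_states):
        s_next = s + 1
        reward = 1.0 if s_next == n_states else 0.0
        td_error = reward + gamma * values[s_next] - values[s]
        # Decay all traces, then mark the current state as eligible.
        trace = [gamma * lam * e for e in trace]
        trace[s] += 1.0
        # Every state is updated in proportion to its eligibility.
        values = [v + alpha * td_error * e for v, e in zip(values, trace)]
    return values[:n_states]


td0_values = values_after_one_episode(lam=0.0)    # only the last state changes
trace_values = values_after_one_episode(lam=0.9)  # all visited states change
```

In this toy setting, a single rewarded episode without traces changes only the state immediately preceding the reward, whereas with traces every state along the sequence acquires a positive value, which is the qualitative difference the abstract refers to.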

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
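
For orientation, the following is a minimal sketch of finite-horizon dynamic programming (backward induction) for a classical MDP. It is the classical analogue only, not the qMDP algorithm of the paper; the data layout (`transitions[s][a]` as a list of `(probability, next_state)` pairs), the function name, and the two-state example are assumptions made for illustration.

```python
def backward_induction(transitions, rewards, horizon):
    """Finite-horizon optimal values and policy for a classical MDP.

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the immediate reward for taking action a in state s.
    Returns (values, policy): values[t][s] is the optimal expected return
    from step t onward, policy[t][s] the action that attains it.
    """
    n_states = len(transitions)
    values = [[0.0] * n_states for _ in range(horizon + 1)]  # values[horizon] = 0
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):  # sweep backward in time
        for s in range(n_states):
            best_action, best_value = None, float("-inf")
            for a in range(len(transitions[s])):
                q = rewards[s][a] + sum(
                    p * values[t + 1][s2] for p, s2 in transitions[s][a]
                )
                if q > best_value:
                    best_action, best_value = a, q
            values[t][s] = best_value
            policy[t][s] = best_action
    return values, policy


# Hypothetical two-state example: state 1 pays reward 1 for staying (action 0);
# state 0 pays nothing but can move to state 1 (action 1).
transitions = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, move to 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: stay, move to 0
]
rewards = [[0.0, 0.0], [1.0, 0.0]]
values, policy = backward_induction(transitions, rewards, horizon=3)
```

With a horizon of three steps, the optimal policy at the first step moves from state 0 to state 1 and stays in state 1, because the remaining steps can then collect the reward.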

Traded Control of Human-Machine Systems for Sequential Decision-Making Based on Reinforcement Learning

IEEE Transactions on Artificial Intelligence ◽

10.1109/tai.2021.3127857 ◽

2021 ◽

pp. 1-1

Author(s):

Qianqian Zhang ◽

Yu Kang ◽

Yun-Bo Zhao ◽

Pengfei Li ◽

Shiyi You

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Sequential Decision Making ◽

Sequential Decision ◽

Machine Systems

Strategic Tasks for Explainable Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.330110007 ◽

2019 ◽

Vol 33 ◽

pp. 10007-10008 ◽

Cited By ~ 1

Author(s):

Rey Pocius ◽

Lawrence Neal ◽

Alan Fern

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Environment ◽

Strategic Decision ◽

Strategic Decision Making ◽

Sequential Decision Making ◽

Sequential Decision ◽

Level Control ◽

Mini Games ◽

High Level

Commonly used sequential decision making tasks such as the games in the Arcade Learning Environment (ALE) provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks, which are of limited use for the development of explainable artificial intelligence (XAI) due to the fine temporal resolution of the tasks. Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide for both strategic decision-making and rich observation spaces are either difficult to simulate or are intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.

A reinforcement learning approach for sequential decision-making process of attacks in smart grid

2017 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2017.8285291 ◽

2017 ◽

Cited By ~ 9

Author(s):

Zhen Ni ◽

Shuva Paul ◽

Xiangnan Zhong ◽

Qinglai Wei

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Smart Grid ◽

Sequential Decision Making ◽

Decision Making Process ◽

Learning Approach ◽

Sequential Decision

Imaginative Reinforcement Learning: Computational Principles and Neural Mechanisms

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01170 ◽

2017 ◽

Vol 29 (12) ◽

pp. 2103-2113 ◽

Cited By ~ 8

Author(s):

Samuel J. Gershman ◽

Jimmy Zhou ◽

Cody Kommers

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Internal Model ◽

Learning Model ◽

Neural Mechanisms ◽

Sequential Decision Making ◽

Sequential Decision ◽

The Impact ◽

To Receive ◽

Reinforcement Learning Model

Imagination enables us not only to transcend reality but also to learn about it. In the context of reinforcement learning, an agent can rationally update its value estimates by simulating an internal model of the environment, provided that the model is accurate. In a series of sequential decision-making experiments, we investigated the impact of imaginative simulation on subsequent decisions. We found that imagination can cause people to pursue imagined paths, even when these paths are suboptimal. This bias is systematically related to participants' optimism about how much reward they expect to receive along imagined paths; providing feedback strongly attenuates the effect. The imagination effect can be captured by a reinforcement learning model that includes a bonus added onto imagined rewards. Using fMRI, we show that a network of regions associated with valuation is predictive of the imagination effect. These results suggest that imagination, although a powerful tool for learning, is also susceptible to motivational biases.
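
The idea of a bonus added onto imagined rewards can be sketched as a simple delta-rule update. This is an illustration under stated assumptions, not the model fitted in the paper: the function name, the learning rate `alpha`, the size of `bonus`, and the additive form of the target are all hypothetical.

```python
def update_value(value, reward, alpha=0.3, imagined=False, bonus=0.2):
    """Delta-rule value update; imagined outcomes receive an additive bonus.

    All parameter values are placeholders chosen for illustration.
    """
    target = reward + (bonus if imagined else 0.0)
    return value + alpha * (target - value)


# Two paths with the same nominal reward, one experienced and one imagined.
v_experienced = update_value(0.0, 1.0, imagined=False)
v_imagined = update_value(0.0, 1.0, imagined=True)
```

Under these assumptions, the imagined path ends up with the higher value estimate even though both paths carry the same reward, which is one way a bias toward imagined paths could arise.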

Deep Reinforcement Learning Versus Evolution Strategies: A Comparative Survey

10.36227/techrxiv.14679504.v2 ◽

2021 ◽

Author(s):

Amjad Yousef Majid ◽

Serge Saaybi ◽

Tomas van Rietbergen ◽

Vincent Francois-Lavet ◽

R Venkatesha Prasad ◽

...

Keyword(s):

Reinforcement Learning ◽

Evolution Strategies ◽

Sequential Decision Making ◽

Sequential Decision ◽

Level Control ◽

Agent Learning ◽

Real World Applications ◽

Multi Agent ◽

Comparative Survey ◽

Key Aspects

Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems, yet many open challenges still exist. To gain insight into the strengths and weaknesses of DRL versus ESs, an analysis of their respective capabilities and limitations is provided. After presenting their fundamental concepts and algorithms, a comparison is provided on key aspects such as scalability, exploration, adaptation to dynamic environments, and multi-agent learning. Then, the benefits of hybrid algorithms that combine concepts from DRL and ESs are highlighted. Finally, to indicate how they compare in real-world applications, a survey of the literature for the set of applications they support is provided.

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018393 ◽

2019 ◽

Vol 33 ◽

pp. 8393-8400 ◽

Cited By ~ 8

Author(s):

Dongliang He ◽

Xiang Zhao ◽

Jizhou Huang ◽

Fu Li ◽

Xiao Liu ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Natural Language ◽

State Of The Art ◽

Sliding Window ◽

Sequential Decision Making ◽

Sequential Decision ◽

Boundary Information ◽

Performance Gains ◽

Steady Performance

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a presegmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy. Specifically, we propose a reinforcement-learning-based framework improved by multi-task learning, which shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet’18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.

A Novel Heterogeneous Swarm Reinforcement Learning Method for Sequential Decision Making Problems

Machine Learning and Knowledge Extraction ◽

10.3390/make1020035 ◽

2019 ◽

Vol 1 (2) ◽

pp. 590-610

Author(s):

Zohreh Akbari ◽

Rainer Unland

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Single Agent ◽

Sequential Decision Making ◽

Multi Agent Systems ◽

Sequential Decision ◽

Agent Systems ◽

Novel Approach ◽

Markov Decision ◽

Multi Agent

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or multi-agent systems that either consist of agents with individual goals and decision making capabilities, which are influenced by other agents’ decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms, and so far, systems introduced as Heterogeneous Swarms (HetSs) include only very few, i.e., two or three, sub-swarms of homogeneous agents, which, according to their capabilities, either deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.

Distributed reinforcement learning for sequential decision making

Proceedings of the Fifth International Conference on Information Fusion. FUSION 2002. (IEEE Cat.No.02EX5997) ◽

10.1109/icif.2002.1020958 ◽

2003 ◽

Cited By ~ 7

Author(s):

G. Rogova ◽

P. Scott ◽

C. Lolett

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Sequential Decision Making ◽

Sequential Decision ◽

Distributed Reinforcement

Choice-selective sequences dominate in cortical relative to thalamic inputs to nucleus accumbens, providing a potential substrate for credit assignment

10.1101/725382 ◽

2019 ◽

Cited By ~ 2

Author(s):

Nathan F. Parker ◽

Avinash Baidya ◽

Julia Cox ◽

Laura Haetzel ◽

Anna Zhukovskaya ◽

...

Keyword(s):

Reinforcement Learning ◽

Nucleus Accumbens ◽

Learning Task ◽

Temporal Difference ◽

Prelimbic Cortex ◽

Temporal Difference Learning ◽

Credit Assignment ◽

Cortical Inputs ◽

Selective Activity ◽

Potential Substrate

How are actions linked with subsequent outcomes to guide choices? The nucleus accumbens, which is implicated in this process, receives glutamatergic inputs from the prelimbic cortex and midline regions of the thalamus. However, little is known about what is represented in these input pathways. By comparing these inputs during a reinforcement learning task in mice, we discovered that prelimbic cortical inputs preferentially represent actions and choices, whereas midline thalamic inputs preferentially represent cues. Choice-selective activity in the prelimbic cortical inputs is organized in sequences that persist beyond the outcome. Through computational modeling, we demonstrate that these sequences can support the neural implementation of temporal difference learning, a powerful algorithm to connect actions and outcomes across time. Finally, we test and confirm predictions of our circuit model by direct manipulation of nucleus accumbens input neurons. Thus, we integrate experiment and modeling to suggest a neural solution for credit assignment.
