Crossbar Adaptive Array: The first connectionist network that solved the delayed reinforcement learning problem

Standard associative learning theories typically fail to conceptualise the temporal properties of a stimulus, and hence cannot easily make predictions about the effects such properties might have on the magnitude of conditioning phenomena. Despite this, in intuitive terms we might expect the temporal properties of a stimulus that is paired with some outcome to be important. In particular, there is no previous research addressing the way that fixed or variable duration stimuli can affect overshadowing. In this chapter we report results which show that the degree of overshadowing depends on the distribution form - fixed or variable - of the overshadowing stimulus, and argue that conditioning is weaker under conditions of temporal uncertainty. These results are discussed in terms of models of conditioning and timing. We conclude that the temporal difference model, which has been extensively applied to the reinforcement learning problem in machine learning, accounts for the key findings of our study.
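
For readers unfamiliar with the temporal difference idea mentioned in this abstract, the following is a minimal sketch of a tabular TD(0) value update. It is not the model used in the study, whose details are not given here; the function name `td0_update`, the three-state chain, and the values of `alpha` and `gamma` are all illustrative assumptions.

```python
def td0_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update to `values` in place; return the TD error.

    All names and default parameters here are hypothetical, chosen for illustration.
    """
    # TD error: how much better or worse the outcome was than predicted.
    td_error = reward + gamma * values[next_state] - values[state]
    # Move the prediction for `state` a fraction `alpha` toward the target.
    values[state] += alpha * td_error
    return td_error


# Toy chain 0 -> 1 -> 2, where state 2 is terminal (its value stays 0)
# and the only reward arrives on the final transition.
values = [0.0, 0.0, 0.0]
for _ in range(200):
    td0_update(values, 0, 1, reward=0.0)
    td0_update(values, 1, 2, reward=1.0)
```

In this toy chain the prediction for the state nearest the reward approaches the reward itself, while the earlier state approaches the same quantity discounted by `gamma`, which is how a delayed outcome comes to be credited to earlier stimuli.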

Evolving Equilibrium Policies for a Multiagent Reinforcement Learning Problem with State Attractors

Computational Collective Intelligence. Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-23938-0_21 ◽

2011 ◽

pp. 201-210 ◽

Cited By ~ 1

Author(s):

Florin Leon

Keyword(s):

Reinforcement Learning ◽

Learning Problem ◽

Multiagent Reinforcement Learning

Learning a Belief Representation for Delayed Reinforcement Learning

10.1109/ijcnn52387.2021.9534358 ◽

2021 ◽

Author(s):

Pierre Liotet ◽

Erick Venneri ◽

Marcello Restelli

Keyword(s):

Reinforcement Learning ◽

Delayed Reinforcement

An Introduction to Intertask Transfer for Reinforcement Learning

AI Magazine ◽

10.1609/aimag.v32i1.2329 ◽

2011 ◽

Vol 32 (1) ◽

pp. 15 ◽

Cited By ~ 18

Author(s):

Matthew E. Taylor ◽

Peter Stone

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Learning Problem ◽

Open Problems ◽

Learning Framework ◽

Learning Domains ◽

Multiple Tasks ◽

Exciting Area ◽

Generalize Information ◽

Selection Of

Transfer learning has recently gained popularity due to the development of algorithms that can successfully generalize information across multiple tasks. This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area.

End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation

Computational Intelligence and Neuroscience ◽

10.1155/2021/9945044 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Xiaogang Ruan ◽

Peng Li ◽

Xiaoqing Zhu ◽

Hejie Yu ◽

Naigong Yu

Keyword(s):

Reinforcement Learning ◽

Intrinsic Motivation ◽

Driving Forces ◽

Temporal Distance ◽

Training Methods ◽

Complex Environments ◽

Learning Problem ◽

Autonomous Exploration ◽

Exploration Behavior ◽

Efficient Exploration

Developing artificial intelligence (AI) agents that explore efficiently in visually rich and complex environments is challenging. In this study, we formulate the exploration question as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. Such intrinsic motivation is driven by curiosity and is calculated based on episode memory. To distribute the intrinsic motivation, we use a count-based method and temporal distance to generate it synchronously. We tested our approach in 3D maze-like environments and validated its performance in exploration tasks through extensive experiments. The experimental results show that our agent can learn exploration ability from raw sensory input and accomplish autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze the effects of different training methods and driving forces on exploration policy.
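
As a rough illustration of the count-based component mentioned in this abstract, the sketch below gives a generic visit-count exploration bonus. It is not the authors' method: the episodic memory and temporal-distance terms are omitted, and the class name `CountBonus`, the `scale` parameter, the inverse-square-root decay, and the use of hashable state labels are all assumptions made for the example.

```python
import math
from collections import Counter


class CountBonus:
    """Hypothetical count-based intrinsic reward: rarely visited states pay more."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()  # visit count per (hashable) state

    def __call__(self, state):
        # Record the visit, then return a bonus that shrinks as the count grows.
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


bonus = CountBonus()
# Visiting "a" repeatedly yields a decaying bonus; the first visit to "b" is novel again.
rewards = [bonus(s) for s in ["a", "a", "b", "a"]]
```

In a full agent, a bonus of this kind would typically be added to the environment reward before the policy update, so that novelty alone can drive exploration when external reward is sparse.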

Configurable Environments in Reinforcement Learning: An Overview

Special Topics in Information Technology - SpringerBriefs in Applied Sciences and Technology ◽

10.1007/978-3-030-85918-3_9 ◽

2022 ◽

pp. 101-113

Author(s):

Alberto Maria Metelli

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Learning Process ◽

Real World ◽

Decision Processes ◽

Learning Problem ◽

Complex Control ◽

Control Frequency ◽

Markov Decision ◽

And Control

Reinforcement Learning (RL) has emerged as an effective approach to address a variety of complex control tasks. In a typical RL problem, an agent interacts with the environment by perceiving observations and performing actions, with the ultimate goal of maximizing the cumulative reward. In the traditional formulation, the environment is assumed to be a fixed entity that cannot be externally controlled. However, there exist several real-world scenarios in which the environment offers the opportunity to configure some of its parameters, with diverse effects on the agent’s learning process. In this contribution, we provide an overview of the main aspects of environment configurability. We start by introducing the formalism of the Configurable Markov Decision Processes (Conf-MDPs) and we illustrate the solution concepts. Then, we review the algorithms for solving the learning problem in Conf-MDPs. Finally, we present two applications of Conf-MDPs: policy space identification and control frequency adaptation.

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6417 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8878-8885

Author(s):

Haoyu Song ◽

Wei-Nan Zhang ◽

Jingwen Hu ◽

Ting Liu

Keyword(s):

Reinforcement Learning ◽

Natural Language ◽

Experimental Results ◽

Learning Problem ◽

Model Based ◽

Consistency Evaluation

Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona-consistent dialogues. Unlike existing work that re-ranks the retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of generated responses.

Delayed reinforcement learning

Handbook of Neural Computation ◽

10.1887/0750303123/b365c61 ◽

2004 ◽

Author(s):

S Sathiya Keerthi ◽

B Ravindran

Keyword(s):

Reinforcement Learning ◽

Delayed Reinforcement

Proposal of PSwithEFP and its Evaluation in Multi-Agent Reinforcement Learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2017.p0930 ◽

2017 ◽

Vol 21 (5) ◽

pp. 930-938 ◽

Cited By ~ 3

Author(s):

Kazuteru Miyazaki ◽

Koudai Furukawa ◽

Hiroaki Kobayashi ◽

◽

...

Keyword(s):

Reinforcement Learning ◽

Failure Probability ◽

Action Selection ◽

New Method ◽

Selection Strategy ◽

Multiple Agents ◽

Learning Problem ◽

Concurrent Learning ◽

Multi Agent

When multiple agents learn a task simultaneously in an environment, the learning results often become unstable. This problem is known as the concurrent learning problem, and to date several methods have been proposed to resolve it. In this paper, we propose a new method that incorporates expected failure probability (EFP) into the action selection strategy to give agents a kind of mutual adaptability. The effectiveness of the proposed method is confirmed using the Keepaway task.

Consistency of HDP applied to a simple reinforcement learning problem

Neural Networks ◽

10.1016/0893-6080(90)90088-3 ◽

1990 ◽

Vol 3 (2) ◽

pp. 179-189 ◽

Cited By ~ 125

Author(s):

Paul J. Werbos

Keyword(s):

Reinforcement Learning ◽

Learning Problem
