single agent learning
Recently Published Documents

This work presents a sample efficient and effective value-based method, named SMIX(λ), for reinforcement learning in multi-agent environments (MARL) within the paradigm of centralized training with decentralized execution (CTDE), in which learning a stable and generalizable centralized value function (CVF) is crucial. To achieve this, our method carefully combines different elements, including 1) removing the unrealistic centralized greedy assumption during the learning phase, 2) using the λ-return to balance the trade-off between bias and variance and to deal with the environment's non-Markovian property, and 3) adopting an experience-replay style off-policy training. Interestingly, it is revealed that there exists inherent connection between SMIX(λ) and previous off-policy Q(λ) approach for single-agent learning. Experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that the proposed SMIX(λ) algorithm outperforms several state-of-the-art MARL methods by a large margin, and that it can be used as a general tool to improve the overall performance of a CTDE-type method by enhancing the evaluation quality of its CVF. We open-source our code at: https://github.com/chaovven/SMIX.

Download Full-text

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/581 ◽

2019 ◽

Author(s):

Kenny Young ◽

Baoxiang Wang ◽

Matthew E. Taylor

Keyword(s):

Reinforcement Learning ◽

Gradient Descent ◽

Single Agent ◽

Nonlinear Function ◽

Step Size ◽

Experience Replay ◽

Training Examples ◽

The Cost ◽

Single Agent Learning ◽

Nonlinear Function Approximation

Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is commonly required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this --- most notably experience replay or the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised setting. However, they come at the cost of moving away from the RL problem as it is typically formulated, that is, a single agent learning online without maintaining a large database of training examples. To address these issues, we propose Metatrace, a meta-gradient descent based algorithm to tune the step-size online. Metatrace leverages the structure of eligibility traces, and works for both tuning a scalar step-size and a respective step-size for each parameter. We empirically evaluate Metatrace for actor-critic on the Arcade Learning Environment. Results show Metatrace can speed up learning, and improve performance in non-stationary settings.

Download Full-text

Single Agent Learning Algorithms for Decision making in Diagnostic Applications

International Journal of Computer Science and Engineering ◽

10.14445/23488387/ijcse-v3i5p109 ◽

2016 ◽

Vol 3 (5) ◽

pp. 46-52 ◽

Cited By ~ 2

Author(s):

Deepak A. Vidhate ◽

Keyword(s):

Decision Making ◽

Learning Algorithms ◽

Single Agent ◽

Agent Learning ◽

Diagnostic Applications ◽

Single Agent Learning

Download Full-text

Evolution of Cartesian Genetic Programs for Development of Learning Neural Architecture

Evolutionary Computation ◽

10.1162/evco_a_00043 ◽

2011 ◽

Vol 19 (3) ◽

pp. 469-523 ◽

Cited By ~ 15

Author(s):

Gul Muhammad Khan ◽

Julian F. Miller ◽

David M. Halliday

Keyword(s):

Genetic Programming ◽

Random Network ◽

Single Agent ◽

Learning Potential ◽

Neural Learning ◽

Agent Learning ◽

Learning Capabilities ◽

Intelligent Behavior ◽

Interesting Learning ◽

Single Agent Learning

Although artificial neural networks have taken their inspiration from natural neurological systems, they have largely ignored the genetic basis of neural functions. Indeed, evolutionary approaches have mainly assumed that neural learning is associated with the adjustment of synaptic weights. The goal of this paper is to use evolutionary approaches to find suitable computational functions that are analogous to natural sub-components of biological neurons and demonstrate that intelligent behavior can be produced as a result of this additional biological plausibility. Our model allows neurons, dendrites, and axon branches to grow or die so that synaptic morphology can change and affect information processing while solving a computational problem. The compartmental model of a neuron consists of a collection of seven chromosomes encoding distinct computational functions inside the neuron. Since the equivalent computational functions of neural components are very complex and in some cases unknown, we have used a form of genetic programming known as Cartesian genetic programming (CGP) to obtain these functions. We start with a small random network of soma, dendrites, and neurites that develops during problem solving by repeatedly executing the seven chromosomal programs that have been found by evolution. We have evaluated the learning potential of this system in the context of a well-known single agent learning problem, known as Wumpus World. We also examined the harder problem of learning in a competitive environment for two antagonistic agents, in which both agents are controlled by independent CGP computational networks (CGPCN). Our results show that the agents exhibit interesting learning capabilities.

Download Full-text

single agent learningRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Single Agent Learning Algorithms for Decision making in Diagnostic Applications

Evolution of Cartesian Genetic Programs for Development of Learning Neural Architecture

single agent learning
Recently Published Documents