Multi-Task Deep Reinforcement Learning with PopArt

Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, with each new task requiring the training of a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential-decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
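The adaptive rescaling described above is the PopArt technique (Preserving Outputs Precisely, while Adaptively Rescaling Targets). The sketch below is an illustrative reconstruction of its core idea, not the authors' code: per-task running statistics normalize the value targets, and the value head's weights are rescaled so its outputs are unchanged when the statistics move. Names such as `PopArt`, `update_stats`, and the step size `beta` are our own.

```python
import numpy as np

class PopArt:
    """Per-task adaptive normalization of value targets (a minimal sketch)."""

    def __init__(self, n_tasks, n_features, beta=3e-4):
        self.mu = np.zeros(n_tasks)           # running mean of returns per task
        self.nu = np.ones(n_tasks)            # running second moment per task
        self.beta = beta                      # step size for the statistics
        self.w = np.random.randn(n_tasks, n_features) * 0.01  # value-head weights
        self.b = np.zeros(n_tasks)            # value-head biases

    def sigma(self, task):
        return np.sqrt(max(self.nu[task] - self.mu[task] ** 2, 1e-8))

    def update_stats(self, task, target):
        """Update moments for `task`, rescaling w and b to keep outputs fixed."""
        old_mu, old_sigma = self.mu[task], self.sigma(task)
        self.mu[task] += self.beta * (target - self.mu[task])
        self.nu[task] += self.beta * (target ** 2 - self.nu[task])
        new_sigma = self.sigma(task)
        # Preserve outputs precisely while the normalization changes.
        self.w[task] *= old_sigma / new_sigma
        self.b[task] = (old_sigma * self.b[task] + old_mu - self.mu[task]) / new_sigma

    def normalized_target(self, task, target):
        """Targets in normalized space have a similar scale across tasks."""
        return (target - self.mu[task]) / self.sigma(task)
```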

Author(s):  
Esteban Real ◽  
Alok Aggarwal ◽  
Yanping Huang ◽  
Quoc V. Le

The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. At matching size, AmoebaNet-A has comparable accuracy to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well-known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures.
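The age-based tournament selection (aging, or regularized, evolution) can be sketched as follows. `train_and_eval`, `random_arch`, and `mutate` are placeholders for the search space and training pipeline, and the population and sample sizes are illustrative defaults rather than the paper's settings.

```python
import collections
import random

def aging_evolution(train_and_eval, random_arch, mutate,
                    population_size=100, sample_size=25, cycles=1000):
    """Tournament selection with an age property: the oldest genotype,
    not the worst, is removed each cycle, favoring younger genotypes."""
    population = collections.deque()
    history = []
    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = random_arch()
        acc = train_and_eval(arch)
        population.append((arch, acc))
        history.append((arch, acc))
    # Evolve: sample a tournament, mutate the best, retire the oldest.
    while len(history) < cycles:
        tournament = random.sample(list(population), sample_size)
        parent = max(tournament, key=lambda pair: pair[1])
        child = mutate(parent[0])
        acc = train_and_eval(child)
        population.append((child, acc))   # youngest joins on the right
        population.popleft()              # oldest dies, regardless of accuracy
        history.append((child, acc))
    return max(history, key=lambda pair: pair[1])
```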


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of the residual-gradient and non-residual-gradient forms of advantage updating and of Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
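A brief reconstruction of the standard definitions behind advantage updating helps explain the time-step result; the notation below is ours, not necessarily the authors'.

```latex
% Advantage updating stores a value function V(x) and an advantage
% function A(x,u) instead of a single Q-function, with the optimal
% quantities related by
\[
  V^{*}(x) = \max_{u} Q^{*}(x,u), \qquad
  A^{*}(x,u) = \frac{Q^{*}(x,u) - \max_{u'} Q^{*}(x,u')}{\Delta t}.
\]
% As the time step \Delta t shrinks, the Q-values of different actions
% collapse toward V(x) at rate O(\Delta t), so Q-learning's action
% preferences drown in noise; the 1/\Delta t scaling keeps the
% advantages well separated, which is why advantage updating converges
% regardless of \Delta t. The residual-gradient form descends the exact
% gradient of the mean squared Bellman residual (shown here for
% Q-learning for brevity),
\[
  E = \tfrac{1}{2}\,\mathbb{E}\!\left[
        \bigl(r + \gamma^{\Delta t} \max_{u'} Q(x',u') - Q(x,u)\bigr)^{2}
      \right],
\]
% differentiating through the target as well as the prediction. For the
% differential game, the max over actions is replaced by a minimax over
% the pursuer's and evader's actions.
```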


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2253
Author(s):  
Xiao Wang ◽  
Peng Shi ◽  
Yushan Zhao ◽  
Yue Sun

To help the pursuer find an advantageous control policy in a one-on-one pursuit-evasion game in space, this paper proposes an innovative pre-trained fuzzy reinforcement learning algorithm, conducted separately in the x, y, and z channels. In contrast with previous algorithms applied to ground-based games, this is the first time reinforcement learning has been introduced to help a pursuer in space optimize its control policy. The known part of the environment is utilized to help the pursuer pre-train its consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer, and the pursuer's consequent set is updated through gradient descent in the fuzzy inference systems. The numerical experimental results validate the effectiveness of the proposed algorithm in improving the pursuer's game-playing ability.
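As a rough illustration of one such moving channel, the sketch below pairs a zero-order Takagi-Sugeno fuzzy inference system with an actor-critic update, the TD error driving gradient descent on the consequent sets. The rule shapes, hyperparameters, and update form are our assumptions, not taken from the paper; the actor's consequent set `actor_c` is the part that could be pre-trained from the known portion of the environment.

```python
import numpy as np

class FuzzyActorCritic:
    """One moving channel of the pursuer (an illustrative sketch)."""

    def __init__(self, centers, widths, lr_actor=0.01, lr_critic=0.05, gamma=0.99):
        self.centers, self.widths = centers, widths     # Gaussian rule antecedents
        n_rules = len(centers)
        self.actor_c = np.zeros(n_rules)    # actor consequent set (pre-trainable)
        self.critic_c = np.zeros(n_rules)   # critic consequent set
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma

    def _phi(self, x):
        """Normalized rule firing strengths for scalar channel state x."""
        w = np.exp(-((x - self.centers) / self.widths) ** 2)
        return w / (w.sum() + 1e-12)

    def act(self, x, noise=0.1):
        # Defuzzified control plus exploration noise.
        return self._phi(x) @ self.actor_c + np.random.randn() * noise

    def update(self, x, action, reward, x_next):
        phi, phi_next = self._phi(x), self._phi(x_next)
        v, v_next = phi @ self.critic_c, phi_next @ self.critic_c
        td_error = reward + self.gamma * v_next - v
        self.critic_c += self.lr_c * td_error * phi             # critic step
        exploration = action - phi @ self.actor_c
        self.actor_c += self.lr_a * td_error * exploration * phi  # actor step
```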


2010 ◽  
Vol 2010 ◽  
pp. 1-9 ◽  
Author(s):  
Takashi Kuremoto ◽  
Takahito Komoto ◽  
Kunikazu Kobayashi ◽  
Masanao Obayashi

An improved self-organizing map (SOM), the parameterless growing SOM (PL-G-SOM), is proposed in this paper. To overcome problems in the traditional SOM (Kohonen, 1982), various structure-growing SOMs and parameter-adjusting SOMs have been proposed, though usually separately. Here, we combine the idea of growing SOMs (Bauer and Villmann, 1997; Dittenbach et al., 2000) with a parameterless SOM (Berglund and Sitte, 2006) into a novel SOM, named PL-G-SOM, to realize additional learning, optimal neighborhood preservation, and automatic tuning of parameters. The improved SOM is applied to construct a voice-instruction learning system for partner robots adopting a simple reinforcement learning algorithm. Users' voice instructions are first classified by the PL-G-SOM; the robot then chooses an expected action according to a stochastic policy. The policy is adjusted by the reward or punishment given by the robot's user. A feeling map is also designed to express the learning degree of voice instructions. Learning and additional-learning experiments using instructions in multiple languages, including Japanese, English, Chinese, and Malaysian, confirmed the effectiveness of the proposed system.
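The parameterless half of the idea can be sketched as follows: the normalized fitting error of the current input replaces the hand-tuned, time-decaying learning rate and neighborhood width. This is an illustrative reduction of the PLSOM rule (Berglund and Sitte, 2006) without the growing mechanism; all names below are ours.

```python
import numpy as np

def plsom_step(weights, grid, x, diameter, state):
    """One parameterless SOM update step (an illustrative sketch).

    weights: (n_units, dim) codebook; grid: (n_units, 2) unit positions;
    diameter: grid diameter; state: dict carrying the running max error.
    """
    dists = np.linalg.norm(weights - x, axis=1)
    bmu = int(np.argmin(dists))                        # best-matching unit
    state["max_err"] = max(state.get("max_err", 1e-12), dists[bmu])
    eps = dists[bmu] / state["max_err"]                # normalized fitting error
    # Neighborhood width scales with the error instead of with time.
    grid_d = np.linalg.norm(grid - grid[bmu], axis=1)
    width = max(eps * diameter, 1e-12)
    h = np.exp(-(grid_d ** 2) / (2 * width ** 2))
    weights += eps * h[:, None] * (x - weights)        # pull units toward x
    return bmu
```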


2018 ◽  
Vol 21 ◽  
pp. 29-36 ◽  
Author(s):  
Ivars Namatēvs

Due to the increase in computing power and innovative end-to-end reinforcement learning (RL) approaches that learn directly from high-dimensional sensory inputs, it is now plausible to combine RL and deep learning to build Smart Building Energy Control (SBEC) systems. Deep reinforcement learning (DRL) extends the classical Q-learning algorithm into Deep Q-learning (DQL) by exploiting artificial neural networks: a deep neural network (DNN) is trained to approximate the Q-function. To create a comprehensive SBEC system, it is crucial to choose an appropriate mathematical background and to benchmark the best framework for model-based predictive control of the building's heating, ventilation, and air-conditioning (HVAC) system. The main contribution of this paper is to explore state-of-the-art DRL methodology for smart building control.
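As a minimal sketch of the DQL update the paper surveys: a deep network approximates Q(s, a) and a periodically copied target network stabilizes the bootstrap. The state and action encodings for an HVAC problem (temperatures, setpoints, damper or valve positions) are application choices not fixed by the article, and `q_net`/`target_net` here are assumed callables mapping a batch of states to per-action Q-values.

```python
import numpy as np

def dql_targets(q_net, target_net, batch, gamma=0.99):
    """Deep Q-learning targets for a batch of control transitions (a sketch)."""
    states, actions, rewards, next_states, done = batch
    # Bootstrap from the frozen target network for stability.
    q_next = target_net(next_states)                     # (B, n_actions)
    targets = rewards + gamma * (1.0 - done) * q_next.max(axis=1)
    # Q-values of the actions actually taken.
    predictions = q_net(states)[np.arange(len(actions)), actions]
    td_errors = targets - predictions                    # drives the gradient step
    return targets, td_errors
```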


Author(s):  
Marius Lindauer ◽  
Frank Hutter ◽  
Holger H. Hoos ◽  
Torsten Schaub

Algorithm selection (AS) techniques -- which involve choosing from a set of algorithms the one expected to solve a given problem instance most efficiently -- have substantially improved the state of the art in solving many prominent AI problems, such as SAT, CSP, ASP, MAXSAT and QBF. Although several AS procedures have been introduced, not too surprisingly, none of them dominates all others across all AS scenarios. Furthermore, these procedures have parameters whose optimal values vary across AS scenarios. In this extended abstract of our 2015 JAIR article of the same title, we summarize AutoFolio, which uses an algorithm configuration procedure to automatically select an AS approach and optimize its parameters for a given AS scenario. AutoFolio allows researchers and practitioners across a broad range of applications to exploit the combined power of many different AS methods and to automatically construct high-performance algorithm selectors. We demonstrate that AutoFolio was able to produce new state-of-the-art algorithm selectors for 7 well-studied AS scenarios and matches state-of-the-art performance statistically on all other scenarios. Compared to the best single algorithm for each AS scenario, AutoFolio achieved average speedup factors between 1.3 and 15.4.
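For readers new to algorithm selection, a bare-bones per-instance selector looks like the sketch below: one regression model per algorithm predicts runtime from instance features, and the algorithm with the lowest prediction is run. This is not AutoFolio itself, which additionally uses an algorithm configurator to choose among selection approaches and tune their parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class RuntimeSelector:
    """Per-instance algorithm selection via runtime regression (a sketch)."""

    def __init__(self, n_algorithms):
        self.models = [RandomForestRegressor(n_estimators=100)
                       for _ in range(n_algorithms)]

    def fit(self, features, runtimes):
        # features: (n_instances, n_features) instance descriptors
        # runtimes: (n_instances, n_algorithms) measured performance matrix
        for a, model in enumerate(self.models):
            model.fit(features, runtimes[:, a])

    def select(self, instance_features):
        # Run the algorithm predicted to be fastest on this instance.
        preds = [m.predict(instance_features.reshape(1, -1))[0]
                 for m in self.models]
        return int(np.argmin(preds))
```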


Author(s):  
Sven Gronauer ◽  
Martin Gottwald ◽  
Klaus Diepold

Despite their remarkable success in recent years, the underlying mechanisms powering the advances of reinforcement learning remain poorly understood. In this paper, we identify these mechanisms, which we call ingredients, in on-policy policy gradient methods and empirically determine their impact on learning. To allow an equitable assessment, we conduct our experiments based on a unified and modular implementation. Our results underline the significance of recent algorithmic advances and demonstrate that reaching state-of-the-art performance does not necessarily require sophisticated algorithms but can also be accomplished by combining a few simple ingredients.
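The sketch below illustrates the kind of modular decomposition such a study requires: each "ingredient" (here, advantage normalization and PPO-style ratio clipping, our illustrative choices) is an independent switch whose effect on the loss can be measured in isolation.

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages, old_log_probs=None,
                         normalize_adv=True, clip_ratio=None):
    """On-policy policy gradient loss with two toggleable 'ingredients'."""
    adv = advantages
    if normalize_adv:                       # ingredient 1: standardize advantages
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    if clip_ratio is None:                  # vanilla REINFORCE-style objective
        return -(log_probs * adv).mean()
    # ingredient 2: PPO-style clipped surrogate (needs old_log_probs).
    ratio = np.exp(log_probs - old_log_probs)
    clipped = np.clip(ratio, 1 - clip_ratio, 1 + clip_ratio)
    return -np.minimum(ratio * adv, clipped * adv).mean()
```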


2006 ◽  
Vol 15 (05) ◽  
pp. 855-861 ◽  
Author(s):  
JU JIANG ◽  
MOHAMED S. KAMEL ◽  
LEI CHEN

Reinforcement learning (RL) has been successfully used in many fields. With the increasing complexity of environments and tasks, it is difficult for a single learning algorithm to cope with complicated problems at high performance. This paper proposes a new multiple-learning architecture, the "Aggregated Multiple Reinforcement Learning System (AMRLS)", which aggregates different RL algorithms in each learning step to make more appropriate sequential decisions than those made by individual learning algorithms. The architecture was tested on a Cart-Pole system. The presented simulation results confirm our prediction and reveal that aggregation not only provides robustness and fault tolerance, but also produces smoother learning curves and needs fewer learning steps than individual learning algorithms.
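One plausible reading of the aggregation step is sketched below: several member learners (e.g., Q-learning and SARSA tables) are updated in parallel with their own rules, and the system acts on a majority vote over their greedy actions. The vote is our illustrative choice; AMRLS defines its own aggregation schemes.

```python
import numpy as np

class AggregatedLearners:
    """Act on an aggregate of several RL learners' preferences (a sketch).

    Each learner must expose q_values(state) -> array and an update(...)
    method implementing its own learning rule; these are our assumptions.
    """

    def __init__(self, learners):
        self.learners = learners

    def act(self, state):
        # Majority vote over each learner's greedy action.
        votes = [int(np.argmax(l.q_values(state))) for l in self.learners]
        return max(set(votes), key=votes.count)

    def update(self, state, action, reward, next_state, next_action):
        for l in self.learners:    # every learner applies its own rule
            l.update(state, action, reward, next_state, next_action)
```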


Author(s):  
Aijun Bai ◽  
Stuart Russell

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and to use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with potentially deep hierarchical structure, there often exist many internal transitions where a machine calls another machine with the environment state unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm significantly outperforms the state of the art on the benchmark Taxi domain and on a much more complex RoboCup Keepaway domain.
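Schematically, HAM-based learning runs Q-learning over (joint state, choice) pairs, where the joint state combines the environment state and the machine call stack, with an SMDP-style update between choice points. The sketch below is our reading of that setup, with a cache standing in for the recursive short-circuiting of internal transitions; it is not the authors' implementation.

```python
import random
from collections import defaultdict

class HAMQ:
    """Q-learning over HAM choice points (a schematic sketch)."""

    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)     # (joint_state, choice) -> value
        self.shortcut = {}              # cache: joint state -> next choice point,
                                        # skipping chains of internal transitions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, joint_state, choices):
        if random.random() < self.epsilon:
            return random.choice(choices)
        return max(choices, key=lambda c: self.q[(joint_state, c)])

    def update(self, joint_state, choice, cum_reward, tau,
               next_joint, next_choices):
        # SMDP Q-update: tau is the number of environment steps elapsed
        # between the two choice points.
        best_next = max((self.q[(next_joint, c)] for c in next_choices),
                        default=0.0)
        target = cum_reward + (self.gamma ** tau) * best_next
        key = (joint_state, choice)
        self.q[key] += self.alpha * (target - self.q[key])
```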


2020 ◽  
Vol 17 (3) ◽  
pp. 172988142091696
Author(s):  
Xiaoli Liu

This article studies a multi-agent reinforcement learning algorithm based on agent action prediction. In a multi-agent system, the action selected by a learning agent is inevitably affected by the actions of other agents, so the reinforcement learning system needs to consider the joint state and joint action of the multiple agents. In addition, the application of this method to the cooperative strategy learning of soccer robots is studied, so that the multi-agent system can learn through interaction with its environment. To realize the division of labour and cooperation among multiple robots, interactive learning is used to master the behaviour strategy. Combined with the characteristics of soccer-robot decision-making, this article analyses the role transformation and experience sharing of multi-agent reinforcement learning and applies it to the local attack strategy of soccer robots, using the algorithm to learn the action-selection strategy of the team's main robot, with simulation verification on the Matlab platform. The experimental results prove the effectiveness of the research method, and its superiority is validated in comparison with several simple baseline methods.
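The action-prediction idea resembles joint-action learning: the agent keeps empirical frequencies of the other agents' actions per state and best-responds to that prediction. The sketch below is an illustrative single-opponent version with names of our choosing, not the article's algorithm.

```python
import numpy as np
from collections import defaultdict

class JointActionLearner:
    """Q-learning over joint actions with opponent-action prediction (a sketch)."""

    def __init__(self, n_actions, n_other_actions, alpha=0.1, gamma=0.95):
        self.q = defaultdict(lambda: np.zeros((n_actions, n_other_actions)))
        self.counts = defaultdict(lambda: np.ones(n_other_actions))  # Laplace prior
        self.alpha, self.gamma = alpha, gamma

    def predict_other(self, state):
        # Empirical distribution over the other agent's actions in this state.
        c = self.counts[state]
        return c / c.sum()

    def act(self, state):
        # Best response to the predicted opponent behaviour.
        expected_q = self.q[state] @ self.predict_other(state)
        return int(np.argmax(expected_q))

    def update(self, state, a, other_a, reward, next_state):
        self.counts[state][other_a] += 1
        v_next = (self.q[next_state] @ self.predict_other(next_state)).max()
        td = reward + self.gamma * v_next - self.q[state][a, other_a]
        self.q[state][a, other_a] += self.alpha * td
```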

