ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

Rohan Saphal; Balaraman Ravindran; Dheevatsa Mudigere; Sasikanth Avancha; Bharat Kaul

doi:10.1609/aaai.v34i10.7225

Proceedings of the AAAI Conference on Artificial Intelligence ◽

2020 ◽

Vol 34 (10) ◽

pp. 13905-13906

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Multiple Models ◽

Model Parameters ◽

Continuous Control ◽

Sample Complexity ◽

Local Minima ◽

Single Model ◽

Learning Policies ◽

Reinforcement Learning Models

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present here a methodology to create multiple models from a single training instance that can be used in an ensemble through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state of the art (SOTA) approaches.
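The save-then-perturb loop described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the perturbation here is plain Gaussian noise (the abstract's "directed" perturbation is not specified), majority voting is just one possible ensembling strategy, and the names `train_with_snapshots` and `majority_vote` are hypothetical.

```python
import random
from collections import Counter


def train_with_snapshots(grad, theta, steps, interval, lr=0.05, scale=1.0, seed=0):
    """Gradient descent that saves, then perturbs, theta every `interval` steps."""
    rng = random.Random(seed)
    snapshots = []
    for step in range(1, steps + 1):
        # Ordinary gradient step on the current parameters.
        theta = [t - lr * g for t, g in zip(theta, grad(theta))]
        if step % interval == 0:
            # Save the parameters reached so far as one ensemble member.
            snapshots.append(list(theta))
            # Assumption: Gaussian noise stands in for the directed perturbation.
            theta = [t + rng.gauss(0.0, scale) for t in theta]
    return snapshots


def majority_vote(policies, state):
    """Return the action chosen by the largest number of policies."""
    votes = Counter(policy(state) for policy in policies)
    return votes.most_common(1)[0][0]
```

In this sketch every snapshot comes from the same training run, so the ensemble costs no additional environment samples beyond that single run.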

Deterministic Value-Policy Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5732 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3316-3323

Author(s):

Qingpeng Cai ◽

Ling Pan ◽

Pingzhong Tang

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Infinite Horizon ◽

Gradient Algorithm ◽

Continuous Control ◽

Model Bias ◽

Model Free ◽

Policy Gradient ◽

Analytical Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with an infinite horizon. We first give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

A review of forecasting techniques for large datasets

National Institute Economic Review ◽

10.1177/0027950108089682 ◽

2008 ◽

Vol 203 ◽

pp. 109-115 ◽

Cited By ~ 4

Author(s):

Jana Eklund ◽

George Kapetanios

Keyword(s):

State Of The Art ◽

Large Data ◽

Large Datasets ◽

Large Data Sets ◽

Multiple Models ◽

Small Subset ◽

Data Sets ◽

Single Model ◽

Data Set ◽

Forecasting Techniques

This paper aims to provide a brief and relatively non-technical overview of state-of-the-art forecasting with large data sets. We classify existing methods into four groups depending on whether data sets are used wholly or partly, whether a single model or multiple models are used and whether a small subset or the whole data set is being forecast. In particular, we provide brief descriptions of the methods and short recommendations where appropriate, without going into detailed discussions of their merits or demerits.

Skill-based curiosity for intrinsically motivated reinforcement learning

Machine Learning ◽

10.1007/s10994-019-05845-8 ◽

2019 ◽

Vol 109 (3) ◽

pp. 493-512 ◽

Cited By ~ 2

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Skill Learning ◽

High Dimensional ◽

Sequential Decision ◽

Learning Methods ◽

Reward Function ◽

Intrinsic Reward ◽

Reinforcement Learning Models ◽

Data Efficiency

Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function called curiosity to enable the agent to explore its environment in the quest for new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need for directly predicting the future, and can perform in sequential decision scenarios. We formulate the curiosity as the ability of the agent to predict its own knowledge about the task. We base the prediction on the idea of skill learning to incentivize the discovery of new skills, and guide exploration towards promising solutions. To further improve data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We present a variety of sparse reward tasks in MiniGrid, MuJoCo, and Atari games. We compare the performance of an augmented agent that uses our curiosity reward to state-of-the-art learners. Experimental evaluation exhibits higher performance compared to reinforcement learning models that only learn by maximizing extrinsic rewards.

What do reinforcement learning models measure? Interpreting model parameters in cognition and neuroscience

Current Opinion in Behavioral Sciences ◽

10.1016/j.cobeha.2021.06.004 ◽

2021 ◽

Vol 41 ◽

pp. 128-137

Author(s):

Maria K Eckstein ◽

Linda Wilbrecht ◽

Anne GE Collins

Keyword(s):

Reinforcement Learning ◽

Model Parameters ◽

Learning Models ◽

Reinforcement Learning Models

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/197 ◽

2018 ◽

Cited By ~ 7

Author(s):

Patryk Chrabąszcz ◽

Ilya Loshchilov ◽

Frank Hutter

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

State Of The Art ◽

The State ◽

Evolution Strategies ◽

Learning Problems ◽

Local Minima ◽

Natural Evolution ◽

The Many ◽

Made In

Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. We also demonstrate that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state-of-the-art for solving RL problems.
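For readers unfamiliar with the term, a basic (mu, lambda) evolution strategy can be sketched as below. This is a generic textbook-style sketch and not necessarily the exact variant benchmarked in the paper: the fixed step size `sigma`, the log-rank recombination weights, the default population sizes, and the function name `canonical_es` are all assumptions made for illustration.

```python
import math
import random


def canonical_es(f, theta, sigma=0.1, lam=20, mu=5, iters=200, seed=0):
    """Maximize f with a simple (mu, lambda)-ES using a fixed step size sigma."""
    rng = random.Random(seed)
    # Assumed log-rank weights: better-ranked offspring contribute more.
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [r / total for r in raw]
    n = len(theta)
    for _ in range(iters):
        # Sample lam Gaussian perturbation directions.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(lam)]
        # Rank perturbations by the fitness of the perturbed parameters, best first.
        ranked = sorted(
            noises,
            key=lambda e: f([t + sigma * x for t, x in zip(theta, e)]),
            reverse=True,
        )
        # Move the parent along the weighted mean of the mu best directions.
        theta = [
            t + sigma * sum(weights[k] * ranked[k][j] for k in range(mu))
            for j, t in enumerate(theta)
        ]
    return theta
```

Note that no gradient of `f` is used anywhere; only fitness rankings drive the update, which is what distinguishes this family from gradient-based RL methods.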

Playing Atari with Six Neurons (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/651 ◽

2020 ◽

Author(s):

Giuseppe Cuccu ◽

Julian Togelius ◽

Philippe Cudré-Mauroux

Keyword(s):

Reinforcement Learning ◽

Vector Quantization ◽

Sparse Coding ◽

Deep Neural Network ◽

State Of The Art ◽

Compact State ◽

Learning Policies ◽

Novel Algorithms ◽

Over Time ◽

Selection Of

Deep reinforcement learning applied to vision-based problems like Atari games maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. By separating image processing from decision-making, one could better understand the complexity of each task, as well as potentially find smaller policy representations that are easier for humans to understand and may generalize better. To this end, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by an encoder based on two novel algorithms: Increasing Dictionary Vector Quantization makes the encoder capable of growing its dictionary size over time, to address new observations; and Direct Residuals Sparse Coding encodes observations by aiming for highest information inclusion. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on the game's controls). These are still capable of achieving results comparable, and occasionally superior, to state-of-the-art techniques which use two orders of magnitude more neurons.

Object-sensitive Deep Reinforcement Learning

10.29007/xtgm ◽

2018 ◽

Author(s):

Yuezhang Li ◽

Katia Sycara ◽

Rahul Iyer

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Robot Navigation ◽

Learning Models ◽

New Approach ◽

Learning Agents ◽

Saliency Maps ◽

Novel Method ◽

Reinforcement Learning Models ◽

Learning Frameworks

Deep reinforcement learning has become popular over recent years, showing superiority on different visual-input tasks such as playing Atari games and robot navigation. Although objects are important image elements, few works consider enhancing deep reinforcement learning with object characteristics. In this paper, we propose a novel method that can incorporate object recognition processing into deep reinforcement learning models. This approach can be adapted to any existing deep reinforcement learning framework. State-of-the-art results are shown in experiments on Atari games. We also propose a new approach called “object saliency maps” to visually explain the actions made by deep reinforcement learning agents.

Reinforcement Learning with Perturbed Rewards

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6086 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6202-6209 ◽

Cited By ~ 2

Author(s):

Jingkang Wang ◽

Yang Liu ◽

Bo Li

Keyword(s):

Reinforcement Learning ◽

Gaussian Noise ◽

State Of The Art ◽

Confusion Matrix ◽

Error Rates ◽

Average Score ◽

Sample Complexity ◽

Noisy Environments ◽

The Core ◽

Made In

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and firstly addresses the biased noisy reward setting without any assumptions on the true distribution (e.g., zero-mean Gaussian noise as made in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30%, respectively.
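The idea of an unbiased surrogate reward can be illustrated for the simplest case of two reward levels. The sketch below assumes the flip rates (the off-diagonal entries of a 2x2 confusion matrix) are already known, whereas the abstract describes estimating them; the function name `surrogate_rewards` and its parameter names are hypothetical. The surrogate values are chosen so that their expectation under the noise equals the true reward.

```python
def surrogate_rewards(r_minus, r_plus, e_minus, e_plus):
    """Binary surrogate rewards that are unbiased under known flip rates.

    e_minus: probability that a true r_minus is observed as r_plus.
    e_plus:  probability that a true r_plus is observed as r_minus.
    Returns (s_minus, s_plus), the values to substitute for the observed
    r_minus and r_plus respectively.
    """
    denom = 1.0 - e_minus - e_plus
    if abs(denom) < 1e-12:
        # When the flip rates sum to 1 the observation is independent of the truth.
        raise ValueError("flip rates sum to 1; the noise cannot be inverted")
    s_plus = ((1.0 - e_minus) * r_plus - e_plus * r_minus) / denom
    s_minus = ((1.0 - e_plus) * r_minus - e_minus * r_plus) / denom
    return s_minus, s_plus
```

For example, with true rewards -1 and +1 and flip rates 0.1 and 0.3, a true +1 is observed as +1 with probability 0.7 and as -1 with probability 0.3, and the weighted average of the two surrogate values under those probabilities recovers +1.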

Solving Continuous Control with Episodic Memory

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/365 ◽

2021 ◽

Author(s):

Igor Kuznetsov ◽

Andrey Filchenkov

Keyword(s):

Reinforcement Learning ◽

Episodic Memory ◽

Data Structures ◽

State Of The Art ◽

Learning Algorithms ◽

Continuous Control ◽

Improve Performance ◽

The Past ◽

Model Free ◽

Discrete Action

Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous works on memory mechanisms show benefits of using episodic-based data structures for discrete action problems in terms of sample-efficiency. The application of episodic memory for continuous control with a large action space is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with an Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with the state-of-the-art model-free off-policy algorithms.

Control of Shared Energy Storage Assets Within Building Clusters Using Reinforcement Learning

Volume 2A: 44th Design Automation Conference ◽

10.1115/detc2018-86094 ◽

2018 ◽

Cited By ~ 2

Author(s):

Philip Odonkor ◽

Kemper Lewis

Keyword(s):

Reinforcement Learning ◽

Energy Storage ◽

State Of The Art ◽

Continuous Control ◽

Battery System ◽

Current State ◽

Policy Gradient ◽

Energy Assets ◽

The Impact ◽

Continuous Domain

This work leverages the current state of the art in reinforcement learning for continuous control, the Deep Deterministic Policy Gradient (DDPG) algorithm, towards the optimal 24-hour dispatch of shared energy assets within building clusters. The modeled DDPG agent interacts with a battery environment, designed to emulate a shared battery system. The aim here is not only to learn an efficient charge/discharge policy, but also to address the continuous domain question of how much energy should be charged or discharged. Experimentally, we examine the impact of the learned dispatch strategy on minimizing demand peaks within the building cluster. Our results show that across the variety of building cluster combinations studied, the algorithm is able to learn and exploit energy arbitrage, tailoring it into battery dispatch strategies for peak demand shifting.
