Grounding Language for Transfer in Deep Reinforcement Learning

In this paper, we explore the use of natural language to drive transfer for reinforcement learning (RL). Despite the widespread application of deep RL techniques, learning generalized policy representations that work across domains remains a challenging problem. We demonstrate that textual descriptions of environments provide a compact intermediate channel to facilitate effective policy transfer. Specifically, by learning to ground the meaning of text in the dynamics of the environment, such as transitions and rewards, an autonomous agent can effectively bootstrap policy learning on a new domain given its description. We employ a model-based RL approach consisting of a differentiable planning module, a model-free component and a factorized state representation to effectively use entity descriptions. Our model outperforms prior work on both transfer and multi-task scenarios in a variety of environments. For instance, we achieve up to 14% and 11.5% absolute improvement over prior models in terms of average and initial rewards, respectively.

An Experimental Study on State Representation Extraction for Vision-Based Deep Reinforcement Learning

Applied Sciences ◽

10.3390/app112110337 ◽

2021 ◽

Vol 11 (21) ◽

pp. 10337

Author(s):

Junkai Ren ◽

Yujun Zeng ◽

Sihang Zhou ◽

Yichuan Zhang

Keyword(s):

Experimental Study ◽

Reinforcement Learning ◽

Network Architecture ◽

Representation Learning ◽

Evaluation Metrics ◽

High Dimensional ◽

Regularization Methods ◽

Challenging Problem ◽

State Representation ◽

Sample Quality

Scaling end-to-end learning to control robots with vision inputs is a challenging problem in the field of deep reinforcement learning (DRL). While achieving remarkable success in complex sequential tasks, vision-based DRL remains extremely data-inefficient, especially when dealing with high-dimensional pixel inputs. Many recent studies have tried to leverage state representation learning (SRL) to break through this barrier. Some of them can even help the agent learn from pixels as efficiently as from states. Reproducing existing work, accurately judging the improvements offered by novel methods, and applying these approaches to new tasks are vital for sustaining this progress. However, meeting these three demands is seldom straightforward. Without significance criteria and tighter standardization of experimental reporting, it is difficult to determine whether improvements over previous methods are meaningful. For this reason, we conducted ablation studies on hyperparameters, embedding network architecture, embedding dimension, regularization methods, sample quality and SRL methods to systematically compare and analyze their effects on representation learning and reinforcement learning. We summarize three evaluation metrics, and we adopt five baseline algorithms (both value-based and policy-based) and eight tasks to avoid dependence on the particularities of any single experimental setting. Based on a large number of experimental analyses, we highlight the variability in reported methods and suggest guidelines to make future results in SRL more reproducible and stable. We aim to spur discussion about how to assure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

What is dopamine doing in model-based reinforcement learning?

10.31234/osf.io/z2fmw ◽

2020 ◽

Author(s):

Thomas Akam ◽

Mark Walton

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Dopamine Neurons ◽

State Representation ◽

Dopaminergic Activity ◽

Model Based ◽

Model Free ◽

Reward Prediction ◽

Predictive State Representation ◽

Teaching Signal

Experiments have implicated dopamine in model-based reinforcement learning (RL). These findings are unexpected as dopamine is thought to encode a reward prediction error (RPE), which is the key teaching signal in model-free RL. Here we examine two possible accounts for dopamine’s involvement in model-based RL: the first is that dopamine neurons carry a prediction error used to update a type of predictive state representation called a successor representation; the second is that two well-established aspects of dopaminergic activity, RPEs and surprise signals, can together explain dopamine’s involvement in model-based RL.
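
The two kinds of prediction error mentioned in this abstract can be contrasted in a few lines. The following is a minimal tabular sketch using the standard textbook temporal-difference forms, not code from the paper; the function names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions.

```python
# Illustrative sketch only: standard tabular TD forms, not the authors' model.
# All names and default parameter values are assumptions.

def td_reward_prediction_error(V, s, r, s_next, gamma=0.9):
    """Scalar RPE for state values: delta = r + gamma * V[s'] - V[s]."""
    return r + gamma * V[s_next] - V[s]


def sr_prediction_error(M, s, s_next, gamma=0.9):
    """Vector-valued successor-representation error for row M[s]:
    one_hot(s) + gamma * M[s'] - M[s], one component per state."""
    n = len(M)
    return [
        (1.0 if j == s else 0.0) + gamma * M[s_next][j] - M[s][j]
        for j in range(n)
    ]


def update(V, M, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one observed transition (s, r, s_next) to both tables in place."""
    delta = td_reward_prediction_error(V, s, r, s_next, gamma)
    err = sr_prediction_error(M, s, s_next, gamma)
    V[s] += alpha * delta          # value update driven by reward
    for j, e in enumerate(err):    # SR update driven by state occupancy
        M[s][j] += alpha * e
    return delta, err
```

The point of the contrast is that the RPE is a single number driven by reward, whereas the successor-representation error has one component per state and is driven by which states follow which, independent of reward.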

Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards

10.32470/ccn.2018.1191-0 ◽

2018 ◽

Author(s):

Paul Krueger ◽

Thomas Griffiths

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Model Free

Evaluating Cross-Country Policy Learning in Public Administration: The Case of Qatar-Singapore Partnership in Regional Training Centre for Public Administration (RTCPA)

10.31124/advance.7963694 ◽

2019 ◽

Author(s):

M. Evren Tok ◽

Duygu Sever

Keyword(s):

Middle East ◽

Public Administration ◽

Policy Transfer ◽

Policy Learning ◽

Training Center ◽

Training Centre ◽

Cross Country

This study investigates the case of the Qatar-Singapore Regional Training Centre for Public Administration. The article evaluates the Singapore-Qatar Asia-Middle East Dialogue (AMED) Regional Training Centre for Public Administration (RTCPA) in Doha, Qatar, as a mechanism to foster policy transfer. The study suggests that this evaluation is a fruitful example for revealing the strengths and weaknesses of such initiatives and can offer insights into effective tools of policy learning.

Model-Based and Model-Free Social Cognition

10.31234/osf.io/ue6j2 ◽

2019 ◽

Author(s):

Leor M Hackel ◽

Jeffrey Jordan Berg ◽

Björn Lindström ◽

David Amodio

Keyword(s):

Reinforcement Learning ◽

Social Cognition ◽

Learning Strategies ◽

Memory Systems ◽

Learning Task ◽

Financial Advisors ◽

Model Based ◽

Model Free ◽

Systems Model ◽

Task Assessment

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.

Faculty Opinions recommendation of States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.4125957.4076054 ◽

2010 ◽

Author(s):

Susan Courtney

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Model Based ◽

Model Free

Model-Free Event-Triggered Optimal Consensus Control of Multiple Euler-Lagrange Systems via Reinforcement Learning

IEEE Transactions on Network Science and Engineering ◽

10.1109/tnse.2020.3036604 ◽

2020 ◽

pp. 1-1

Author(s):

Saiwei Wang ◽

Xin Jin ◽

Shuai Mao ◽

Athanasios V. Vasilakos ◽

Yang Tang

Keyword(s):

Reinforcement Learning ◽

Consensus Control ◽

Model Free ◽

Event Triggered

Computation offloading through mobile vehicles in IoT-edge-cloud network

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-020-01848-5 ◽

2020 ◽

Vol 2020 (1) ◽

Author(s):

Jun Long ◽

Yueyi Luo ◽

Xiaoyu Zhu ◽

Entao Luo ◽

Mingfeng Huang

Keyword(s):

Reinforcement Learning ◽

Energy Consumption ◽

Low Cost ◽

Computation Offloading ◽

Mobile Edge Computing ◽

Challenging Problem ◽

Learning Technique ◽

Cloud Network ◽

The City ◽

Task Offloading

With the development of the Internet of Things (IoT) and mobile edge computing (MEC), more and more sensing devices are being deployed in the smart city. These sensing devices generate various kinds of tasks, which need to be sent to the cloud for processing. Usually, the sensing devices are not equipped with wireless modules, because doing so is neither economical nor energy saving. Thus, it is a challenging problem to find a way to offload tasks for sensing devices. However, many vehicles move around the city and can communicate with sensing devices in an effective and low-cost way. In this paper, we propose a computation offloading scheme through mobile vehicles in an IoT-edge-cloud network. The sensing devices generate tasks and transmit them to vehicles; the vehicles then decide whether to compute the tasks in the local vehicle, on an MEC server or in the cloud center. The computation offloading decision is based on a utility function of energy consumption and transmission delay, and a deep reinforcement learning technique is adopted to make the decisions. Our proposed method can make full use of existing infrastructure to implement task offloading for sensing devices. The experimental results show that our proposed solution can achieve the maximum reward and decrease delay.
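
To make the idea of a utility over energy consumption and delay concrete, here is a minimal sketch. It is not the paper's method: the abstract does not give the utility function, and the paper uses deep reinforcement learning to make the decision, whereas this sketch simply picks the option with the highest utility. The weighted-sum form, the weights and the field names are all assumptions.

```python
# Illustrative sketch only. The weighted-sum utility, the weights and the
# greedy choice are assumptions; the paper learns decisions with deep RL.
from dataclasses import dataclass


@dataclass
class Option:
    name: str          # e.g. "vehicle", "mec" or "cloud" (hypothetical labels)
    energy_j: float    # assumed energy cost of this option, in joules
    delay_s: float     # assumed transmission plus compute delay, in seconds


def utility(opt, w_energy=0.5, w_delay=0.5):
    """Higher is better: negative weighted sum of energy and delay."""
    return -(w_energy * opt.energy_j + w_delay * opt.delay_s)


def choose_target(options, w_energy=0.5, w_delay=0.5):
    """Greedy stand-in for the learned policy: pick the best-utility option."""
    return max(options, key=lambda o: utility(o, w_energy, w_delay))
```

In a reinforcement learning formulation, a quantity like `utility` would typically serve as the per-step reward rather than being maximized directly for each task.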

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in highway driving, (2) inferring user preferences in a social network (Twitter), and (3) the management of water release in Lake Como. For each of these scenarios, we provide a formalization, experiments and a discussion to interpret the obtained results.
