Influencing Reinforcement Learning through Natural Language Guidance

Author(s):  
Tasmia Tasrin ◽  
Md Sultan Al Nahian ◽  
Habarakadage Perera ◽  
Brent Harrison

Interactive reinforcement learning (IRL) agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, such a signal can be difficult to generalize on its own. In this work, we explore how natural language advice can be used to provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known IRL technique. Policy shaping typically employs a human feedback policy to help an agent learn more about how to achieve its goal. In our case, we replace this human feedback policy with a policy generated from natural language advice. We aim to examine whether generated natural language reasoning can help a deep RL agent select its actions successfully in a given environment. To this end, we design our model with three networks: an experience-driven network, an advice generator, and an advice-driven network. The experience-driven RL agent chooses its actions based on the environmental reward, while the advice-driven network, using the feedback produced by the advice generator for each new state, selects actions that assist the RL agent through improved policy shaping.
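
The abstract does not include code, but the multiplicative combination at the heart of policy shaping is compact enough to sketch. Below is a minimal, illustrative sketch; the function name, probability values, and the additive smoothing term are our assumptions, not the authors' implementation.

```python
import numpy as np

def shape_policy(agent_probs, advice_probs, eps=1e-8):
    """Combine the experience-driven policy with the advice-driven policy
    by multiplying action probabilities and renormalizing, in the spirit
    of policy shaping. The eps term is an assumed smoothing constant."""
    combined = (agent_probs + eps) * (advice_probs + eps)
    return combined / combined.sum()

# Hypothetical example with 4 actions: the advice network favors action 2.
agent_probs = np.array([0.4, 0.3, 0.2, 0.1])   # from the experience-driven agent
advice_probs = np.array([0.1, 0.1, 0.7, 0.1])  # from the advice-driven network

print(shape_policy(agent_probs, advice_probs))  # mass shifts toward action 2
```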

2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training, and accuracies of 80% and 83% on two public datasets. The design requires very little human effort for ground-truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
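
As a rough illustration of the two fixed networks in this design, the sketch below pairs a small 2D CNN recognizer with a 1D CNN state machine that smooths a history of classification results. All layer sizes, the CSI window dimensions, and the class count are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

N_CLASSES, HISTORY = 6, 10  # assumed number of activities and history length

recognizer = nn.Sequential(            # 2D CNN over an assumed 1 x 30 x 100 CSI window
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, N_CLASSES),
)

state_machine = nn.Sequential(         # 1D CNN over the last HISTORY predictions
    nn.Conv1d(N_CLASSES, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, N_CLASSES),
)

csi_windows = torch.randn(HISTORY, 1, 30, 100)       # fake CSI window history
logits = recognizer(csi_windows)                     # (HISTORY, N_CLASSES)
history = logits.softmax(dim=1).T.unsqueeze(0)       # (1, N_CLASSES, HISTORY)
print(state_machine(history).softmax(dim=1))         # temporally smoothed prediction
```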


2020 ◽  
Vol 34 (05) ◽  
pp. 7342-7349
Author(s):  
Leonard Adolphs ◽  
Thomas Hofmann

While Reinforcement Learning (RL) approaches have led to significant achievements in a variety of areas in recent years, natural language tasks have remained mostly unaffected, due to the compositional and combinatorial nature that makes them notoriously hard to optimize. With the emerging field of Text-Based Games (TBGs), researchers are trying to bridge this gap. Inspired by the success of RL algorithms on Atari games, the idea is to develop new methods in a restricted game world and then gradually move to more complex environments. Previous work in the area of TBGs has mainly focused on solving individual games. We, however, consider the task of designing an agent that not only succeeds in a single game but performs well across a whole family of games sharing the same theme. In this work, we present our deep RL agent, LeDeepChef, which generalizes to never-before-seen games of the same family with different environments and task descriptions. The agent participated in Microsoft Research's First TextWorld Problems: A Language and Reinforcement Learning Challenge and outperformed all but one competitor on the final test set. The games from the challenge all share the same theme, namely cooking in a modern house environment, but differ significantly in the arrangement of the rooms, the objects presented, and the specific goal (the recipe to cook). To build an agent that achieves high scores across the whole family of games, we use an actor-critic framework and prune the action space using ideas from hierarchical reinforcement learning and a specialized module trained on a recipe database.
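
As a hedged sketch of the general setup, the code below scores a pruned list of candidate text commands with an actor head while a critic values the observation. The encoder sizes, the dot-product scoring, and all names are our assumptions; LeDeepChef's actual architecture and its recipe module are not reproduced here.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 32, 64  # assumed vocabulary and hidden sizes

class CommandScorer(nn.Module):
    """Actor-critic sketch: rank candidate commands against an observation."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.obs_enc = nn.GRU(EMB, HID, batch_first=True)
        self.cmd_enc = nn.GRU(EMB, HID, batch_first=True)
        self.critic = nn.Linear(HID, 1)

    def forward(self, obs_ids, cmd_ids):
        _, h_obs = self.obs_enc(self.emb(obs_ids))     # (1, 1, HID)
        _, h_cmd = self.cmd_enc(self.emb(cmd_ids))     # (1, n_cmds, HID)
        scores = h_cmd.squeeze(0) @ h_obs.squeeze(0).squeeze(0)  # (n_cmds,)
        return scores.softmax(0), self.critic(h_obs.squeeze(0))

# Fake token ids: one observation and three pruned candidate commands.
model = CommandScorer()
obs = torch.randint(0, VOCAB, (1, 20))
cmds = torch.randint(0, VOCAB, (3, 5))
probs, value = model(obs, cmds)
print(probs, value.item())  # action distribution over commands, state value
```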


Author(s):  
Aviv Tamar ◽  
Yi Wu ◽  
Garrett Thomas ◽  
Sergey Levine ◽  
Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged, IJCAI-audience-targeted version of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867
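
The value-iteration module itself is simple enough to sketch from the description above: stack the reward and current value maps, convolve to produce per-action Q channels, and take a channel-wise max, repeated K times. The grid size, channel count, and K below are illustrative assumptions.

```python
import torch
import torch.nn as nn

K, N_ACTIONS, GRID = 20, 8, 16  # assumed iteration count, actions, grid size

class VIModule(nn.Module):
    """Differentiable value iteration: Q = conv([R; V]); V = max_a Q."""
    def __init__(self):
        super().__init__()
        # The convolution plays the role of the transition model P(s'|s,a).
        self.q_conv = nn.Conv2d(2, N_ACTIONS, kernel_size=3, padding=1, bias=False)

    def forward(self, reward):
        value = torch.zeros_like(reward)             # V_0 = 0
        for _ in range(K):                           # K steps of value iteration
            q = self.q_conv(torch.cat([reward, value], dim=1))
            value, _ = q.max(dim=1, keepdim=True)    # V_{k+1} = max_a Q_k
        return value

reward_map = torch.randn(1, 1, GRID, GRID)           # learned reward image
print(VIModule()(reward_map).shape)                  # torch.Size([1, 1, 16, 16])
```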


2020 ◽  
Author(s):  
Vadim V. Korolev ◽  
Artem Mitrofanov ◽  
Kirill Karpov ◽  
Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous human-readable task into a strict mathematical form. This allows chemical data and insights to be extracted from articles and new semantic relations to be found. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. Their use allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation using simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users, and that simulated users with varying characteristics allow the impact of those characteristics on the behaviour of the learning agent to be evaluated.
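
As a minimal illustration of what a simulated user might look like, the sketch below models a trainer with configurable accuracy and availability. The class, its parameters, and the toy oracle are our assumptions, not the paper's implementation.

```python
import random

class SimulatedUser:
    """Sketch of a simulated trainer: gives advice with tunable accuracy
    (probability the advice is correct) and availability (probability any
    advice is given at a step), so trainer types can be varied reproducibly."""
    def __init__(self, oracle_policy, accuracy=0.9, availability=0.5, n_actions=4):
        self.oracle_policy = oracle_policy    # maps state -> "correct" action
        self.accuracy = accuracy
        self.availability = availability
        self.n_actions = n_actions

    def advise(self, state):
        if random.random() > self.availability:
            return None                           # user stays silent this step
        if random.random() < self.accuracy:
            return self.oracle_policy(state)      # correct advice
        return random.randrange(self.n_actions)   # mistaken advice

# A perfectly knowledgeable but frequently absent trainer on a toy task.
user = SimulatedUser(oracle_policy=lambda s: s % 4, accuracy=1.0, availability=0.2)
print([user.advise(s) for s in range(10)])
```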


2021 ◽  
Vol 11 (7) ◽  
pp. 3257
Author(s):  
Chen-Huan Pi ◽  
Wei-Yuan Ye ◽  
Stone Cheng

In this paper, a novel control strategy is presented for reinforcement learning with disturbance compensation to solve the problem of quadrotor positioning under external disturbance. The proposed control scheme applies a trained neural-network-based reinforcement learning agent to control the quadrotor, and its output is directly mapped to the four actuators in an end-to-end manner. The scheme constructs a disturbance observer to estimate the external forces exerted on the three axes of the quadrotor, such as wind gusts in an outdoor environment. By introducing an interference compensator into the neural network control agent, tracking accuracy and robustness increased significantly in both indoor and outdoor experiments. The experimental results indicate that the proposed control strategy is highly robust to external disturbances; in the experiments, compensation improved control accuracy and reduced positioning error by 75%. To the best of our knowledge, this study is the first to achieve quadrotor positioning control through low-level reinforcement learning using a global positioning system in an outdoor environment.
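
As an illustration of the disturbance-observer idea, the sketch below estimates the external force on each axis as a low-pass-filtered residual between measured and nominal acceleration. The mass, filter coefficient, and interface are assumptions; the paper's observer and its integration with the learned controller are not reproduced here.

```python
import numpy as np

class DisturbanceObserver:
    """Sketch: estimate external force as the filtered gap between measured
    acceleration and the nominal model's prediction on each world axis."""
    def __init__(self, mass, alpha=0.05):
        self.mass = mass          # assumed quadrotor mass in kg
        self.alpha = alpha        # assumed low-pass filter coefficient
        self.f_ext = np.zeros(3)  # estimated external force (x, y, z)

    def update(self, accel_measured, thrust_world):
        gravity = np.array([0.0, 0.0, -9.81])
        accel_nominal = thrust_world / self.mass + gravity
        residual = self.mass * (accel_measured - accel_nominal)
        self.f_ext = (1 - self.alpha) * self.f_ext + self.alpha * residual
        return self.f_ext

obs = DisturbanceObserver(mass=1.2)
# A steady 0.5 N gust along x shows up in the filtered estimate: thrust
# balances gravity, so the measured x-acceleration is pure disturbance.
for _ in range(200):
    est = obs.update(accel_measured=np.array([0.5 / 1.2, 0.0, 0.0]),
                     thrust_world=np.array([0.0, 0.0, 1.2 * 9.81]))
print(est)  # approaches [0.5, 0.0, 0.0]
```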


Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 26
Author(s):  
Yiran Xue ◽  
Rui Wu ◽  
Jiafeng Liu ◽  
Xianglong Tang

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning and designs an interactive simulation environment based on the social force model. The agent can automatically learn a scene model and path-planning strategy with only scene images as input, and directly output dynamic signage information. To address the "dimension disaster" (curse of dimensionality) that the deep Q network (DQN) algorithm encounters in crowd evacuation, this paper proposes a combined action-space DQN (CA-DQN) algorithm that groups Q network output layer nodes according to action dimensions, significantly reducing network complexity and improving system practicality in complex scenes. The evacuation guidance system is defined as a reinforcement learning agent and implemented with the CA-DQN method, providing a novel approach to the evacuation guidance problem. The experiments demonstrate that the proposed method is superior to the static guidance method and on par with the manually designed model method.
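
Reading the abstract's description literally, grouping output nodes by action dimension can be sketched as one Q head per dimension, so the output layer grows with the sum of the dimension sizes rather than their product. The sizes and names below are assumptions for illustration.

```python
import torch
import torch.nn as nn

DIMS = [8, 8, 4]    # assumed choices per action dimension (e.g. per sign)
STATE_DIM = 128     # assumed state feature size

class CADQN(nn.Module):
    """Sketch of a combined action-space DQN: a shared trunk with one
    Q head per action dimension instead of one node per joint action."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(256, d) for d in DIMS)

    def forward(self, state):
        h = self.trunk(state)
        return [head(h) for head in self.heads]   # one Q vector per dimension

net = CADQN()
q_per_dim = net(torch.randn(1, STATE_DIM))
action = [q.argmax(dim=1).item() for q in q_per_dim]  # greedy combined action
print(action)  # e.g. [3, 0, 2]: 8 + 8 + 4 = 20 outputs instead of 8 * 8 * 4 = 256
```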

