Human–agent transfer from observations

The Knowledge Engineering Review ◽

10.1017/s0269888920000387 ◽

2020 ◽

Vol 36 ◽

Author(s):

Bikramjit Banerjee ◽

Sneha Racharla

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Recent Work ◽

Inverse Dynamics ◽

Decision Function ◽

Dynamics Model ◽

Human Agent ◽

Baseline Algorithm ◽

Continuous Actions ◽

Human Demonstration

Abstract Learning from human demonstration (LfD), among many speedup techniques for reinforcement learning (RL), has seen many successful applications. We consider one LfD technique called human–agent transfer (HAT), where a model of the human demonstrator’s decision function is induced via supervised learning and used as an initial bias for RL. Some recent work in LfD has investigated learning from observations only, that is, when only the demonstrator’s states (and not its actions) are available to the learner. Since the demonstrator’s actions are treated as labels for HAT, supervised learning becomes untenable in their absence. We adapt the idea of learning an inverse dynamics model from the data acquired by the learner’s interactions with the environment and deploy it to fill in the missing actions of the demonstrator. The resulting version of HAT—called state-only HAT (SoHAT)—is experimentally shown to preserve some advantages of HAT in benchmark domains with both discrete and continuous actions. This paper also establishes principled modifications of an existing baseline algorithm—called A3C—to create its HAT and SoHAT variants that are used in our experiments.

Download Full-text

Exploring the Task Cooperation in Multi-goal Visual Navigation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/86 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yuechen Wu ◽

Zhenhuan Rao ◽

Wei Zhang ◽

Shijian Lu ◽

Weizhi Lu ◽

...

Keyword(s):

Reinforcement Learning ◽

Inverse Dynamics ◽

State Of The Art ◽

Visual Navigation ◽

Dynamics Model ◽

Navigation Task ◽

Additional Training ◽

Learning Scheme ◽

Positive Experiences ◽

Video Demonstration

Learning to adapt to a series of different goals in visual navigation is challenging. In this work, we present a model-embedded actor-critic architecture for the multi-goal visual navigation task. To enhance the task cooperation in multi-goal learning, we introduce two new designs to the reinforcement learning scheme: inverse dynamics model (InvDM) and multi-goal co-learning (MgCl). Specifically, InvDM is proposed to capture the navigation-relevant association between state and goal, and provide additional training signals to relieve the sparse reward issue. MgCl aims at improving the sample efficiency and supports the agent to learn from unintentional positive experiences. Extensive results on the interactive platform AI2-THOR demonstrate that the proposed method converges faster than state-of-the-art methods while producing more direct routes to navigate to the goal. The video demonstration is available at: https://youtube.com/channel/UCtpTMOsctt3yPzXqe_JMD3w/videos.

Download Full-text

An efficient reinforcement learning algorithm for continuous actions

2013 25th Chinese Control and Decision Conference (CCDC) ◽

10.1109/ccdc.2013.6560898 ◽

2013 ◽

Author(s):

Fu Bo ◽

Chen Xin ◽

He Yong ◽

Wu Min

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Continuous Actions ◽

Reinforcement Learning Algorithm

Download Full-text

Combining Reinforcement Learning with Supervised Learning for Sepsis Treatment

10.1145/3426020.3426077 ◽

2020 ◽

Author(s):

Thanh Cong Do ◽

Hyung Jeong Yang ◽

Seok Bong Yoo ◽

In-Jae Oh

Keyword(s):

Reinforcement Learning ◽

Supervised Learning

Download Full-text

A strategy learning model for autonomous agents based on classification

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2015-0035 ◽

2015 ◽

Vol 25 (3) ◽

pp. 471-482 ◽

Cited By ~ 7

Author(s):

Bartłomiej Śnieżyński

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Learning Process ◽

Autonomous Agents ◽

Good Alternative ◽

Learning Model ◽

Learning Method ◽

Complex Environments ◽

Agent Based ◽

Proposed Model

AbstractIn this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster that reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking the learning process

Download Full-text

Control Task for Reinforcement Learning with Known Optimal Solution for Discrete and Continuous Actions

Journal of Intelligent Learning Systems and Applications ◽

10.4236/jilsa.2009.11002 ◽

2009 ◽

Vol 01 (01) ◽

pp. 28-41

Author(s):

Michael C. ROTTGER ◽

Andreas W. LIEHR

Keyword(s):

Reinforcement Learning ◽

Optimal Solution ◽

Control Task ◽

Continuous Actions

Download Full-text

DDPG Agent to Swing Up and Balance Cart- Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error only receiving a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high- dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyper parameters able to learn to play 49 different Atari games only from raw pixel inputs. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore with this thesis we study state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients. We provide a theoretical comparison to other popular methods, an evaluation of its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention of deep learning and reinforcement learning. We continue by describing in details the two main algorithms, core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementatory details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.

Download Full-text

Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression

Journal of Intelligent Learning Systems and Applications ◽

10.4236/jilsa.2010.22010 ◽

2010 ◽

Vol 02 (02) ◽

pp. 69-79 ◽

Cited By ~ 4

Author(s):

Julio H. Zaragoza ◽

Eduardo F. Morales

Keyword(s):

Reinforcement Learning ◽

Weighted Regression ◽

Locally Weighted Regression ◽

Continuous Actions

Download Full-text

I4R: Promoting Deep Reinforcement Learning by the Indicator for Expressive Representations

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/370 ◽

2020 ◽

Author(s):

Xufang Luo ◽

Qi Meng ◽

Di He ◽

Wei Chen ◽

Yunhong Wang

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Singular Values ◽

Regularization Term ◽

Policy Gradient

Learning expressive representations is always crucial for well-performed policies in deep reinforcement learning (DRL). Different from supervised learning, in DRL, accurate targets are not always available, and some inputs with different actions only have tiny differences, which stimulates the demand for learning expressive representations. In this paper, firstly, we empirically compare the representations of DRL models with different performances. We observe that the representations of a better state extractor (SE) are more scattered than a worse one when they are visualized. Thus, we investigate the singular values of representation matrix, and find that, better SEs always correspond to smaller differences among these singular values. Next, based on such observations, we define an indicator of the representations for DRL model, which is the Number of Significant Singular Values (NSSV) of a representation matrix. Then, we propose I4R algorithm, to improve DRL algorithms by adding the corresponding regularization term to enhance the NSSV. Finally, we apply I4R to both policy gradient and value based algorithms on Atari games, and the results show the superiority of our proposed method.

Download Full-text

Auto-pipeline

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476303 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2563-2575

Author(s):

Junwen Yang ◽

Yeye He ◽

Surajit Chaudhuri

Keyword(s):

Reinforcement Learning ◽

Recent Work ◽

Pipeline System ◽

Complex Data ◽

Data Preparation ◽

Significant Progress ◽

Database Table ◽

End To End

Recent work has made significant progress in helping users to automate single data preparation steps, such as string-transformations and table-manipulation operators (e.g., Join, GroupBy, Pivot, etc.). We in this work propose to automate multiple such steps end-to-end, by synthesizing complex data-pipelines with both string-transformations and table-manipulation operators. We propose a novel by-target paradigm that allows users to easily specify the desired pipeline, which is a significant departure from the traditional by-example paradigm. Using by-target, users would provide input tables (e.g., csv or json files), and point us to a "target table" (e.g., an existing database table or BI dashboard) to demonstrate how the output from the desired pipeline would schematically "look like". While the problem is seemingly under-specified, our unique insight is that implicit table constraints such as FDs and keys can be exploited to significantly constrain the space and make the problem tractable. We develop an AUTO-PIPELINE system that learns to synthesize pipelines using deep reinforcement-learning (DRL) and search. Experiments using a benchmark of 700 real pipelines crawled from GitHub and commercial vendors suggest that AUTO-PIPELINE can successfully synthesize around 70% of complex pipelines with up to 10 steps.

Download Full-text