Construction of an Imitation Learning Agent Using Game Records of Unspecified Players

This article elaborates the concept of programming a robot by showing it how to do the job. This is often called “learning from demonstrations” or “imitation learning.” Labs at several institutions – for example, the Swiss Federal Institute of Technology at Lausanne, the University of Maryland, Massachusetts Institute of Technology, and Worcester Polytechnic Institute – are experimenting with technology that may one day make imitation learning common for machines. The underlying idea of this approach is to allow an agent to acquire the necessary details of how to perform a task by observing another agent (who already has the relevant expertise) perform the same task. Usually, the learning agent is a robot and the teaching agent is a human. Often, the goal of imitation learning approaches is to extract some high-level details about how to perform the task from recorded demonstrations. Research into imitation learning has achieved some impressive results ranging from training unmanned helicopters to perform complex maneuvers to teaching robots general-purpose manipulation tasks.

Download Full-text

The Education Environment for Strategy using the Imitation Learning Agent that Mimics the Behavior of Human Player

2019 IEEE 8th Global Conference on Consumer Electronics (GCCE) ◽

10.1109/gcce46687.2019.9015289 ◽

2019 ◽

Author(s):

Ueno Masayuki ◽

Wada Shinjiro ◽

Takami Tomoyuki

Keyword(s):

Imitation Learning ◽

Education Environment ◽

Learning Agent ◽

Human Player

Download Full-text

Hybrid Reinforcement Learning with Expert State Sequences

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013739 ◽

2019 ◽

Vol 33 ◽

pp. 3739-3746 ◽

Cited By ~ 2

Author(s):

Xiaoxiao Guo ◽

Shiyu Chang ◽

Mo Yu ◽

Gerald Tesauro ◽

Murray Campbell

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Deep Neural Networks ◽

Hybrid Approach ◽

Imitation Learning ◽

Learning Approaches ◽

Inference Model ◽

Learning Agent ◽

Hybrid Reinforcement ◽

Policy Optimization

Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage state expert sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.

Download Full-text

Machine Learning Combinatorial Frameworks for Architecture

International Journal of Innovation and Economic Development ◽

10.18775/ijied.1849-7551-7020.2015.72.2002 ◽

2021 ◽

Vol 7 (2) ◽

pp. 20-29

Author(s):

Joshua Lye ◽

Alisa Andrasek

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Learning Algorithm ◽

Imitation Learning ◽

Learning Models ◽

Learning Agent ◽

Policy Optimization ◽

Machine Learning Models ◽

Reinforcement Learning Algorithm

This paper investigates the application of machine learning for the simulation of larger architectural aggregations formed through the recombination of discrete components. This is primarily explored through establishing hardcoded assembly and connection logics which are used to form the framework of architectural fitness conditions for machine learning models. The key machine learning models researched are a combination of the deep reinforcement learning algorithm proximal policy optimization (PPO) and Generative Adversarial Imitation Learning (GAIL) in the Unity Machine Learning Agent asset toolkit. The goal of applying these machine learning models is to train the agent behaviours (discrete components) to learn specific logics of connection. In order to achieve assembled architectural `states that allow for spatial habitation through the process of simulation.

Download Full-text

Attention Guided Imitation Learning and Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019906 ◽

2019 ◽

Vol 33 ◽

pp. 9906-9907

Author(s):

Ruohan Zhang

Keyword(s):

Reinforcement Learning ◽

Learning Process ◽

Human Action ◽

Imitation Learning ◽

Tracking Data ◽

Experimental Setting ◽

High Quality ◽

Attention Model ◽

Learning Agent ◽

Human Visual Attention

We propose a framework that uses learned human visual attention model to guide the learning process of an imitation learning or reinforcement learning agent. We have collected high-quality human action and eye-tracking data while playing Atari games in a carefully controlled experimental setting. We have shown that incorporating a learned human gaze model into deep imitation learning yields promising results.

Download Full-text

Low Level Segmentation for Imitation Learning Using the Expectation Maximization Algorithm

10.21236/ada460525 ◽

2005 ◽

Author(s):

Andrew D. Warner

Keyword(s):

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Imitation Learning ◽

Low Level

Download Full-text

Robot Imitation Learning of High-Level Planning Information

10.21236/ada460420 ◽

2005 ◽

Author(s):

Frderick L. Crabbe ◽

Rebecca Hwa

Keyword(s):

Imitation Learning ◽

High Level

Download Full-text

An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽

10.3390/biomimetics6010013 ◽

2021 ◽

Vol 6 (1) ◽

pp. 13

Author(s):

Adam Bignold ◽

Francisco Cruz ◽

Richard Dazeley ◽

Peter Vamplew ◽

Cameron Foale

Keyword(s):

Reinforcement Learning ◽

Information Source ◽

Human Interaction ◽

Evaluation Methodology ◽

External Information ◽

Preliminary Evaluation ◽

Learning Agents ◽

Learning Agent ◽

Knowledge Bias ◽

The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.

Download Full-text

Imitation learning-based framework for learning 6-D linear compliant motions

Autonomous Robots ◽

10.1007/s10514-021-09971-y ◽

2021 ◽

Author(s):

Markku Suomalainen ◽

Fares J. Abu-dakka ◽

Ville Kyrki

Keyword(s):

Contact Force ◽

Free Space ◽

Prior Information ◽

Goal Area ◽

Imitation Learning ◽

Learning From Demonstration ◽

End Effector ◽

Localization Accuracy ◽

Novel Method ◽

Contact Tasks

AbstractWe present a novel method for learning from demonstration 6-D tasks that can be modeled as a sequence of linear motions and compliances. The focus of this paper is the learning of a single linear primitive, many of which can be sequenced to perform more complex tasks. The presented method learns from demonstrations how to take advantage of mechanical gradients in in-contact tasks, such as assembly, both for translations and rotations, without any prior information. The method assumes there exists a desired linear direction in 6-D which, if followed by the manipulator, leads the robot’s end-effector to the goal area shown in the demonstration, either in free space or by leveraging contact through compliance. First, demonstrations are gathered where the teacher explicitly shows the robot how the mechanical gradients can be used as guidance towards the goal. From the demonstrations, a set of directions is computed which would result in the observed motion at each timestep during a demonstration of a single primitive. By observing which direction is included in all these sets, we find a single desired direction which can reproduce the demonstrated motion. Finding the number of compliant axes and their directions in both rotation and translation is based on the assumption that in the presence of a desired direction of motion, all other observed motion is caused by the contact force of the environment, signalling the need for compliance. We evaluate the method on a KUKA LWR4+ robot with test setups imitating typical tasks where a human would use compliance to cope with positional uncertainty. Results show that the method can successfully learn and reproduce compliant motions by taking advantage of the geometry of the task, therefore reducing the need for localization accuracy.

Download Full-text

Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽

10.1145/3424739 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-25

Author(s):

Yongsen Ma ◽

Sheheryar Arshad ◽

Swetha Muniraju ◽

Eric Torkildson ◽

Enrico Rantala ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reinforcement Learning ◽

Activity Recognition ◽

Deep Neural Networks ◽

State Machine ◽

Recognition Algorithm ◽

The State ◽

Neural Architecture ◽

Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi is widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from history classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design has 97% average accuracy when testing devices and persons are not seen during training. The proposed design is also evaluated by two public datasets with accuracy of 80% and 83%. The proposed design needs very little human efforts for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.

Download Full-text