End-to-End Autonomous Exploration with Deep Reinforcement Learning and Intrinsic Motivation

2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Xiaogang Ruan ◽  
Peng Li ◽  
Xiaoqing Zhu ◽  
Hejie Yu ◽  
Naigong Yu

Developing artificial intelligence (AI) agents capable of efficient exploration in visually rich and complex environments is challenging. In this study, we formulate exploration as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. This intrinsic motivation is driven by curiosity and is computed from episodic memory. To generate the intrinsic motivation, we combine a count-based method with temporal distance, producing both signals synchronously. We tested our approach in 3D maze-like environments and validated its performance on exploration tasks through extensive experiments. The experimental results show that our agent can learn exploration behavior from raw sensory input and accomplish autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze the effects of different training methods and driving forces on the exploration policy.
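A minimal sketch of how such a combined intrinsic reward might be computed, assuming a count-based bonus of the usual 1/sqrt(N) form plus an episodic-memory novelty term; the function names, weighting, and thresholds here are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def count_bonus(visit_counts, state_key, beta=0.5):
    """Count-based exploration bonus: beta / sqrt(N(s))."""
    visit_counts[state_key] = visit_counts.get(state_key, 0) + 1
    return beta / np.sqrt(visit_counts[state_key])

def episodic_bonus(embedding, episode_memory, k=8, threshold=1.0):
    """Reward observations that are far (in embedding distance, a proxy
    for temporal distance) from everything stored in episode memory."""
    if not episode_memory:
        episode_memory.append(embedding)
        return 1.0
    dists = [np.linalg.norm(embedding - m) for m in episode_memory]
    novelty = float(np.mean(sorted(dists)[:k]))  # mean distance to k nearest memories
    episode_memory.append(embedding)
    return 1.0 if novelty > threshold else 0.0

def intrinsic_reward(visit_counts, episode_memory, state_key, embedding):
    # Generate both signals synchronously and sum them; the combination
    # rule is an assumption, since the abstract does not specify one.
    return count_bonus(visit_counts, state_key) + episodic_bonus(embedding, episode_memory)
```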

2018 ◽  
Vol 15 (3) ◽  
pp. 172988141877584 ◽  
Author(s):  
Amir Ramezani Dooraki ◽  
Deok Jin Lee

In the near future, robots will be present in almost every area of our lives, in different shapes and with different objectives such as entertainment, surveillance, rescue, and navigation. Whatever their shape and objective, they must be capable of successful exploration: they should explore efficiently and adapt to changes in their environment. Successful navigation also requires distinguishing between similar-looking places in an environment, and achieving this without more capable sensors makes memory crucial. In this article, an algorithm for autonomous exploration and obstacle avoidance in an unknown environment is proposed. To make the algorithm self-learning, a memory-based reinforcement learning method using a multilayer neural network is employed, with the aim of creating an agent with an efficient exploration and obstacle avoidance policy. Furthermore, the agent automatically adapts to changes in its environment. Finally, to test the algorithm's capability, we implemented it on a robot modeled after a real platform, simulated in the Gazebo physics engine.
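As one plausible reading of the memory-based design, the sketch below feeds a short history of sensor readings into a multilayer network, so that similar-looking places can be told apart by how the robot arrived at them; the layer sizes and the Q-value output are assumptions, not the paper's exact network:

```python
import torch
import torch.nn as nn

class MemoryMLPPolicy(nn.Module):
    """Multilayer network whose input is a short history (memory) of
    range-sensor readings rather than a single snapshot, letting the
    agent disambiguate similar places without better sensors."""
    def __init__(self, obs_dim, history_len, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * history_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per discrete motion command
        )

    def forward(self, obs_history):
        # obs_history: (batch, history_len, obs_dim) stacked readings
        return self.net(obs_history.flatten(start_dim=1))
```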


Author(s):  
Mohamed Khalil Jabri

Imitation learning allows learning complex behaviors from demonstrations. Early approaches, based on either Behavior Cloning or Inverse Reinforcement Learning, scaled poorly to complex environments. A more promising approach, termed Generative Adversarial Imitation Learning, tackles the imitation learning problem by drawing a connection with Generative Adversarial Networks. In this work, we advocate the use of this class of methods and investigate possible extensions by endowing them with global temporal consistency, in particular through a contrastive learning based approach.
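One common way to impose such temporal consistency is a contrastive (InfoNCE) objective over embeddings of temporally adjacent frames; this is a hedged sketch of that idea, not the author's exact loss:

```python
import torch
import torch.nn.functional as F

def info_nce_temporal(anchor, positive, temperature=0.1):
    """Contrastive (InfoNCE) loss: embeddings of temporally adjacent
    frames (anchor[i], positive[i]) are pulled together, while all
    other pairs in the batch act as negatives -- one way to encourage
    globally temporally consistent representations."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                  # (B, B) similarity matrix
    labels = torch.arange(a.size(0), device=a.device) # diagonal = positive pairs
    return F.cross_entropy(logits, labels)
```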


2015 ◽  
Vol 25 (3) ◽  
pp. 471-482 ◽  
Author(s):  
Bartłomiej Śnieżyński

In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied to strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of varying complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge can be analyzed by humans, which allows tracking the learning process.
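A minimal sketch of the classification-based strategy model, using a decision tree so the learned strategy stays human-readable; the state encoding and action labels below are illustrative assumptions, not the paper's farmer-pest feature set:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set: each row is a perceived state, the label is the action
# that turned out well (e.g., eventually led to a delayed reward).
X_train = [[0, 1, 3], [2, 0, 1], [1, 1, 0]]    # hypothetical state features
y_train = ["spray", "move_north", "harvest"]    # hypothetical action labels

# A tree classifier keeps the learned strategy inspectable by humans,
# which supports tracking the learning process as the abstract notes.
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

def choose_action(state_features):
    """The agent picks the action the classifier predicts for its state."""
    return clf.predict([state_features])[0]
```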


Author(s):  
Dómhnall J. Jennings ◽  
Eduardo Alonso ◽  
Esther Mondragón ◽  
Charlotte Bonardi

Standard associative learning theories typically fail to conceptualise the temporal properties of a stimulus, and hence cannot easily make predictions about the effects such properties might have on the magnitude of conditioning phenomena. Despite this, intuitively we might expect the temporal properties of a stimulus that is paired with some outcome to be important. In particular, no previous research has addressed the way that fixed- or variable-duration stimuli can affect overshadowing. In this chapter we report results showing that the degree of overshadowing depends on the distribution form (fixed or variable) of the overshadowing stimulus, and argue that conditioning is weaker under conditions of temporal uncertainty. These results are discussed in terms of models of conditioning and timing. We conclude that the temporal difference model, which has been extensively applied to the reinforcement learning problem in machine learning, accounts for the key findings of our study.
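For reference, the temporal difference model rests on the same value update used in machine-learning reinforcement learning; below is a minimal tabular sketch, with the learning rate, discount, and dictionary representation chosen purely for illustration:

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    In the conditioning reading, the TD error plays the role of the
    prediction error that drives changes in associative strength."""
    td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error
```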


AI Magazine ◽  
2011 ◽  
Vol 32 (1) ◽  
pp. 15 ◽  
Author(s):  
Matthew E. Taylor ◽  
Peter Stone

Transfer learning has recently gained popularity due to the development of algorithms that can successfully generalize information across multiple tasks. This article focuses on transfer in the context of reinforcement learning domains, a general learning framework where an agent acts in an environment to maximize a reward signal. The goals of this article are to (1) familiarize readers with the transfer learning problem in reinforcement learning domains, (2) explain why the problem is both interesting and difficult, (3) present a selection of existing techniques that demonstrate different solutions, and (4) provide representative open problems in the hope of encouraging additional research in this exciting area.
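To make the setting concrete, one simple transfer technique covered in this literature warm-starts the target task's value function from a source task through an inter-task mapping; the sketch below assumes tabular Q-values and hand-coded mapping functions, which are illustrative rather than any specific method from the article:

```python
def transfer_q(source_Q, map_state, map_action, target_states, target_actions):
    """Initialize the target task's Q-table from a learned source-task
    Q-table via inter-task state/action mappings; ordinary reinforcement
    learning then proceeds from this warm start instead of from scratch."""
    target_Q = {}
    for s in target_states:
        for a in target_actions:
            # Reuse the source estimate for the corresponding source pair.
            target_Q[(s, a)] = source_Q.get((map_state(s), map_action(a)), 0.0)
    return target_Q
```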


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2137 ◽  
Author(s):  
Chenpu Li ◽  
Qianjian Xing ◽  
Zhenguo Ma

In the field of visual tracking, trackers based on convolutional neural networks (CNNs) have achieved significant success. The fully-convolutional Siamese (SiamFC) tracker is a typical representative of these CNN trackers and has attracted much attention. It models visual tracking as a similarity-learning problem. However, experiments showed that SiamFC is not very robust in some complex environments, possibly because the tracker lacks sufficient prior information about the target. Inspired by the key ideas of the Staple tracker and the Kalman filter, we constructed two additional models to compensate for SiamFC's disadvantages: one contains the target's prior color information, and the other its prior trajectory information. With these two models, we designed a novel and robust tracking framework on the basis of SiamFC, which we call Histogram-Kalman SiamFC (HKSiamFC). We evaluated the HKSiamFC tracker's performance on the Online Object Tracking Benchmark (OTB) and Temple Color (TC128) datasets, where it showed quite competitive performance compared with the baseline tracker and several other state-of-the-art trackers.
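The trajectory-prior model is plausibly a constant-velocity Kalman filter over the target centre; the sketch below shows that standard predict/update cycle, with all noise parameters as illustrative assumptions rather than the paper's settings:

```python
import numpy as np

class ConstantVelocityKalman:
    """Constant-velocity Kalman filter over the target's (x, y) centre,
    one plausible form for a trajectory prior that feeds the tracker a
    predicted position each frame."""
    def __init__(self, x, y, q=1e-2, r=1.0):
        self.s = np.array([x, y, 0.0, 0.0])                    # position + velocity
        self.P = np.eye(4)                                     # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0  # dt = 1 frame
        self.H = np.eye(2, 4)                                  # observe position only
        self.Q = q * np.eye(4); self.R = r * np.eye(2)         # assumed noise levels

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]          # predicted centre, usable as a search prior

    def update(self, z):
        y = np.asarray(z) - self.H @ self.s            # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```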

