Memory-based reinforcement learning algorithm for autonomous exploration in unknown environment

2018 · Vol 15 (3) · pp. 172988141877584
Author(s): Amir Ramezani Dooraki, Deok Jin Lee

In the near future, robots will be seen in almost every area of our lives, in different shapes and with different objectives such as entertainment, surveillance, rescue, and navigation. In any shape and with any objective, they must be capable of successful exploration. They should be able to explore efficiently and to adapt to changes in their environment. For successful navigation, it is necessary to recognize the difference between similar places in an environment, and achieving this without increasing the capability of the sensors makes a memory crucial. In this article, an algorithm for autonomous exploration and obstacle avoidance in an unknown environment is proposed. To make the algorithm self-learning, a memory-based reinforcement learning method using a multilayer neural network is employed, with the aim of creating an agent with an efficient exploration and obstacle-avoidance policy. Furthermore, this agent can automatically adapt itself to changes in its environment. Finally, to test the capability of our algorithm, we implemented it on a robot modeled after a real platform, simulated in the robust Gazebo physics-engine simulator.
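As an illustration of the memory-based idea described above, the following minimal sketch (not the authors' code; the sensor layout, memory length, and hyperparameters are assumptions) stacks the last few observations and actions into the input of a multilayer Q-network, so that similar-looking places become distinguishable:

```python
import collections
import random
import torch
import torch.nn as nn

N_RANGES = 8          # assumed number of range-sensor beams
N_ACTIONS = 3         # e.g. forward, turn left, turn right
MEMORY_LEN = 4        # how many past steps the agent remembers

class MemoryQNet(nn.Module):
    """Multilayer Q-network over a short observation/action history."""
    def __init__(self):
        super().__init__()
        in_dim = MEMORY_LEN * (N_RANGES + N_ACTIONS)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

def encode(history):
    """Flatten the (observation, one-hot action) history into one vector."""
    parts = []
    for obs, act in history:
        one_hot = [0.0] * N_ACTIONS
        one_hot[act] = 1.0
        parts.extend(list(obs) + one_hot)
    return torch.tensor(parts, dtype=torch.float32)

qnet = MemoryQNet()
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
# The memory starts filled with dummy entries (max range, "forward" action).
history = collections.deque([([1.0] * N_RANGES, 0)] * MEMORY_LEN, maxlen=MEMORY_LEN)

def select_action(history, epsilon=0.1):
    """Epsilon-greedy choice on the memory-augmented state."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(qnet(encode(history)).argmax())

def td_update(state, action, reward, next_state, done, gamma=0.99):
    """One-step Q-learning update between two memory-augmented states."""
    with torch.no_grad():
        target = reward + (0.0 if done else gamma * float(qnet(next_state).max()))
    loss = (qnet(state)[action] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```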

2021 · Vol ahead-of-print (ahead-of-print)
Author(s): Xiaojun Zhu, Yinghao Liang, Hanxu Sun, Xueqian Wang, Bin Ren

Purpose
Most manufacturing plants choose the easy option of completely separating human operators from robots to prevent accidents, but this dramatically affects the overall quality and speed expected from human–robot collaboration. Ensuring human safety once a person has entered a robot's workspace is not an easy task, and the unstructured nature of such working environments makes it even harder. The purpose of this paper is to propose a real-time robot collision avoidance method to alleviate this problem.

Design/methodology/approach
In this paper, a model is trained to learn direct control commands from raw depth images through a self-supervised reinforcement learning algorithm. To reduce sample inefficiency and improve safety during initial training, a virtual reality platform is used to simulate a natural working environment and generate obstacle-avoidance data for training. To ensure a smooth transfer to a real robot, automatic domain randomization is used to generate randomly distributed environmental parameters during the obstacle-avoidance simulation of virtual robots in the virtual environment, contributing to better performance in the real environment.

Findings
The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot safety-aware and teach it to divert its trajectory to avoid accidents with humans in the workspace.

Research limitations/implications
The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot aware of safety and learn how to change its trajectory to avoid accidents with people in the workspace.

Originality/value
This paper provides a novel collision avoidance framework that allows robots to work alongside human operators in unstructured and complex environments. The method uses end-to-end policy training to extract the optimal path directly from the visual inputs of the scene.
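The automatic domain randomization step described above could be sketched roughly as follows; the parameter names, ranges, and range-widening rule are illustrative assumptions, not the paper's settings:

```python
import random
from dataclasses import dataclass

@dataclass
class RandomizedParam:
    low: float
    high: float
    grow: float            # how much the range widens after success
    limit: tuple           # hard bounds the range may never exceed

    def sample(self) -> float:
        return random.uniform(self.low, self.high)

    def widen(self) -> None:
        self.low = max(self.limit[0], self.low - self.grow)
        self.high = min(self.limit[1], self.high + self.grow)

# Illustrative randomized simulation parameters (names are assumptions).
params = {
    "light_intensity": RandomizedParam(0.8, 1.2, 0.05, (0.2, 2.0)),
    "obstacle_speed":  RandomizedParam(0.1, 0.3, 0.02, (0.0, 1.0)),
    "depth_noise_std": RandomizedParam(0.0, 0.01, 0.002, (0.0, 0.05)),
}

def sample_episode_config() -> dict:
    """Draw one randomized simulation configuration for a training episode."""
    return {name: p.sample() for name, p in params.items()}

def update_ranges(success_rate: float, threshold: float = 0.8) -> None:
    """Widen every range once the policy is reliable at the current difficulty."""
    if success_rate >= threshold:
        for p in params.values():
            p.widen()

# Example: configure one simulated episode, then expand ranges after evaluation.
config = sample_episode_config()
update_ranges(success_rate=0.85)
```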


2021 · Vol 2021 · pp. 1-15
Author(s): Xiaogang Ruan, Peng Li, Xiaoqing Zhu, Hejie Yu, Naigong Yu

Developing artificial intelligence (AI) agents capable of efficient exploration in visually rich and complex environments is challenging. In this study, we formulate the exploration problem as a reinforcement learning problem and rely on intrinsic motivation to guide exploration behavior. This intrinsic motivation is driven by curiosity and is calculated from episode memory. To generate the intrinsic motivation, we combine a count-based method with a temporal-distance measure, computed synchronously. We tested our approach in 3D maze-like environments and validated its performance in exploration tasks through extensive experiments. The experimental results show that our agent can learn exploration ability from raw sensory input and accomplish autonomous exploration across different mazes. In addition, the learned policy is not biased by stochastic objects. We also analyze the effects of different training methods and driving forces on the exploration policy.
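A minimal sketch of an episodic-memory intrinsic reward combining a count-based term with a temporal-distance (novelty) term, in the spirit of the description above; the embedding source, discretization, similarity threshold, and weighting are assumptions:

```python
import numpy as np

class EpisodicCuriosity:
    def __init__(self, sim_threshold=0.9, beta=0.5):
        self.memory = []            # embeddings stored this episode
        self.counts = {}            # coarse visit counts per discretized cell
        self.sim_threshold = sim_threshold
        self.beta = beta            # mixes the two reward terms

    def _cell(self, emb):
        return tuple(np.round(emb, 1))   # crude discretization for counting

    def intrinsic_reward(self, emb):
        # Count-based term: rarely visited cells are more rewarding.
        cell = self._cell(emb)
        self.counts[cell] = self.counts.get(cell, 0) + 1
        r_count = 1.0 / np.sqrt(self.counts[cell])

        # Temporal-distance term: reward observations dissimilar to everything
        # already held in episode memory (i.e. far from visited places).
        if self.memory:
            sims = [float(emb @ m / (np.linalg.norm(emb) * np.linalg.norm(m) + 1e-8))
                    for m in self.memory]
            novel = max(sims) < self.sim_threshold
        else:
            novel = True
        r_distance = 1.0 if novel else 0.0
        if novel:
            self.memory.append(emb)

        return self.beta * r_count + (1.0 - self.beta) * r_distance

    def reset(self):
        self.memory.clear()         # episodic memory is wiped each episode

# Example: intrinsic reward for an assumed 16-dim observation embedding.
curiosity = EpisodicCuriosity()
r_int = curiosity.intrinsic_reward(np.random.default_rng(0).normal(size=16))
```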


2020 · Vol 2020 · pp. 1-10
Author(s): Jung-Sing Jwo, Ching-Sheng Lin, Cheng-Hsiung Lee, Ya-Ching Lo

Previous studies have shown that training a reinforcement learning model for the sorting problem takes a very long time, even for small sets of data. To study whether transfer learning can improve the training process of reinforcement learning, we employ Q-learning as the base reinforcement learning algorithm, use the sorting problem as a case study, and assess performance from two aspects: time expense and brain capacity. We compare the total number of training steps between the non-transfer and transfer methods to study their efficiency, and we evaluate their differences in brain capacity (i.e., the percentage of updated Q-values in the Q-table). According to our experimental results, the difference in the total number of training steps becomes smaller as the size of the set of numbers to be sorted increases. Our results also show that the brain capacities of transfer and non-transfer reinforcement learning are similar once both reach a similar training level.
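A hedged sketch of this setup: tabular Q-learning on a simplified sorting formulation, with the "brain capacity" measure computed as the share of Q-entries actually updated. The state/action encoding (permutation states, adjacent-swap actions) and all hyperparameters are assumptions, not the authors' design:

```python
import itertools
import random

def q_learning_sort(n, episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, q_init=None):
    """Tabular Q-learning where states are permutations of n numbers and
    actions swap adjacent positions; returns the Q-table and 'brain capacity'."""
    states = list(itertools.permutations(range(n)))
    actions = list(range(n - 1))                      # swap positions a and a+1
    q = dict(q_init) if q_init else {}
    updated = set()
    goal = tuple(range(n))

    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(200):                          # cap episode length
            if s == goal:
                break
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q.get((s, b), 0.0))
            lst = list(s)
            lst[a], lst[a + 1] = lst[a + 1], lst[a]
            s2 = tuple(lst)
            r = 10.0 if s2 == goal else -1.0
            best_next = max(q.get((s2, b), 0.0) for b in actions)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))
            updated.add((s, a))
            s = s2

    capacity = len(updated) / (len(states) * len(actions))   # "brain capacity"
    return q, capacity

# Non-transfer: learn to sort 4 numbers from scratch.
q4, cap4 = q_learning_sort(4)
# Transfer (illustrative warm start only): reuse an existing table via q_init.
q4_transfer, cap4_transfer = q_learning_sort(4, episodes=500, q_init=q4)
```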


2005 · Vol 17 (2) · pp. 335-359
Author(s): Jun Morimoto, Kenji Doya

This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
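The min-max idea can be sketched as an actor-disturber-critic loop on a toy inverted pendulum, where the disturbance policy is penalized by its squared norm so that it plays the worst bounded disturbance. This is a rough illustration under assumed dynamics, features, and update rules, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    """Quadratic features of the pendulum state (angle, angular velocity)."""
    th, thdot = x
    return np.array([th * th, th * thdot, thdot * thdot, th, thdot, 1.0])

def pendulum_step(x, u, w, dt=0.02):
    """Toy inverted pendulum; u is the control torque, w the disturbance torque."""
    th, thdot = x
    thddot = 9.8 * np.sin(th) + u + w        # unit mass and length, no friction
    return np.array([th + dt * thdot, thdot + dt * thddot])

v = np.zeros(6)          # critic weights
k_u = np.zeros(2)        # control gain:      u = -k_u @ x (+ exploration)
k_w = np.zeros(2)        # disturbance gain:  w =  k_w @ x (+ exploration)
gamma, eta = 0.98, 1.0                        # discount, disturbance-norm weight
alpha_v, alpha_pi = 0.05, 0.005

x = np.array([0.1, 0.0])
for t in range(20000):
    u = float(-k_u @ x) + 0.1 * rng.normal()          # controller explores
    w = float(k_w @ x) + 0.1 * rng.normal()           # disturber explores
    x2 = pendulum_step(x, u, w)
    r = -(x2[0] ** 2 + 0.1 * x2[1] ** 2 + 0.01 * u * u)   # stay upright cheaply
    r_aug = r + eta * w * w        # disturbance-norm term of the game reward

    td = r_aug + gamma * float(v @ features(x2)) - float(v @ features(x))
    v += alpha_v * td * features(x)
    # Controller ascends the game value; with u = -k_u @ x, d u / d k_u = -x.
    k_u -= alpha_pi * td * (u - float(-k_u @ x)) * x
    # Disturber descends the same value; with w = k_w @ x, d w / d k_w = x.
    k_w -= alpha_pi * td * (w - float(k_w @ x)) * x
    # Reset the episode if the pendulum has fallen past the horizontal.
    x = x2 if abs(x2[0]) < np.pi / 2 else np.array([0.1 * rng.normal(), 0.0])
```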


Machines · 2021 · Vol 9 (4) · pp. 77
Author(s): Minghui Wang, Bi Zeng, Qiujie Wang

Robots have poor adaptive ability in terms of formation control and obstacle avoidance in unknown, complex environments. To address this problem, this paper proposes a new motion planning method based on flocking control and reinforcement learning. Flocking control is used to implement orderly multi-robot motion. To avoid the traps of potential fields encountered during flocking control, the flocking control is optimized and a wall-following behavior control strategy is designed. Reinforcement learning is adopted to implement the robots' behavioral decision-making and to enhance their analytical and predictive abilities during motion planning in an unknown environment. A visual simulation platform is developed, on which researchers can test algorithms for multi-robot motion control, such as obstacle avoidance, formation control, path planning, and reinforcement learning strategies. As shown by the simulation experiments, the motion planning method presented in this paper enhances the ability of multi-robot systems to self-learn and self-adapt in a fully unknown environment with complex obstacles.
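A rough sketch of the flocking-plus-behavior-switching idea: each robot combines separation, alignment, and cohesion terms, and a higher-level decision (here a trivial rule standing in for the learned reinforcement learning policy) switches to wall-following when the flocking command would drive the robot into a nearby obstacle. Weights, thresholds, and the switching rule are assumptions:

```python
import numpy as np

SEP_DIST, OBST_DIST = 0.5, 0.8   # assumed separation and obstacle thresholds

def flocking_velocity(pos, vel, neighbors):
    """Reynolds-style separation + alignment + cohesion command."""
    sep = ali = coh = np.zeros(2)
    if neighbors:
        near = [p for p, _ in neighbors if np.linalg.norm(p - pos) < SEP_DIST]
        if near:
            sep = sum((pos - p) / (np.linalg.norm(pos - p) ** 2 + 1e-6) for p in near)
        ali = np.mean([v for _, v in neighbors], axis=0) - vel
        coh = np.mean([p for p, _ in neighbors], axis=0) - pos
    return 1.5 * sep + 0.5 * ali + 0.3 * coh

def wall_following_velocity(obstacle_dir, speed=0.5):
    """Move perpendicular to the obstacle direction (follow the wall)."""
    tangent = np.array([-obstacle_dir[1], obstacle_dir[0]])
    return speed * tangent / (np.linalg.norm(tangent) + 1e-6)

def motion_command(pos, vel, neighbors, obstacle_dir, obstacle_dist):
    # Placeholder for the learned decision policy: wall-follow when the
    # flocking command points into a nearby obstacle (potential-field trap).
    v_flock = flocking_velocity(pos, vel, neighbors)
    if obstacle_dist < OBST_DIST and v_flock @ obstacle_dir > 0:
        return wall_following_velocity(obstacle_dir)
    return v_flock

# Example call (positions and velocities are assumed 2-D numpy vectors):
cmd = motion_command(
    pos=np.array([0.0, 0.0]), vel=np.array([0.3, 0.0]),
    neighbors=[(np.array([0.4, 0.1]), np.array([0.3, 0.0])),
               (np.array([-1.0, 0.5]), np.array([0.2, 0.1]))],
    obstacle_dir=np.array([1.0, 0.0]), obstacle_dist=0.6,
)
```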

