Exploring Parameter Space in Reinforcement Learning

AbstractThis paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.

Download Full-text

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/197 ◽

2018 ◽

Cited By ~ 7

Author(s):

Patryk Chrabąszcz ◽

Ilya Loshchilov ◽

Frank Hutter

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

State Of The Art ◽

The State ◽

Evolution Strategies ◽

Learning Problems ◽

Local Minima ◽

Natural Evolution ◽

The Many ◽

Made In

Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades.We also demonstrate that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state-of-the-art for solving RL problems.

Download Full-text

Learning agile and dynamic motor skills for legged robots

Science Robotics ◽

10.1126/scirobotics.aau5872 ◽

2019 ◽

Vol 4 (26) ◽

pp. eaau5872 ◽

Cited By ~ 93

Author(s):

Jemin Hwangbo ◽

Joonho Lee ◽

Alexey Dosovitskiy ◽

Dario Bellicoso ◽

Vassilios Tsounis ◽

...

Keyword(s):

Reinforcement Learning ◽

Motor Skills ◽

State Of The Art ◽

Control Policy ◽

Cost Effective ◽

Legged Robots ◽

Data Generation ◽

Natural Evolution ◽

Body Velocity ◽

High Level

Legged robots pose one of the greatest challenges in robotics. Dynamic and agile maneuvers of animals cannot be imitated by existing methods that are crafted by humans. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, so far, reinforcement learning research for legged robots is mainly limited to simulation, and only few and comparably simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly with dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated medium-dog–sized quadrupedal system. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what had been achieved with prior methods: ANYmal is capable of precisely and energy-efficiently following high-level body velocity commands, running faster than before, and recovering from falling even in complex configurations.

Download Full-text

Boundary Extension Features for Width-Based Planning with Simulators on Continuous-State Domains

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/578 ◽

2020 ◽

Author(s):

Florent Teichteil-Königsbuch ◽

Miquel Ramirez ◽

Nir Lipovetzky

Keyword(s):

Reinforcement Learning ◽

Heuristic Search ◽

State Of The Art ◽

Black Box ◽

Boundary Extension ◽

State Variables ◽

Feature Mapping ◽

Continuous State ◽

Computational Resources ◽

Classical Control

Width-based planning algorithms have been demonstrated to be competitive with state-of-the-art heuristic search and SAT-based approaches, without requiring access to a model of action effects and preconditions, just access to a black-box simulator. Width-based planners search is guided by a measure of the novelty of states, that requires observations on simulator states to be given as a set of features. This paper proposes agnostic feature mapping mechanisms that define the features online, as exploration progresses and the domain of continuous state variables is revealed. We demonstrate the effectiveness of these features on the OpenAI gym "classical control" suite of benchmarks. We compare our online planners with state-of-the-art deep reinforcement learning algorithms, and show that width-based planners using our features can find policies of the same quality with significantly less computational resources.

Download Full-text

Learning for a Robot: Deep Reinforcement Learning, Imitation Learning, Transfer Learning

Sensors ◽

10.3390/s21041278 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1278

Author(s):

Jiang Hua ◽

Liangcai Zeng ◽

Gongfa Li ◽

Zhaojie Ju

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Robot Control ◽

Learning Transfer ◽

State Of The Art ◽

Imitation Learning ◽

Future Research ◽

Intelligent Robot ◽

Dexterous Manipulation ◽

Research Challenges

Dexterous manipulation of the robot is an important part of realizing intelligence, but manipulators can only perform simple tasks such as sorting and packing in a structured environment. In view of the existing problem, this paper presents a state-of-the-art survey on an intelligent robot with the capability of autonomous deciding and learning. The paper first reviews the main achievements and research of the robot, which were mainly based on the breakthrough of automatic control and hardware in mechanics. With the evolution of artificial intelligence, many pieces of research have made further progresses in adaptive and robust control. The survey reveals that the latest research in deep learning and reinforcement learning has paved the way for highly complex tasks to be performed by robots. Furthermore, deep reinforcement learning, imitation learning, and transfer learning in robot control are discussed in detail. Finally, major achievements based on these methods are summarized and analyzed thoroughly, and future research challenges are proposed.

Download Full-text

Evaluation of a "Black-Box" State-of-the-Art Vision-Based Navigation Algorithm for GPS-Denied Navigation

Proceedings of the 33rd International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2020) ◽

10.33012/2020.17765 ◽

2020 ◽

Author(s):

Simone B. Bortolami ◽

Helen Webb ◽

Michael Richman ◽

Peter Norton

Keyword(s):

State Of The Art ◽

Black Box ◽

Navigation Algorithm ◽

Vision Based Navigation ◽

Gps Denied Navigation

Download Full-text

Review of the Applications of Deep Learning in Bioinformatics

Current Bioinformatics ◽

10.2174/1574893615999200711165743 ◽

2021 ◽

Vol 15 (8) ◽

pp. 898-911

Author(s):

Yongqing Zhang ◽

Jianrong Yan ◽

Siyu Chen ◽

Meiqin Gong ◽

Dongrui Gao ◽

...

Keyword(s):

Deep Learning ◽

Drug Discovery ◽

Biomedical Imaging ◽

State Of The Art ◽

Black Box ◽

Medical Data ◽

Biological Data ◽

High Dimensional ◽

Biological Research ◽

Process Data

Rapid advances in biological research over recent years have significantly enriched biological and medical data resources. Deep learning-based techniques have been successfully utilized to process data in this field, and they have exhibited state-of-the-art performances even on high-dimensional, nonstructural, and black-box biological data. The aim of the current study is to provide an overview of the deep learning-based techniques used in biology and medicine and their state-of-the-art applications. In particular, we introduce the fundamentals of deep learning and then review the success of applying such methods to bioinformatics, biomedical imaging, biomedicine, and drug discovery. We also discuss the challenges and limitations of this field, and outline possible directions for further research.

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Drone Deep Reinforcement Learning: A Review

Electronics ◽

10.3390/electronics10090999 ◽

2021 ◽

Vol 10 (9) ◽

pp. 999

Author(s):

Ahmad Taher Azar ◽

Anis Koubaa ◽

Nada Ali Mohamed ◽

Habiba A. Ibrahim ◽

Zahra Fathy Ibrahim ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Real Life ◽

Environment Monitoring ◽

Simulated Environments ◽

Infrastructure Inspection ◽

Remote Sensing Mapping ◽

And Control ◽

The Military ◽

Uav Navigation

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications. These applications belong to the civilian and the military fields. To name a few; infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications needs a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms were designed. These algorithms targeted the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: the deep reinforcement learning (DRL) techniques. We made a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.

Download Full-text