Learning agile and dynamic motor skills for legged robots

Legged robots pose one of the greatest challenges in robotics. Existing hand-crafted methods cannot reproduce the dynamic and agile maneuvers of animals. A compelling alternative is reinforcement learning, which requires minimal craftsmanship and promotes the natural evolution of a control policy. However, reinforcement learning research for legged robots has so far been mainly limited to simulation, and only a few comparatively simple examples have been deployed on real systems. The primary reason is that training with real robots, particularly dynamically balancing systems, is complicated and expensive. In the present work, we introduce a method for training a neural network policy in simulation and transferring it to a state-of-the-art legged system, thereby leveraging fast, automated, and cost-effective data generation schemes. The approach is applied to the ANYmal robot, a sophisticated quadrupedal system the size of a medium-sized dog. Using policies trained in simulation, the quadrupedal machine achieves locomotion skills that go beyond what prior methods had achieved: ANYmal can precisely and energy-efficiently follow high-level body velocity commands, run faster than before, and recover from falls even in complex configurations.


Towards High-Level Intrinsic Exploration in Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

DOI: 10.24963/ijcai.2020/733

Author(s): Nicolas Bougie, Ryutaro Ichise

Keyword(s): Reinforcement Learning, Time Horizon, State Of The Art, Experimental Results, Prior Work, Extrinsic Rewards, Intrinsic Reward, Long Time, End To End, High Level

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which makes exploration one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While these methods hold the promise of better local exploration, discovering global exploration strategies remains beyond their reach. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes exploration strategies over long time horizons. We formulate curiosity as the error in an agent’s ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
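
The idea of adding a fast and a slow curiosity term to the extrinsic reward can be illustrated with a small sketch. It is not the paper's formulation: here the fast term is simply the reconstruction error of the current observation and the slow term is the mean error over a window of recent steps, a crude stand-in for a long-horizon signal. The class name, the window length and the weight beta are hypothetical, and the reconstruction model itself is left to the caller.

```python
from collections import deque


def reconstruction_error(obs, reconstruction):
    """Mean squared error between an observation and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(obs, reconstruction)) / len(obs)


class TwoTimescaleCuriosity:
    """Hypothetical fast/slow curiosity bonus (illustrative, not the paper's method)."""

    def __init__(self, window=4, beta=0.1):
        self.errors = deque(maxlen=window)  # recent per-step reconstruction errors
        self.beta = beta                    # weight of the intrinsic bonus

    def bonus(self, obs, reconstruction):
        fast = reconstruction_error(obs, reconstruction)  # local, per-step novelty
        self.errors.append(fast)
        slow = sum(self.errors) / len(self.errors)        # assumption: smoothed over the window
        return self.beta * (fast + slow)


def shaped_reward(extrinsic, curiosity, obs, reconstruction):
    """Extrinsic reward plus the intrinsic curiosity bonus."""
    return extrinsic + curiosity.bonus(obs, reconstruction)
```

In a real agent the reconstruction would come from a learned model conditioned on the observation's context, and the slow term would be designed to reward exploration over much longer horizons than a short moving average.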


Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

DOI: 10.24963/ijcai.2018/197

Cited by: ~7

Author(s): Patryk Chrabąszcz, Ilya Loshchilov, Frank Hutter

Keyword(s): Deep Learning, Reinforcement Learning, State Of The Art, The State, Evolution Strategies, Learning Problems, Local Minima, Natural Evolution, The Many, Made In

Evolution Strategies (ES) have recently been shown to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that earlier work belonged to the specialized class of natural evolution strategies (which resemble approximate-gradient RL algorithms such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state of the art can be advanced further by integrating the many advances made in the field of ES over the last decades. We also demonstrate that ES algorithms have very different performance characteristics from traditional RL algorithms: on some games they learn to exploit the environment and perform much better, while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state of the art for solving RL problems.
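
A canonical (μ, λ)-ES of the kind benchmarked here fits in a few lines: sample λ Gaussian perturbations of the current parameters, keep the μ best, and move to their weighted mean. This sketch assumes log-rank recombination weights, a fixed mutation step sigma and a fitness to be maximised; it omits the neural-network policy, the Atari environment and the parallel evaluation a real setup would need, and the toy objective in the usage note is hypothetical.

```python
import math
import random


def canonical_es(f, theta, sigma=0.1, lam=20, mu=10, iterations=200, seed=0):
    """Minimal (mu, lambda)-ES with log-rank recombination weights, maximising f."""
    rng = random.Random(seed)
    # Weights for the mu best offspring: better rank -> larger weight, normalised to sum to 1.
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]
    theta = list(theta)
    for _ in range(iterations):
        offspring = []
        for _ in range(lam):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]           # isotropic Gaussian mutation
            candidate = [t + sigma * e for t, e in zip(theta, eps)]
            offspring.append((f(candidate), eps))
        # Truncation selection: keep the mu perturbations with the highest fitness.
        offspring.sort(key=lambda pair: pair[0], reverse=True)
        step = [0.0] * len(theta)
        for w, (_, eps) in zip(weights, offspring[:mu]):
            for k, e in enumerate(eps):
                step[k] += w * e
        # Recombination: the new parent is the weighted mean of the selected offspring.
        theta = [t + sigma * s for t, s in zip(theta, step)]
    return theta
```

For example, maximising f(x) = -sum((x_i - 3)^2) from [0, 0] moves the parent close to [3, 3]. Unlike natural ES, no gradient estimate is formed explicitly; only the ranking of the offspring matters.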


Sim-to-Real Quadrotor Landing via Sequential Deep Q-Networks and Domain Randomization

Robotics, Vol 9 (1), pp. 8, 2020

DOI: 10.3390/robotics9010008

Cited by: ~2

Author(s): Riccardo Polvara, Massimiliano Patacchiola, Marc Hanheide, Gerhard Neumann

Keyword(s): State Of The Art, Control Policy, Divide And Conquer, Level Control, Autonomous Landing, Aerial Vehicle, Noisy Conditions, High Level, Technical Solutions, First Time

The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses for the first time the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits a task into sequential sub-tasks, each assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger and represents a component of a high-level control policy that navigates the UAV towards the marker. Several technical solutions have been implemented, for example combining vanilla and double DQNs, and introducing a partitioned buffer replay to address the problem of sample efficiency. One of the main contributions of this work is showing that an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with that of a state-of-the-art algorithm and of human pilots, while being quantitatively better in noisy conditions.
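
A partitioned buffer replay can be sketched as separate buffers whose contents are mixed in fixed proportions whenever a mini-batch is drawn, so that rare transitions (such as a successful landing) are not drowned out by common ones. The abstract does not say how transitions are partitioned; splitting them by reward sign and drawing equal thirds are assumptions made only for this illustration, and the class name is hypothetical.

```python
import random
from collections import deque


class PartitionedReplayBuffer:
    """Replay buffer split into partitions by reward sign (assumed partitioning criterion)."""

    def __init__(self, capacity_per_partition=10_000, fractions=None, seed=0):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral": deque(maxlen=capacity_per_partition),
        }
        # Hypothetical default: one third of every batch comes from each partition.
        self.fractions = fractions or {"positive": 1 / 3, "negative": 1 / 3, "neutral": 1 / 3}
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store a transition in the partition matching the sign of its reward."""
        if reward > 0:
            key = "positive"
        elif reward < 0:
            key = "negative"
        else:
            key = "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw each partition's share without replacement.

        The batch can be smaller than batch_size while some partitions are still short.
        """
        batch = []
        for key, fraction in self.fractions.items():
            part = self.partitions[key]
            n = min(len(part), round(batch_size * fraction))
            batch.extend(self.rng.sample(list(part), n))
        return batch
```

Fixing the proportions trades some bias in the sampled distribution for a steadier supply of informative transitions, which is the sample-efficiency concern the abstract mentions.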


The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning

Symmetry, Vol 13 (8), pp. 1335, 2021

DOI: 10.3390/sym13081335

Author(s): Menghao Wu, Yanbin Gao, Pengfei Wang, Fan Zhang, Zhejun Liu

Keyword(s): Reinforcement Learning, Obstacle Avoidance, Control Policy, Continuous Action, Control Approach, Low Level, Learning Technique, Distance Sensor, High Level, Action Spaces

In robotics, obstacle avoidance is an essential ability for distance-sensor-based robots. This type of robot has axisymmetrically distributed distance sensors to measure obstacle distances, so the state is symmetrical. Training the control policy with reinforcement learning has become a common approach. Given complex environments with, for example, narrow paths and right-angle turns, robots perform better if the control policy can control steering direction and speed simultaneously. This paper proposes the multi-dimensional action control (MDAC) approach based on reinforcement learning, which can be used in tasks with multiple continuous action spaces. It adopts a hierarchical structure with high- and low-level modules: low-level policies output concrete actions, and the high-level policy determines when to invoke the low-level modules according to the environment’s features. We design robot navigation experiments with continuous action spaces to test the method’s performance. The approach is end-to-end and can solve complex obstacle avoidance tasks in navigation.
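
The hierarchy described in this abstract can be sketched as a high-level selector that dispatches to low-level modules, each returning a full (steering, speed) action. Everything here is hypothetical: the learned policies are replaced by hand-written stand-ins, the sensors are assumed to be ordered so that the first half faces left, positive steering means turning left, and the 0.5 switching threshold is arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, float]  # (steering in [-1, 1], speed in [0, 1])
LowLevelPolicy = Callable[[Sequence[float]], Action]


def cruise(distances: Sequence[float]) -> Action:
    # Stand-in for a learned low-level policy: drive straight at full speed.
    return (0.0, 1.0)


def avoid(distances: Sequence[float]) -> Action:
    # Stand-in for a learned low-level policy: steer toward the freer side and slow down.
    half = len(distances) // 2
    left, right = sum(distances[:half]), sum(distances[half:])
    steering = 1.0 if left > right else -1.0
    return (steering, 0.3)


class HighLevelPolicy:
    """Chooses which low-level module to invoke from environment features."""

    def __init__(self, modules: List[LowLevelPolicy], threshold: float = 0.5):
        self.modules = modules
        self.threshold = threshold

    def select(self, distances: Sequence[float]) -> int:
        # Hypothetical rule standing in for a learned selector: use the avoidance
        # module whenever any sensor reports an obstacle closer than the threshold.
        return 1 if min(distances) < self.threshold else 0

    def act(self, distances: Sequence[float]) -> Action:
        return self.modules[self.select(distances)](distances)
```

For example, `HighLevelPolicy([cruise, avoid]).act([2.0, 2.0, 0.2, 0.4])` invokes the avoidance module and returns `(1.0, 0.3)`: a left turn at reduced speed. In MDAC both levels would be trained with reinforcement learning rather than hand-coded.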


Reinforcement learning control of a biomechanical model of the upper extremity

Scientific Reports, Vol 11 (1), 2021

DOI: 10.1038/s41598-021-93760-1

Author(s): Florian Fischer, Miroslav Bachinski, Markus Klar, Arthur Fleig, Jörg Müller

Keyword(s): Reinforcement Learning, Upper Extremity, Degrees Of Freedom, State Of The Art, Movement Time, Biomechanical Model, Control Policy, Human Movement, Motor Noise, Skeletal Model

Among the infinite number of possible movements that can be produced, humans are commonly assumed to choose those that optimize criteria such as minimizing movement time, subject to certain movement constraints like signal-dependent and constant motor noise. While so far these assumptions have only been evaluated for simplified point-mass or planar models, we address the question of whether they can predict reaching movements in a full skeletal model of the human upper extremity. We learn a control policy using a motor babbling approach as implemented in reinforcement learning, using aimed movements of the tip of the right index finger towards randomly placed 3D targets of varying size. We use a state-of-the-art biomechanical model, which includes seven actuated degrees of freedom. To deal with the curse of dimensionality, we use a simplified second-order muscle model acting at each degree of freedom instead of individual muscles. The results confirm that the assumptions of signal-dependent and constant motor noise, together with the objective of movement time minimization, are sufficient for a state-of-the-art skeletal model of the human upper extremity to reproduce complex phenomena of human movement, in particular Fitts’ Law and the 2/3 Power Law. This result supports the notion that control of the complex human biomechanical system can plausibly be determined by a set of simple assumptions and can easily be learned.
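
The two regularities named above have standard closed forms. The sketch below uses the Shannon formulation of Fitts' law and the speed form of the 2/3 power law; the coefficients a, b and k are fitted per participant and task, so any values passed in are hypothetical, and the paper may fit these laws in a different form.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits: log2(D / W + 1)."""
    return math.log2(distance / width + 1.0)


def fitts_movement_time(distance, width, a, b):
    """Predicted movement time MT = a + b * ID; a and b are fitted per user and task."""
    return a + b * index_of_difficulty(distance, width)


def two_thirds_speed(curvature, k):
    """2/3 power law in its speed form: tangential speed v = k * curvature ** (-1/3).

    This is equivalent to angular speed A = k * curvature ** (2/3), since A = v * curvature.
    """
    if curvature <= 0:
        raise ValueError("curvature must be positive")
    return k * curvature ** (-1.0 / 3.0)
```

For example, a target 0.3 m away and 0.1 m wide has an index of difficulty of log2(4) = 2 bits, so with a = 0.2 s and b = 0.1 s/bit the predicted movement time is 0.4 s. The power law says the hand slows down where the path bends more sharply: multiplying curvature by 8 halves the tangential speed.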


Robustly Learning Composable Options in Deep Reinforcement Learning

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

DOI: 10.24963/ijcai.2021/298

Author(s): Akhil Bagaria, Jason Senthil, Matthew Slivinski, George Konidaris

Keyword(s): Reinforcement Learning, State Of The Art, Hierarchical Reinforcement Learning, Model Based, Maze Navigation, Horizon Problems, High Level, Discovery Method

Hierarchical reinforcement learning (HRL) is only effective for long-horizon problems when high-level skills can be reliably executed in sequence. Unfortunately, learning reliably composable skills is difficult, because all the components of every skill are constantly changing during learning. We propose three methods for improving the composability of learned skills: representing skill initiation regions using a combination of pessimistic and optimistic classifiers; learning re-targetable policies that are robust to non-stationary subgoal regions; and learning robust option policies using model-based RL. We test these improvements on four sparse-reward maze navigation tasks involving a simulated quadrupedal robot. Each method successively improves the robustness of a baseline skill discovery method, substantially outperforming state-of-the-art flat and hierarchical methods.


Exploring Parameter Space in Reinforcement Learning

Paladyn Journal of Behavioral Robotics, Vol 1 (1), 2010

DOI: 10.2478/s13230-010-0002-4

Cited by: ~18

Author(s): Thomas Rückstieß, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, ...

Keyword(s): Reinforcement Learning, Parameter Space, Robot Control, State Of The Art, Black Box, General Function, Natural Evolution, State Dependent, Learning Parameter, Function Approximator

This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb the parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.
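
Parameter-based exploration perturbs the policy parameters once per episode and then runs a deterministic policy, instead of injecting noise into every action. The sketch below is a simplified PGPE-style update with symmetric (antithetic) sampling and a fixed exploration width sigma; full PGPE also adapts sigma. The toy episode, its target line and all step sizes are hypothetical.

```python
import random


def run_episode(theta):
    """Toy deterministic episode: a linear policy a = theta[0] * s + theta[1] should track a = 2s - 1."""
    total = 0.0
    for s in (-1.0, -0.5, 0.0, 0.5, 1.0):
        action = theta[0] * s + theta[1]  # no action noise: exploration lives in parameter space
        total -= (action - (2.0 * s - 1.0)) ** 2
    return total


def pgpe(episode_return, mu, sigma=0.2, alpha=0.05, iterations=500, seed=0):
    """Simplified PGPE with symmetric sampling; sigma is held fixed (full PGPE adapts it)."""
    rng = random.Random(seed)
    mu = list(mu)
    for _ in range(iterations):
        eps = [rng.gauss(0.0, sigma) for _ in mu]                 # one parameter perturbation per pair
        r_plus = episode_return([m + e for m, e in zip(mu, eps)])
        r_minus = episode_return([m - e for m, e in zip(mu, eps)])
        # Move the mean toward whichever mirrored perturbation returned more.
        mu = [m + alpha * e * (r_plus - r_minus) / 2.0 for m, e in zip(mu, eps)]
    return mu
```

Calling `pgpe(run_episode, [0.0, 0.0])` drives the parameters toward [2, -1]. Because each episode is deterministic given its parameters, the difference r_plus - r_minus reflects only the perturbation, which is one of the advantages over per-step action noise that the abstract alludes to.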


Policy based reinforcement learning approach Of Jobshop scheduling with high level deadlock detection

2014

DOI: 10.31274/etd-180810-1488

Author(s): Mengmeng Chen

Keyword(s): Reinforcement Learning, Learning Approach, Deadlock Detection, Jobshop Scheduling, High Level


Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences, Vol 11 (15), pp. 6975, 2021

DOI: 10.3390/app11156975

Author(s): Tao Zhang, Lun He, Xudong Li, Guoqing Feng

Keyword(s): Performance Improvement, State Of The Art, Error Rates, Convolutional Network, Convolutional Networks, Sentence Level, End To End, High Level, Improved Accuracy, Talking Face

Lipreading aims to recognize the sentences spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture uses the TCN as a feature learner to decode features. This partly eliminates the vanishing-gradient and performance shortcomings of RNNs (LSTM, GRU) and yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and accuracy improves by 2.4% on the GRID dataset.
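
With a CTC objective, the network emits a distribution over symbols plus a special blank for every video frame, and a sentence is recovered by collapsing that frame sequence. The abstract does not say which decoding strategy is used; the sketch below shows the simplest one, greedy (best-path) decoding, with a hypothetical alphabet whose index 0 is the blank.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeated symbols, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[str] = []
    prev = None
    for idx in best:
        # A symbol is emitted only when it differs from the previous frame's symbol,
        # so a blank between two identical symbols lets the symbol be emitted twice.
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)
```

For instance, frames whose argmax indices are [1, 1, 0, 1, 2, 2, 0] decode to "aab" with the alphabet "-ab": the repeated "a" is merged, but the blank separates it from the next "a". Beam search with a language model usually improves accuracy over this greedy pass.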


Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Applied Sciences, Vol 11 (3), pp. 1291, 2021

DOI: 10.3390/app11031291

Author(s): Bonwoo Gu, Yunsick Sung

Keyword(s): Reinforcement Learning, Search Algorithm, Classification Criteria, Tree Search, Learning Method, Board Game, Ancient China, Monte Carlo Tree Search, High Level, Tree Search Algorithm

Gomoku is a two-player board game that originated in ancient China. There are various approaches to developing Gomoku AI, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with AlphaGo’s algorithm, defines all possible situations on the Gomoku board using Monte Carlo tree search (MCTS) and minimizes the probability of learning other correct answers in duplicated Gomoku board situations. However, in the tree search algorithm, accuracy drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as one-hot encoding-based vectors and determines the state of the Gomoku board by combining similar states of one-hot encoding-based vectors. Furthermore, for cases where the position chosen by the CNN is already occupied or a stone cannot be placed there, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
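
Two pieces of this pipeline can be illustrated concretely: encoding the board as one-hot vectors, and falling back to an alternative when the CNN's preferred cell is already occupied. The sketch assumes a per-cell encoding over (empty, black, white) and a "next-best legal cell" fallback; both are illustrative choices, not necessarily the paper's exact encoding or alternative-selection method, and the CNN scores are supplied by the caller.

```python
from typing import List, Sequence, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2


def one_hot_encode(board: Sequence[Sequence[int]]) -> List[List[int]]:
    """Flatten the board row by row into one length-3 one-hot vector per cell (empty, black, white)."""
    return [[1 if cell == k else 0 for k in (EMPTY, BLACK, WHITE)] for row in board for cell in row]


def select_move(board: Sequence[Sequence[int]], scores: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return the highest-scoring empty cell; occupied cells are skipped in favour of the next best."""
    size = len(board)
    # Sort all cells by score, best first (ties are broken by larger row, then column).
    ranked = sorted(((scores[r][c], r, c) for r in range(size) for c in range(size)), reverse=True)
    for _, r, c in ranked:
        if board[r][c] == EMPTY:
            return r, c
    raise ValueError("board is full")
```

On a 3×3 board with a black stone in the centre, a score map that ranks the centre highest (0.9) and cell (1, 2) second (0.4) yields the alternative move (1, 2).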
