Exploring Fault Parameter Space Using Reinforcement Learning-based Fault Injection

Author(s): Mehrdad Moradi, Bentley James Oakes, Mustafa Saraoglu, Andrey Morozov, Klaus Janschek, ...


Author(s): Hui Xu, Chong Zhang, Jiaxing Wang, Deqiang Ouyang, Yu Zheng, ...

Efficient exploration is a major challenge in Reinforcement Learning (RL) and has been studied extensively. However, for a new task, existing methods explore either by taking actions that maximize task-agnostic objectives (such as information gain) or by applying a simple dithering strategy (such as noise injection), which may not be effective enough. In this paper, we investigate whether previous learning experiences can be leveraged to guide exploration in a new task. To this end, we propose a novel Exploration with Structured Noise in Parameter Space (ESNPS) approach. ESNPS uses meta-learning and directly applies the meta-policy parameters, which encode prior knowledge, as structured noise to perturb the base model for effective exploration in new tasks. Experimental results on four groups of tasks (cheetah velocity, cheetah direction, ant velocity and ant direction) demonstrate the superiority of ESNPS over a number of competitive baselines.
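As a rough illustration of the idea, the sketch below perturbs a base policy's parameters with noise whose direction is biased by meta-policy parameters learned on related tasks. This is a minimal sketch, not the authors' implementation; the function name `perturb`, the variable names, and the additive mixing scheme are all assumptions.

```python
# Hedged sketch of exploration via structured noise in parameter space,
# loosely following the ESNPS idea described above. All names are
# illustrative assumptions, not the authors' API.
import numpy as np

def perturb(base_params, meta_params, scale=0.1, rng=None):
    """Perturb the base policy with noise structured by meta-policy parameters.

    Instead of purely isotropic Gaussian dithering, the noise direction is
    biased toward the meta-parameters, which encode prior knowledge from
    earlier tasks.
    """
    rng = rng or np.random.default_rng()
    unstructured = rng.standard_normal(base_params.shape)  # plain dithering
    structured = meta_params - base_params                 # prior-informed direction
    return base_params + scale * (structured + unstructured)

# Usage: explore a new task by acting with a perturbed copy of the policy.
base = np.zeros(8)          # toy base policy parameters
meta = np.full(8, 0.5)      # parameters of a meta-policy from related tasks
exploring_params = perturb(base, meta)
```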


Author(s): Freek Stulp, Olivier Sigaud

Abstract  Policy improvement methods seek to optimize the parameters of a policy with respect to a utility function. Owing to current trends involving searching in parameter space (rather than action space) and using reward-weighted averaging (rather than gradient estimation), reinforcement learning algorithms for policy improvement, e.g. PoWER and PI², ...
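The trend this abstract describes, reward-weighted averaging over parameter-space samples, can be sketched generically as below. This is the standard exponentiated-return weighting common to PoWER/PI²-style updates, not either paper's exact derivation; all names and the toy utility are illustrative.

```python
# Sketch of reward-weighted averaging for policy improvement: sample
# parameter perturbations, weight each rollout by an exponentiated return,
# and average the perturbations. No gradient estimation is needed.
import numpy as np

def reward_weighted_update(theta, rollout, n_samples=20,
                           sigma=0.1, temperature=1.0):
    rng = np.random.default_rng()
    eps = sigma * rng.standard_normal((n_samples, theta.size))  # parameter noise
    returns = np.array([rollout(theta + e) for e in eps])       # utility per sample
    # Softmax weights: high-return samples dominate the average.
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    return theta + w @ eps                                      # weighted average of noise

# Usage on a toy utility: maximize -||theta - target||^2.
target = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
for _ in range(100):
    theta = reward_weighted_update(theta, lambda t: -np.sum((t - target) ** 2))
```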


Author(s): Thomas Rückstieß, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, ...

Abstract  This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.
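A minimal sketch of the parameter-based exploration scheme this abstract reviews, in the spirit of Policy Gradients with Parameter-Based Exploration: a whole parameter vector is sampled once per episode from N(mu, sigma²), and mu and sigma are updated from episode returns using a baseline. Function names and the toy return function are assumptions, not the paper's code.

```python
# PGPE-style sketch: explore by sampling whole parameter vectors once per
# episode, rather than adding noise to each action. Updates follow the
# standard PGPE estimator with a mean-return baseline.
import numpy as np

def pgpe_step(mu, sigma, episode_return, n_samples=10, lr=0.05):
    rng = np.random.default_rng()
    diff = sigma * rng.standard_normal((n_samples, mu.size))  # theta - mu
    rewards = np.array([episode_return(mu + d) for d in diff])
    adv = rewards - rewards.mean()                            # baseline cuts variance
    grad_mu = (adv[:, None] * diff).mean(axis=0)
    grad_sigma = (adv[:, None] * ((diff**2 - sigma**2) / sigma)).mean(axis=0)
    return mu + lr * grad_mu, np.maximum(sigma + lr * grad_sigma, 1e-3)

# Usage on a toy deterministic return: maximize -||theta||^2.
mu, sigma = np.ones(4), np.ones(4)
f = lambda t: -np.sum(t**2)
for _ in range(200):
    mu, sigma = pgpe_step(mu, sigma, f)
```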


2012, Vol 12 (3), pp. 39-52
Author(s): Houman Dallali, Petar Kormushev, Zhibin Li, Darwin Caldwell

Abstract  In ZMP trajectory generation using simple models, a considerable amount of trial and error is often involved in obtaining locally stable gaits by manually tuning the gait parameters. In this paper, a 15-degree-of-freedom dynamic model of a compliant humanoid robot is combined with reinforcement learning to perform a global search in the parameter space and produce stable gaits. It is shown that, for a given speed, the learning algorithm finds multiple sets of parameters, namely step sizes and lateral sways, that lead to stable walking. The resulting set of gaits can be studied further in terms of parameter sensitivity, and additional optimization criteria can be included to narrow down the walking trajectories chosen for the humanoid robot.
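The flavor of this global search over gait parameters can be sketched as below. The sketch uses plain random search as a stand-in for the paper's reinforcement learning, and `simulate_gait` is a hypothetical placeholder for the authors' 15-DoF compliant humanoid simulation; names, ranges, and the scoring rule are all assumptions.

```python
# Hedged sketch: score (step_size, lateral_sway) pairs with a walking
# simulation and keep every pair whose gait is stable at the target speed.
# Random search stands in for the paper's RL-based global search.
import numpy as np

def simulate_gait(step_size, lateral_sway, speed):
    """Return a stability score; placeholder for the dynamic simulation."""
    return 1.0 - abs(step_size * 2.5 - speed) - abs(lateral_sway - 0.04) * 5.0

def search_stable_gaits(speed, n_iters=500, threshold=0.8):
    rng = np.random.default_rng()
    stable = []
    for _ in range(n_iters):
        step = rng.uniform(0.05, 0.40)   # step size in metres (assumed range)
        sway = rng.uniform(0.00, 0.10)   # lateral sway in metres (assumed range)
        if simulate_gait(step, sway, speed) >= threshold:
            stable.append((step, sway))  # multiple stable solutions expected
    return stable

gaits = search_stable_gaits(speed=0.5)
```

Keeping every parameter set above the stability threshold, rather than only the best one, mirrors the abstract's point that multiple gaits per speed can then be compared on sensitivity or further optimization criteria.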


Decision, 2016, Vol 3 (2), pp. 115-131
Author(s): Helen Steingroever, Ruud Wetzels, Eric-Jan Wagenmakers
