BOLeRo: Behavior optimization and learning for robots

Reinforcement learning and behavior optimization are becoming more and more popular in the field of robotics because algorithms are mature enough to tackle real problems in this domain. Robust implementations of state-of-the-art algorithms are often not publicly available though, and experiments are hardly reproducible because open-source implementations are often not available or are still in a stage of research code. Consequently, often it is infeasible to deploy these algorithms on robotic systems. BOLeRo closes this gap for policy search and evolutionary algorithms by delivering open-source implementations of behavior learning algorithms for robots. It is easy to integrate in robotic middlewares and it can be used to compare methods and develop prototypes in simulation.

Download Full-text

Universal Reinforcement Learning Algorithms: Survey and Experiments

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/194 ◽

2017 ◽

Author(s):

John Aslanides ◽

Jan Leike ◽

Marcus Hutter

Keyword(s):

Reinforcement Learning ◽

Open Source ◽

Markov Decision Process ◽

Decision Process ◽

Empirical Investigation ◽

State Of The Art ◽

Learning Algorithms ◽

Markov Decision ◽

Reference Implementation ◽

Partially Observable

Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open- source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Deterministic Value-Policy Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5732 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3316-3323

Author(s):

Qingpeng Cai ◽

Ling Pan ◽

Pingzhong Tang

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Infinite Horizon ◽

Gradient Algorithm ◽

Continuous Control ◽

Model Bias ◽

Model Free ◽

Policy Gradient ◽

Analytical Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) has been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with the finite horizon, but it is too myopic compared with infinite horizon. We firstly give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

Download Full-text

Open Source Robotic Simulators Platforms for Teaching Deep Reinforcement Learning Algorithms

Procedia Computer Science ◽

10.1016/j.procs.2019.02.031 ◽

2019 ◽

Vol 150 ◽

pp. 162-170 ◽

Cited By ~ 1

Author(s):

Armando Plasencia ◽

Yulia Shichkina ◽

Ileana Suárez ◽

Zoila Ruiz

Keyword(s):

Reinforcement Learning ◽

Open Source ◽

Learning Algorithms

Download Full-text

Nonconvex Policy Search Using Variational Inequalities

Neural Computation ◽

10.1162/neco_a_01004 ◽

2017 ◽

Vol 29 (10) ◽

pp. 2800-2824

Author(s):

Yusen Zhan ◽

Haitham Bou Ammar ◽

Matthew E. Taylor

Keyword(s):

Reinforcement Learning ◽

Dynamical Systems ◽

Variational Inequalities ◽

Learning Algorithms ◽

High Dimensional ◽

Control Problems ◽

Policy Search ◽

Optimal Policies ◽

Stochastic Setting ◽

First Time

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that potentially could damage hardware units. Motivated by such constraints, we propose projection-based methods for safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.

Download Full-text

Interactive Reinforcement Learning with Dynamic Reuse of Prior Knowledge from Human and Agent Demonstrations

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/530 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zhaodong Wang ◽

Matthew E. Taylor

Keyword(s):

Reinforcement Learning ◽

Performance Analysis ◽

Prior Knowledge ◽

High Performance ◽

State Of The Art ◽

Learning Algorithms ◽

Learning Performance ◽

Superior Performance ◽

Confidence Measure ◽

Multiple State

Reinforcement learning has enjoyed multiple impressive successes in recent years. However, these successes typically require very large amounts of data before an agent achieves acceptable performance. This paper focuses on a novel way of combating such requirements by leveraging existing (human or agent) knowledge. In particular, this paper leverages demonstrations, allowing an agent to quickly achieve high performance. This paper introduces the Dynamic Reuse of Prior (DRoP) algorithm, which combines the offline knowledge (demonstrations recorded before learning) with online confidence-based performance analysis. DRoP leverages the demonstrator's knowledge by automatically balancing between reusing the prior knowledge and the current learned policy, allowing the agent to outperform the original demonstrations. We compare with multiple state-of-the-art learning algorithms and empirically show that DRoP can achieve superior performance in two domains. Additionally, we show that this confidence measure can be used to selectively request additional demonstrations, significantly improving the learning performance of the agent.

Download Full-text

Deep Reinforcement Learning for Soft Robotic Applications: Brief Overview with Impending Challenges

10.20944/preprints201811.0510.v2 ◽

2018 ◽

Author(s):

Sarthak Bhagat ◽

Hritwick Banerjee ◽

Hongliang Ren

Keyword(s):

Reinforcement Learning ◽

Degrees Of Freedom ◽

State Of The Art ◽

Imitation Learning ◽

Robotic Systems ◽

Future Research ◽

Learning Techniques ◽

Increasing Trend ◽

Embodied Intelligence ◽

Robotic Applications

The increasing trend of studying the innate softness of robotic structures and amalgamating it with the benefits of the extensive developments in the field of embodied intelligence has led to sprouting of a relatively new yet extremely rewarding sphere of technology. The fusion of current deep reinforcement algorithms with physical advantages of a soft bio-inspired structure certainly directs us to a fruitful prospect of designing completely self-sufficient agents that are capable of learning from observations collected from their environment to achieve a task they have been assigned. For soft robotics structure possessing countless degrees of freedom, it is often not easy (something not even possible) to formulate mathematical constraints necessary for training a deep reinforcement learning (DRL) agent for the task in hand, hence, we resolve to imitation learning techniques due to ease of manually performing such tasks like manipulation that could be comfortably mimicked by our agent. Deploying current imitation learning algorithms on soft robotic systems have been observed to provide satisfactory results but there are still challenges in doing so. This review article thus posits an overview of various such algorithms along with instances of them being applied to real world scenarios and yielding state-of-the-art results followed by brief descriptions on various pristine branches of DRL research that may be centers of future research in this field of interest.

Download Full-text

Reinforced Mnemonic Reader for Machine Reading Comprehension

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/570 ◽

2018 ◽

Cited By ~ 25

Author(s):

Minghao Hu ◽

Yuxing Peng ◽

Zhen Huang ◽

Xipeng Qiu ◽

Furu Wei ◽

...

Keyword(s):

Reading Comprehension ◽

Reinforcement Learning ◽

Question Answering ◽

State Of The Art ◽

Learning Algorithms ◽

Optimization Approach ◽

Exact Match ◽

Machine Reading

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing to past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages to predict a more acceptable answer so as to address the convergence suppression problem occurred in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

Download Full-text

Learning and generalising object extraction skill for contact-rich disassembly tasks: an introductory study

The International Journal of Advanced Manufacturing Technology ◽

10.1007/s00170-021-08086-z ◽

2021 ◽

Author(s):

Antonio Serrano-Muñoz ◽

Nestor Arana-Arexolaleiba ◽

Dimitrios Chrysostomou ◽

Simon Bøgh

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Robotic Manipulation ◽

Object Extraction ◽

Learning Agents ◽

Machine Learning Methods ◽

Key Concepts ◽

Planning And Operation

AbstractRemanufacturing automation must be designed to be flexible and robust enough to overcome the uncertainties, conditions of the products, and complexities in the planning and operation of the processes. Machine learning methods, in particular reinforcement learning, are presented as techniques to learn, improve, and generalise the automation of many robotic manipulation tasks (most of them related to grasping, picking, or assembly). However, not much has been exploited in remanufacturing, in particular in disassembly tasks. This work presents the state of the art of contact-rich disassembly using reinforcement learning algorithms and a study about the generalisation of object extraction skills when applied to contact-rich disassembly tasks. The generalisation capabilities of two state-of-the-art reinforcement learning agents (trained in simulation) are tested and evaluated in simulation, and real world while perform a disassembly task. Results show that at least one of the agents can generalise the contact-rich extraction skill. Besides, this work identifies key concepts and gaps for the reinforcement learning algorithms’ research and application on disassembly tasks.

Download Full-text

State-of-the-Art Reinforcement Learning Algorithms

International Journal of Engineering Research and ◽

10.17577/ijertv8is120332 ◽

2020 ◽

Vol V8 (12) ◽

Author(s):

Deepanshu Mehta ◽

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms

Download Full-text