Accelerating Interactive Reinforcement Learning by Human Advice for an Assembly Task by a Cobot

Robotics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 104
Author(s):  
Joris De Winter ◽  
Albert De Beir ◽  
Ilias El Makrini ◽  
Greet Van de Perre ◽  
Ann Nowé ◽  
...  

The assembly industry is shifting towards customizable products and the assembly of small batches. This demands frequent reprogramming, which is expensive because it requires a specialized engineer. It would be an improvement if untrained workers could help a cobot learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space grows drastically as the complexity of the task increases. This work introduces a novel method in which human knowledge is used to reduce this solution space and, as a result, increase the learning speed. The proposed method, IRL-PBRS, uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and Potential Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation to two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions. Finally, a use case is presented in which participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with the advice given by a user and is able to adapt online to a changing knowledge base.
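Potential-based reward shaping of the kind used in IRL-PBRS adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form known to preserve the optimal policy. A minimal Q-learning sketch in Python; the advice-derived potential, state names, and learning parameters are purely hypothetical, not the paper's implementation:

```python
from collections import defaultdict

GAMMA = 0.95
ALPHA = 0.1

# Hypothetical potential built from human advice: states the advisor
# marked as "on the right track" receive a higher potential.
advised_states = {"part_A_placed", "part_A_and_B_placed"}

def potential(state):
    return 1.0 if state in advised_states else 0.0

Q = defaultdict(float)  # Q[(state, action)]

def shaped_update(state, action, reward, next_state, next_actions):
    # Potential-based shaping term: F = gamma * phi(s') - phi(s).
    shaping = GAMMA * potential(next_state) - potential(state)
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + shaping + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```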

Author(s):  
Patrick Mannion ◽  
Sam Devlin ◽  
Jim Duggan ◽  
Enda Howley

The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, and both have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
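Difference rewards, one of the two shaping methods evaluated, credit each agent with its marginal contribution to the global objective: D_i(z) = G(z) − G(z_{−i}), where z_{−i} replaces agent i's action with a default. An illustrative single-objective sketch; the global reward function, agent names, and default action are hypothetical:

```python
from typing import Callable, Dict

def difference_reward(global_reward: Callable[[Dict[str, str]], float],
                      joint_action: Dict[str, str],
                      agent: str,
                      default_action: str = "noop") -> float:
    """D_i = G(z) - G(z_-i): the agent's marginal contribution to the
    system objective, with its own action replaced by a default."""
    counterfactual = dict(joint_action)
    counterfactual[agent] = default_action
    return global_reward(joint_action) - global_reward(counterfactual)

# Hypothetical two-agent example: the global reward counts cooperating agents.
G = lambda z: float(sum(a == "cooperate" for a in z.values()))
joint = {"agent1": "cooperate", "agent2": "defect"}
print(difference_reward(G, joint, "agent1"))  # 1.0: agent1's contribution
```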


Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 1032
Author(s):  
Hyoungsik Nam ◽  
Young In Kim ◽  
Jina Bae ◽  
Junhee Lee

This paper proposes GateRL, an automated circuit design framework for CMOS logic gates based on reinforcement learning. Because there are constraints on how circuit elements may be connected, an action-masking scheme is employed; it also reduces the size of the action space, which improves the learning speed. GateRL consists of an agent that selects actions and an environment that provides the state, mask, and reward. The state and reward are generated from a connection matrix that describes the current circuit configuration, and the mask is obtained from a masking matrix based on the constraints and the current connection matrix. Actions are produced by the agent's deep Q-network of four fully connected layers. In particular, separate replay buffers are devised for success transitions and failure transitions to expedite the training process. The proposed network is trained with 2 inputs, 1 output, 2 NMOS transistors, and 2 PMOS transistors to design all the target logic gates: buffer, inverter, AND, OR, NAND, and NOR. Consequently, GateRL outputs a one-transistor buffer, a two-transistor inverter, a two-transistor AND, a two-transistor OR, a three-transistor NAND, and a three-transistor NOR. The operation of these resulting circuits is verified by SPICE simulation.
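Action masking of the kind described, where connections that would violate circuit constraints are made unselectable before the greedy action choice, can be sketched as follows. This is not the GateRL implementation; the network shape, dimensions, and mask handling are assumptions:

```python
import torch
import torch.nn as nn

class MaskedDQN(nn.Module):
    """Small fully connected Q-network with invalid actions masked out."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        q = self.net(state)
        # Invalid actions (mask == 0) get -inf so they are never chosen by argmax.
        return q.masked_fill(mask == 0, float("-inf"))

# Hypothetical usage: 16-dim state, 24 possible connection actions.
net = MaskedDQN(state_dim=16, n_actions=24)
state = torch.randn(1, 16)
mask = torch.ones(1, 24, dtype=torch.bool)
mask[0, 5] = False            # connection 5 would violate a constraint
action = net(state, mask).argmax(dim=1)
```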


2019 ◽  
Vol 9 (3) ◽  
pp. 502 ◽  
Author(s):  
Cristyan Gil ◽  
Hiram Calvo ◽  
Humberto Sossa

Programming robots to perform different activities requires calculating sequences of joint values while taking into account many factors, such as stability and efficiency, at the same time. Particularly for walking, state-of-the-art techniques to approximate these sequences are based on reinforcement learning (RL). In this work we propose a multi-level system in which the same RL method is used first to learn the configurations of robot joints (poses) that allow it to stand with stability, and then, at the second level, to find the sequence of poses that lets it cover the farthest distance in the shortest time while avoiding falling down and keeping a straight path. To evaluate this, we focus on measuring the time it takes the robot to travel a certain distance. To our knowledge, this is the first work to focus on both speed and precision of the trajectory at the same time. We implement our model in a simulated environment using Q-learning. Compared with the built-in walking modes of a NAO robot, the proposed approach improves on the normal-speed mode and is more robust than the fast-speed mode. The proposed model can be extended to other tasks and is independent of a particular robot model.
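Both levels of the proposed hierarchy rely on the same tabular Q-learning update, first over joint configurations (poses) and then over sequences of poses. A generic sketch under that reading, with the state and action encodings, hyperparameters, and class name left abstract rather than taken from the paper:

```python
import random
from collections import defaultdict

class QLearner:
    """Plain tabular Q-learning, reusable at both levels of the hierarchy
    (pose learning, then pose-sequence learning)."""
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(state, action)]
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])
```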


2019 ◽  
Vol 40 (2) ◽  
pp. 361-375 ◽  
Author(s):  
Nan Zhang ◽  
Zhenyu Liu ◽  
Chan Qiu ◽  
Weifei Hu ◽  
Jianrong Tan

Purpose: Assembly sequence planning (ASP) plays a vital role in the assembly process because it directly influences the feasibility, cost, and time of the assembly process. The purpose of this study is to solve the ASP problem more efficiently than current algorithms. Design/methodology/approach: A novel assembly subsets prediction method based on a precedence graph is proposed to solve the ASP problem. The proposed method adopts the idea of building from local to whole and integrates a simplified firework algorithm. First, assembly subsets are generated as initial fireworks. Then, each firework explodes into several sparks with higher-level assembly subsets, and new fireworks are selected for the next generation according to a selection strategy. Finally, the algorithm iterates until complete and feasible solutions are generated. Findings: The proposed method performs better in comparison with state-of-the-art algorithms because of the balance between exploration (fireworks) and exploitation (sparks). The size of the initial firework population determines the diversity of the solutions, so the assembly subsets prediction method based on a precedence graph (ASPM-PG) can explore the solution space. The size of the sparks controls the exploitation ability of ASPM-PG; with more sparks, the direction of a specific firework can be adequately exploited. Practical implications: The proposed method has a simple structure and high efficiency. It is anticipated that using the proposed method can effectively improve the efficiency of ASP and reduce the computing cost of industrial applications. Originality/value: The proposed method finds the optimal sequence during the construction of the assembly sequence, rather than adjusting the order of a complete assembly sequence as in traditional methods. Moreover, a simplified firework algorithm with new operators is introduced. Two basic size parameters are also analyzed to explain the proposed method.
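The local-to-whole search can be pictured as a simplified fireworks loop: each firework is a partial assembly subset, each spark extends it by one part whose predecessors in the precedence graph are already placed, and the best sparks seed the next generation. A rough sketch only; the precedence representation, selection rule, and scoring function are illustrative assumptions, not the ASPM-PG operators:

```python
import random

def feasible_extensions(subset, all_parts, precedence):
    """Parts not yet placed whose predecessors (per the precedence graph) are all placed."""
    placed = set(subset)
    return [p for p in all_parts
            if p not in placed and precedence.get(p, set()) <= placed]

def fireworks_asp(all_parts, precedence, score, n_fireworks=5, n_sparks=8):
    # Initial fireworks: single-part subsets with no unmet predecessors.
    fireworks = [[p] for p in feasible_extensions([], all_parts, precedence)]
    while fireworks and len(fireworks[0]) < len(all_parts):
        sparks = []
        for fw in fireworks:
            ext = feasible_extensions(fw, all_parts, precedence)
            for p in random.sample(ext, k=min(n_sparks, len(ext))):
                sparks.append(fw + [p])          # higher-level assembly subset
        # Selection: keep the best subsets for the next generation.
        fireworks = sorted(sparks, key=score, reverse=True)[:n_fireworks]
    return fireworks[0] if fireworks else []
```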


Author(s):  
Kento Terashima ◽  
Hirotaka Takano ◽  
Junichi Murata

Reinforcement learning is applicable to complex or unknown problems because the solution is found by trial-and-error search. However, the calculation time for this trial-and-error search grows as the scale of the problem increases. Therefore, to decrease the calculation time, several methods have been proposed that use prior information about the problem. This paper improves a previously proposed method that utilizes options as prior information. To maintain a high learning speed even when the given options are wrong, methods for option correction are proposed that forget the option's policy and extend its initiation set.
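An option in the usual sense is a triple of initiation set, internal policy, and termination condition; the corrections described amount to forgetting a mistaken internal policy and enlarging the initiation set. A minimal data-structure sketch under those assumptions (termination is simplified to a set of terminal states, and all names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set

@dataclass
class Option:
    """An option: initiation set I, internal policy pi, and terminal states."""
    initiation_set: Set[Hashable]
    termination_states: Set[Hashable]
    policy: Dict[Hashable, str] = field(default_factory=dict)

    def forget_policy(self) -> None:
        # Correction 1: discard a wrong internal policy so it is relearned.
        self.policy.clear()

    def extend_initiation_set(self, states: Set[Hashable]) -> None:
        # Correction 2: widen the set of states from which the option may start.
        self.initiation_set |= states
```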


Author(s):  
Alessandro Chiumento ◽  
Claude Desset ◽  
Sofie Pollin ◽  
Liesbet Van der Perre ◽  
Rudy Lauwereins
