Accelerating Interactive Reinforcement Learning by Human Advice for an Assembly Task by a Cobot

Robotics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 104
Author(s):  
Joris De Winter ◽  
Albert De Beir ◽  
Ilias El Makrini ◽  
Greet Van de Perre ◽  
Ann Nowé ◽  
...  

The assembly industry is shifting towards customizable products and the assembly of small batches. This demands frequent reprogramming, which is expensive because it requires a specialized engineer. It would be an improvement if untrained workers could help a cobot learn an assembly sequence by giving advice. Learning an assembly sequence is a hard task for a cobot, because the solution space grows drastically as the complexity of the task increases. This work introduces a novel method in which human knowledge is used to reduce this solution space and, as a result, increase the learning speed. The proposed method, IRL-PBRS, uses Interactive Reinforcement Learning (IRL) to learn from human advice in an interactive way, and Potential Based Reward Shaping (PBRS), in a simulated environment, to focus learning on a smaller part of the solution space. The method was compared in simulation to two other feedback strategies. The results show that IRL-PBRS converges more quickly to a valid assembly sequence policy and does so with the fewest human interactions. Finally, a use case is presented in which participants were asked to program an assembly task. Here, the results show that IRL-PBRS learns quickly enough to keep up with the advice given by a user and is able to adapt online to a changing knowledge base.
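Potential-based reward shaping of the kind used in IRL-PBRS adds a shaping term F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form known to preserve the optimal policy. A minimal Q-learning sketch in Python; the advice-derived potential, state names, and learning parameters are purely hypothetical, not the paper's implementation:

```python
from collections import defaultdict

GAMMA = 0.95
ALPHA = 0.1

# Hypothetical potential built from human advice: states the advisor
# marked as "on the right track" receive a higher potential.
advised_states = {"part_A_placed", "part_A_and_B_placed"}

def potential(state):
    return 1.0 if state in advised_states else 0.0

Q = defaultdict(float)  # Q[(state, action)]

def shaped_update(state, action, reward, next_state, next_actions):
    # Potential-based shaping term: F = gamma * phi(s') - phi(s).
    shaping = GAMMA * potential(next_state) - potential(state)
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + shaping + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```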

Author(s):  
Patrick Mannion ◽  
Sam Devlin ◽  
Jim Duggan ◽  
Enda Howley

The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL; however, it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, and both have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
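Difference rewards, one of the two shaping methods evaluated, credit each agent with its marginal contribution to the global objective: D_i(z) = G(z) − G(z_{−i}), where z_{−i} replaces agent i's action with a default. An illustrative single-objective sketch; the global reward function, agent names, and default action are hypothetical:

```python
from typing import Callable, Dict

def difference_reward(global_reward: Callable[[Dict[str, str]], float],
                      joint_action: Dict[str, str],
                      agent: str,
                      default_action: str = "noop") -> float:
    """D_i = G(z) - G(z_-i): the agent's marginal contribution to the
    system objective, with its own action replaced by a default."""
    counterfactual = dict(joint_action)
    counterfactual[agent] = default_action
    return global_reward(joint_action) - global_reward(counterfactual)

# Hypothetical two-agent example: the global reward counts cooperating agents.
G = lambda z: float(sum(a == "cooperate" for a in z.values()))
joint = {"agent1": "cooperate", "agent2": "defect"}
print(difference_reward(G, joint, "agent1"))  # 1.0: agent1's contribution
```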


Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 1032
Author(s):  
Hyoungsik Nam ◽  
Young In Kim ◽  
Jina Bae ◽  
Junhee Lee

This paper proposes GateRL, an automated circuit design framework for CMOS logic gates based on reinforcement learning. Because there are constraints on how circuit elements may be connected, an action-masking scheme is employed; it also reduces the size of the action space, which improves the learning speed. GateRL consists of an agent that selects actions and an environment that provides the state, mask, and reward. The state and reward are generated from a connection matrix that describes the current circuit configuration, and the mask is obtained from a masking matrix based on the constraints and the current connection matrix. Actions are produced by the agent's deep Q-network of four fully connected layers. In particular, separate replay buffers are devised for success transitions and failure transitions to expedite the training process. The proposed network is trained with 2 inputs, 1 output, 2 NMOS transistors, and 2 PMOS transistors to design all the target logic gates: buffer, inverter, AND, OR, NAND, and NOR. Consequently, GateRL outputs a one-transistor buffer, a two-transistor inverter, a two-transistor AND, a two-transistor OR, a three-transistor NAND, and a three-transistor NOR. The operation of these resulting circuits is verified by SPICE simulation.
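Action masking of the kind described, where connections that would violate circuit constraints are made unselectable before the greedy action choice, can be sketched as follows. This is not the GateRL implementation; the network shape, dimensions, and mask handling are assumptions:

```python
import torch
import torch.nn as nn

class MaskedDQN(nn.Module):
    """Small fully connected Q-network with invalid actions masked out."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        q = self.net(state)
        # Invalid actions (mask == 0) get -inf so they are never chosen by argmax.
        return q.masked_fill(mask == 0, float("-inf"))

# Hypothetical usage: 16-dim state, 24 possible connection actions.
net = MaskedDQN(state_dim=16, n_actions=24)
state = torch.randn(1, 16)
mask = torch.ones(1, 24, dtype=torch.bool)
mask[0, 5] = False            # connection 5 would violate a constraint
action = net(state, mask).argmax(dim=1)
```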


2019 ◽  
Vol 9 (3) ◽  
pp. 502 ◽  
Author(s):  
Cristyan Gil ◽  
Hiram Calvo ◽  
Humberto Sossa

Programming robots to perform different activities requires calculating sequences of joint values while taking into account many factors, such as stability and efficiency, at the same time. Particularly for walking, state-of-the-art techniques to approximate these sequences are based on reinforcement learning (RL). In this work we propose a multi-level system in which the same RL method is used first to learn the configurations of robot joints (poses) that allow it to stand with stability, and then, at the second level, to find the sequence of poses that lets it cover the farthest distance in the shortest time while avoiding falling down and keeping a straight path. To evaluate this, we focus on measuring the time it takes the robot to travel a certain distance. To our knowledge, this is the first work to focus on both speed and precision of the trajectory at the same time. We implement our model in a simulated environment using Q-learning. Compared with the built-in walking modes of a NAO robot, the proposed approach improves on the normal-speed mode and is more robust than the fast-speed mode. The proposed model can be extended to other tasks and is independent of a particular robot model.
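Both levels of the proposed hierarchy rely on the same tabular Q-learning update, first over joint configurations (poses) and then over sequences of poses. A generic sketch under that reading, with the state and action encodings, hyperparameters, and class name left abstract rather than taken from the paper:

```python
import random
from collections import defaultdict

class QLearner:
    """Plain tabular Q-learning, reusable at both levels of the hierarchy
    (pose learning, then pose-sequence learning)."""
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = defaultdict(float)          # Q[(state, action)]
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.Q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.Q[(state, action)] += self.alpha * (td_target - self.Q[(state, action)])
```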


2019 ◽  
Vol 40 (2) ◽  
pp. 361-375 ◽  
Author(s):  
Nan Zhang ◽  
Zhenyu Liu ◽  
Chan Qiu ◽  
Weifei Hu ◽  
Jianrong Tan

Purpose: Assembly sequence planning (ASP) plays a vital role in the assembly process because it directly influences the feasibility, cost, and time of the assembly process. The purpose of this study is to solve the ASP problem more efficiently than current algorithms. Design/methodology/approach: A novel assembly subsets prediction method based on a precedence graph is proposed to solve the ASP problem. The proposed method adopts the idea of building from local to whole and integrates a simplified firework algorithm. First, assembly subsets are generated as initial fireworks. Then, each firework explodes into several sparks with higher-level assembly subsets, and new fireworks are selected for the next generation according to a selection strategy. Finally, the algorithm iterates until complete and feasible solutions are generated. Findings: The proposed method performs better in comparison with state-of-the-art algorithms because of the balance between exploration (fireworks) and exploitation (sparks). The size of the initial firework population determines the diversity of the solutions, so the assembly subsets prediction method based on a precedence graph (ASPM-PG) can explore the solution space. The size of the sparks controls the exploitation ability of ASPM-PG; with more sparks, the direction of a specific firework can be adequately exploited. Practical implications: The proposed method has a simple structure and high efficiency. It is anticipated that using the proposed method can effectively improve the efficiency of ASP and reduce the computing cost of industrial applications. Originality/value: The proposed method finds the optimal sequence during the construction of the assembly sequence, rather than adjusting the order of a complete assembly sequence as in traditional methods. Moreover, a simplified firework algorithm with new operators is introduced. Two basic size parameters are also analyzed to explain the proposed method.
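The local-to-whole search can be pictured as a simplified fireworks loop: each firework is a partial assembly subset, each spark extends it by one part whose predecessors in the precedence graph are already placed, and the best sparks seed the next generation. A rough sketch only; the precedence representation, selection rule, and scoring function are illustrative assumptions, not the ASPM-PG operators:

```python
import random

def feasible_extensions(subset, all_parts, precedence):
    """Parts not yet placed whose predecessors (per the precedence graph) are all placed."""
    placed = set(subset)
    return [p for p in all_parts
            if p not in placed and precedence.get(p, set()) <= placed]

def fireworks_asp(all_parts, precedence, score, n_fireworks=5, n_sparks=8):
    # Initial fireworks: single-part subsets with no unmet predecessors.
    fireworks = [[p] for p in feasible_extensions([], all_parts, precedence)]
    while fireworks and len(fireworks[0]) < len(all_parts):
        sparks = []
        for fw in fireworks:
            ext = feasible_extensions(fw, all_parts, precedence)
            for p in random.sample(ext, k=min(n_sparks, len(ext))):
                sparks.append(fw + [p])          # higher-level assembly subset
        # Selection: keep the best subsets for the next generation.
        fireworks = sorted(sparks, key=score, reverse=True)[:n_fireworks]
    return fireworks[0] if fireworks else []
```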


Author(s):  
Kento Terashima ◽  
Hirotaka Takano ◽  
Junichi Murata

Reinforcement learning is applicable to complex or unknown problems because the solution is found by trial-and-error search. However, the calculation time for this trial-and-error search grows as the scale of the problem increases. Therefore, to decrease the calculation time, several methods have been proposed that use prior information about the problem. This paper improves a previously proposed method that utilizes options as prior information. To maintain a high learning speed even when the given options are wrong, methods for option correction are proposed that forget the option's policy and extend its initiation set.
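An option in the usual sense is a triple of initiation set, internal policy, and termination condition; the corrections described amount to forgetting a mistaken internal policy and enlarging the initiation set. A minimal data-structure sketch under those assumptions (termination is simplified to a set of terminal states, and all names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set

@dataclass
class Option:
    """An option: initiation set I, internal policy pi, and terminal states."""
    initiation_set: Set[Hashable]
    termination_states: Set[Hashable]
    policy: Dict[Hashable, str] = field(default_factory=dict)

    def forget_policy(self) -> None:
        # Correction 1: discard a wrong internal policy so it is relearned.
        self.policy.clear()

    def extend_initiation_set(self, states: Set[Hashable]) -> None:
        # Correction 2: widen the set of states from which the option may start.
        self.initiation_set |= states
```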


Author(s):  
Alessandro Chiumento ◽  
Claude Desset ◽  
Sofie Pollin ◽  
Liesbet Van der Perre ◽  
Rudy Lauwereins
