Implementation of assembly task based on guided policy search algorithm

Author(s): Qingwei Dong, Chuanzhi Zang, Peng Zeng, Guangxi Wan, Yunpeng He, ...
Author(s): Biao Sun, Fangzhou Xiong, Zhiyong Liu, Xu Yang, Hong Qiao

2007, Vol 19 (2), pp. 161-174
Author(s): Jiaqiao Hu, Michael C. Fu, Vahid R. Ramezani, Steven I. Marcus

2020, Vol 34 (04), pp. 5668-5675
Author(s): Lior Shani, Yonathan Efroni, Shie Mannor

Trust region policy optimization (TRPO) is a popular and empirically successful policy search algorithm in Reinforcement Learning (RL) in which a surrogate problem that restricts consecutive policies to be ‘close’ to one another is solved iteratively. Nevertheless, TRPO has been considered a heuristic algorithm inspired by Conservative Policy Iteration (CPI). We show that the adaptive scaling mechanism used in TRPO is in fact the natural “RL version” of traditional trust-region methods from convex analysis. We first analyze TRPO in the planning setting, in which we have access to the model and the entire state space. Then, we consider sample-based TRPO and establish an Õ(1/√N) convergence rate to the global optimum. Importantly, the adaptive scaling mechanism allows us to analyze TRPO in regularized MDPs, for which we prove fast rates of Õ(1/N), much like results in convex optimization. This is the first result in RL showing better rates when regularizing the instantaneous cost or reward.
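In the tabular planning setting, a KL-penalized policy step has a closed-form, mirror-descent-style solution: for rewards, π_{k+1}(a|s) ∝ π_k(a|s)·exp(t_k·Q^{π_k}(s,a)). The following is a minimal sketch under stated assumptions: a hypothetical two-state MDP, exact (iterative) policy evaluation, rewards rather than costs, and a decaying step size t_k = 1/√(k+1) chosen for illustration. It does not reproduce the paper's adaptive scaling or step-size schedule.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative only).
# P[s][a] is the next-state distribution, R[s][a] the immediate reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: a0 stays, a1 moves to state 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: a0 stays, a1 moves to state 0
R = [[0.0, 0.0],
     [1.0, 0.0]]                 # only (state 1, a0) pays a reward
GAMMA = 0.9


def backup(V, s, a):
    """One-step lookahead r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in enumerate(P[s][a]))


def evaluate(policy, iters=300):
    """Iterative policy evaluation; returns (V, Q) for a stochastic policy."""
    V = [0.0] * len(P)
    for _ in range(iters):
        Q = [[backup(V, s, a) for a in range(len(P[s]))] for s in range(len(P))]
        V = [sum(pi * q for pi, q in zip(policy[s], Q[s])) for s in range(len(P))]
    return V, Q


def kl_policy_step(policy, Q, step):
    """Per state, solve max_pi <Q, pi> - KL(pi || policy) / step in closed form."""
    new_policy = []
    for pi_s, q_s in zip(policy, Q):
        q_max = max(q_s)  # shift for numerical stability; cancels after normalization
        w = [p * math.exp(step * (q - q_max)) for p, q in zip(pi_s, q_s)]
        z = sum(w)
        new_policy.append([x / z for x in w])
    return new_policy


def run(num_iters=200):
    policy = [[0.5, 0.5] for _ in P]  # start from the uniform policy
    for k in range(num_iters):
        _, Q = evaluate(policy)
        # Assumed decaying step size, not the schedule analyzed in the paper.
        policy = kl_policy_step(policy, Q, step=1.0 / math.sqrt(k + 1))
    return policy, evaluate(policy)[0]


if __name__ == "__main__":
    policy, V = run()
    # For this MDP the optimal values are V*(0) = 9 and V*(1) = 10.
    print([round(v, 3) for v in V])
```

Because each step only reweights the previous policy, consecutive policies stay close in KL divergence, which is the "trust region" the abstract refers to; the step size plays the role of the trust-region radius.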


2016, Vol 35 (14), pp. 1760-1778
Author(s): Christopher Amato, George Konidaris, Ariel Anders, Gabriel Cruz, Jonathan P How, ...

We introduce a principled method for multi-robot coordination based on a general model (termed a MacDec-POMDP) of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing, and communication limitations. A new MacDec-POMDP planning algorithm is presented that searches over policies represented as finite-state controllers, rather than the previous policy tree representation. Finite-state controllers can be much more concise than trees, are much easier to interpret, and can operate over an infinite horizon. The resulting policy search algorithm requires a substantially simpler simulator, one that models only the outcomes of executing a given set of motor controllers rather than the details of the executions themselves, and it can solve significantly larger problems than existing MacDec-POMDP planners. We demonstrate significant performance improvements over previous methods and show that our method can be used for actual multi-robot systems through experiments on a cooperative multi-robot bartending domain.
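A finite-state controller maps each controller node to a (macro-)action and moves between nodes according to the observation received, so a fixed, small set of nodes can act over an unbounded horizon. The sketch below shows only this policy representation for a single agent (in a MacDec-POMDP each robot would hold its own controller); it is not the paper's search algorithm, and the class name, node names, macro-actions, and observations are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FiniteStateController:
    """Deterministic finite-state controller for one agent (illustrative sketch).

    action_of[node] is the macro-action executed in that node;
    transition[node][obs] is the next node after observing obs.
    """
    action_of: Dict[str, str]
    transition: Dict[str, Dict[str, str]]
    start: str
    node: str = field(init=False)

    def __post_init__(self) -> None:
        self.node = self.start

    def act(self) -> str:
        return self.action_of[self.node]

    def observe(self, obs: str) -> None:
        # Assumption: an unlisted observation leaves the controller in its current node.
        self.node = self.transition[self.node].get(obs, self.node)


def rollout(fsc: FiniteStateController, observations: List[str]) -> List[str]:
    """Return the macro-actions chosen while a stream of observations arrives."""
    actions = []
    for obs in observations:
        actions.append(fsc.act())
        fsc.observe(obs)
    return actions


if __name__ == "__main__":
    # Hypothetical two-node controller for a bartending-style robot.
    fsc = FiniteStateController(
        action_of={"n0": "go_to_room", "n1": "fetch_drink"},
        transition={"n0": {"order": "n1", "no_order": "n0"},
                    "n1": {"delivered": "n0"}},
        start="n0",
    )
    print(rollout(fsc, ["no_order", "order", "delivered", "order"]))
```

The controller never grows with the horizon: the same two nodes are revisited indefinitely, which is why such controllers can be more concise than policy trees.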


2009
Author(s): Sue A. Ferguson, William S. Marras, W. Gary Allread, Gregory G. Knapik, Kimberly A. Vandlen, ...

2020, Vol 39 (6), pp. 8125-8137
Author(s): Jackson J Christy, D Rekha, V Vijayakumar, Glaucio H.S. Carvalho

Vehicular Ad hoc Networks (VANETs) are regarded as a mainstay of Intelligent Transportation Systems (ITS). For an efficient vehicular ad hoc network, broadcasting, i.e., sharing a safety-related message across all vehicles and infrastructure throughout the network, is pivotal. Hence, an efficient TDMA-based MAC protocol for VANETs would serve the purpose of broadcast scheduling. At the same time, high mobility, varying traffic density, and a changing network topology make it difficult to form an efficient broadcast schedule. In this paper, an evolutionary approach is chosen to solve the broadcast scheduling problem in VANETs. The paper focuses on identifying an optimal solution with a minimal TDMA frame length and an increased number of transmissions. These two parameters serve as the convergence criteria for the evolutionary algorithms employed. The proposed approach uses an Adaptive Discrete Firefly Algorithm (ADFA) to solve the Broadcast Scheduling Problem (BSP). The results are compared with traditional evolutionary approaches such as the Genetic Algorithm and the Cuckoo Search algorithm. A Markov chain analysis is used to derive the probability of obtaining a time slot.
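The two objectives in the abstract can be made concrete with the usual formulation of the broadcast scheduling problem: nodes within two hops of each other must not transmit in the same TDMA slot, the frame should have as few slots as possible, and extra conflict-free transmissions should be packed into it. The sketch below encodes these constraints with a simple greedy first-fit heuristic. It is not the paper's Adaptive Discrete Firefly Algorithm; it only illustrates the kind of schedule and fitness terms (frame length, utilization) an evolutionary search would optimize. The topology and all function names are hypothetical.

```python
from typing import Dict, List, Set


def conflict_graph(adj: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Two nodes conflict if they are one or two hops apart."""
    conflicts = {}
    for u, nbrs in adj.items():
        two_hop = set(nbrs)
        for v in nbrs:
            two_hop |= adj[v]
        two_hop.discard(u)
        conflicts[u] = two_hop
    return conflicts


def greedy_schedule(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Return a TDMA frame as a list of slots, each a set of transmitting nodes."""
    conf = conflict_graph(adj)
    frame: List[Set[int]] = []
    # Phase 1: give every node one slot (first-fit, most-constrained nodes first).
    for u in sorted(adj, key=lambda n: -len(conf[n])):
        for slot in frame:
            if not (slot & conf[u]):
                slot.add(u)
                break
        else:
            frame.append({u})
    # Phase 2: keep the frame length fixed and add extra conflict-free transmissions.
    for slot in frame:
        for u in adj:
            if u not in slot and not (slot & conf[u]):
                slot.add(u)
    return frame


def is_valid(frame: List[Set[int]], adj: Dict[int, Set[int]]) -> bool:
    """Every node transmits at least once and no slot contains a conflicting pair."""
    conf = conflict_graph(adj)
    every_node_served = set().union(*frame) == set(adj)
    conflict_free = all(not (slot & conf[u]) for slot in frame for u in slot)
    return every_node_served and conflict_free


def utilization(frame: List[Set[int]], n_nodes: int) -> float:
    """Fraction of (slot, node) pairs used for transmission."""
    return sum(len(s) for s in frame) / (len(frame) * n_nodes)


if __name__ == "__main__":
    # Hypothetical 5-vehicle chain topology: 0-1-2-3-4
    adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    frame = greedy_schedule(adj)
    print(len(frame), is_valid(frame, adj), round(utilization(frame, len(adj)), 3))
```

A metaheuristic such as ADFA would replace the greedy first-fit with a population of candidate schedules, scoring each by frame length and utilization while rejecting or penalizing invalid ones.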


2019, Vol 2 (3), pp. 508-517
Author(s): FerdaNur Arıcı, Ersin Kaya

Optimization is the process of searching for the most suitable solution to a problem within an acceptable time interval. Algorithms that solve optimization problems are called optimization algorithms. The literature contains many optimization algorithms with different characteristics, and they can behave differently depending on the size, characteristics, and complexity of the optimization problem. In this study, six well-known population-based optimization algorithms (artificial algae algorithm, AAA; artificial bee colony algorithm, ABC; differential evolution algorithm, DE; genetic algorithm, GA; gravitational search algorithm, GSA; and particle swarm optimization, PSO) were used. These six algorithms were run on the CEC’17 test functions. Based on the experimental results, the algorithms were compared and their performance was evaluated.
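Of the six population-based algorithms listed, differential evolution is compact enough to sketch in full. The following is a minimal DE/rand/1/bin loop, assuming a shifted sphere function as a stand-in objective (the actual CEC’17 functions are shifted and rotated and are not reproduced here) and common textbook settings F = 0.5 and CR = 0.9. It is illustrative only, not the configuration used in the study.

```python
import random
from typing import Callable, List, Tuple


def shifted_sphere(x: List[float], shift: float = 1.5) -> float:
    """Hypothetical stand-in test function with minimum 0 at x_i = shift."""
    return sum((xi - shift) ** 2 for xi in x)


def differential_evolution(f: Callable[[List[float]], float], dim: int,
                           bounds: Tuple[float, float] = (-5.0, 5.0),
                           pop_size: int = 20, F: float = 0.5, CR: float = 0.9,
                           generations: int = 500,
                           seed: int = 0) -> Tuple[List[float], float]:
    """DE/rand/1/bin; the population is updated in place (a common variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                # Binomial crossover between the mutant vector and the target.
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(max(v, lo), hi))  # clip to the search box
                else:
                    trial.append(pop[i][d])
            # Greedy selection: the trial replaces the target if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    best_x, best_f = differential_evolution(shifted_sphere, dim=5)
    print([round(xi, 3) for xi in best_x], best_f)
```

The other five algorithms differ mainly in how new candidate solutions are generated (e.g. velocity updates in PSO, crossover and mutation operators in GA), while the evaluate-and-select loop has the same overall shape.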

