Online control policy optimization for minimizing map uncertainty during exploration

Author(s):  
R. Sim ◽  
G. Dudek ◽  
N. Roy
Author(s):  
Pei-Hua Huang ◽  
Osamu Hasegawa

This study presents an aerial robotic application of deep reinforcement learning that applies an asynchronous learning framework and trust region policy optimization to a simulated quad-rotor helicopter (quadcopter) environment. In particular, we optimized a control policy asynchronously through interaction with concurrent instances of the environment. The control system was benchmarked and extended with examples to tackle continuous state-action tasks for the quadcopter: hovering control and balancing an inverted pole. Performing these maneuvers required continuous actions for sensitive control of small acceleration changes of the quadcopter, thereby maximizing the scalar reward of the defined tasks. The simulation results demonstrated improved learning speed and reliability on these tasks.
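The abstract gives no implementation details; the following is only a minimal sketch of the two TRPO ingredients it names, the importance-sampled surrogate objective and the KL trust-region term, written for a diagonal Gaussian policy as would suit continuous quadcopter actions. All function names and array shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_log_prob(actions, mean, log_std):
    """Log-density of a diagonal Gaussian policy over continuous actions."""
    var = np.exp(2 * log_std)
    return -0.5 * np.sum((actions - mean) ** 2 / var
                         + 2 * log_std + np.log(2 * np.pi), axis=-1)

def trpo_surrogate(actions, advantages, old_mean, old_log_std,
                   new_mean, new_log_std):
    """Surrogate objective E[ratio * advantage] that TRPO maximizes,
    plus the mean KL divergence it constrains to a trust region."""
    log_ratio = (gaussian_log_prob(actions, new_mean, new_log_std)
                 - gaussian_log_prob(actions, old_mean, old_log_std))
    surrogate = np.mean(np.exp(log_ratio) * advantages)
    # Closed-form KL(old || new) between diagonal Gaussians, batch-averaged.
    kl = np.mean(np.sum(
        new_log_std - old_log_std
        + (np.exp(2 * old_log_std) + (old_mean - new_mean) ** 2)
          / (2 * np.exp(2 * new_log_std)) - 0.5, axis=-1))
    return surrogate, kl
```

TRPO would then maximize the surrogate subject to kl <= delta, typically via conjugate gradient and a backtracking line search; the asynchronous aspect described in the abstract would have several environment instances contribute samples to such updates concurrently.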


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Francisco Jesús Arjonilla García ◽  
Yuichi Kobayashi

Purpose
This study proposes an offline exploratory method that consists of two stages. First, the authors complete the kinematics model of the system by analyzing the Jacobians in the vicinity of the starting point and deducing a virtual input that effectively navigates the system along the non-holonomic constraint. Second, the authors explore the sensorimotor space in a predetermined pattern and obtain an approximate mapping from sensor space to chained form that facilitates controllability.

Design/methodology/approach
The authors tackle the controller acquisition problem for non-holonomic driftless systems whose sensorimotor model is unknown. This capability is attractive for simplifying and speeding up the setup of industrial mobile robots with feedback controllers.

Findings
The authors validate the approach on the unicycle test case by controlling the system with a time-state control policy. Simulated and experimental results show the effectiveness of the proposed method, together with a comparison against the proximal policy optimization algorithm.

Originality/value
This research indicates clearly that feedback control of non-holonomic systems with uncertain kinematics and an unknown sensor configuration is possible.
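For background on the chained form the abstract refers to, the textbook coordinate change for the unicycle (not the authors' learned sensor-to-chained-form mapping) can be sketched as follows; it is valid away from theta = +/- pi/2.

```python
import numpy as np

def unicycle_step(state, v, omega, dt=0.01):
    """One Euler step of the unicycle kinematics, a non-holonomic
    driftless system with inputs v (forward speed) and omega (turn rate)."""
    x, y, theta = state
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])

def to_chained_form(state):
    """Coordinate change taking the unicycle into (2,3) chained form,
    where z1' = u1, z2' = u2, z3' = z2 * u1."""
    x, y, theta = state
    return np.array([x, np.tan(theta), y])

def chained_inputs(theta, v, omega):
    """Map the physical inputs (v, omega) to chained-form inputs (u1, u2)."""
    return v * np.cos(theta), omega / np.cos(theta) ** 2
```

With z1 = x, z2 = tan(theta), z3 = y one can verify z3' = v sin(theta) = z2 * u1, which is what makes time-state control policies applicable once such a mapping is available.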


2021 ◽  
Vol 11 (14) ◽  
pp. 6494
Author(s):  
Waldemar Małopolski ◽  
Jerzy Zając

Building on the structural online control policy (SOCP) deadlock handling method presented in our previous work, we show that for a specific group of use cases the method's requirements can be relaxed, improving its performance. In the present work, a new type of deadlock-free zone is introduced that improves the method's efficiency for bidirectional, unidirectional, and mixed-path systems. For bidirectional systems, a beneficial outcome is obtained by approximating the global solution through sequentially solved local problems. For unidirectional and mixed systems, this paper introduces a condition that allows the feasibility of performing process reservations to be verified in a staged manner; when this condition is fulfilled, a higher efficiency of the transportation system can be obtained. The effectiveness of the proposed approaches has been verified by simulations, and the results, compared with those of the original method, show a significant improvement.
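Purely as a hypothetical illustration of staged reservation checking (the actual SOCP condition and deadlock-free zones are defined in the paper, not reproduced here), such a feasibility test might be sketched as:

```python
def staged_reservation_feasible(route, free_capacity):
    """Hypothetical staged check: walk the route zone by zone, reserving
    each upcoming zone only if it still has free capacity and releasing
    the zone just vacated as the vehicle advances."""
    free = dict(free_capacity)   # zone -> number of free slots
    prev = None
    for zone in route:
        if free.get(zone, 0) <= 0:
            return False         # next stage is blocked: infeasible
        free[zone] -= 1          # reserve the upcoming zone
        if prev is not None:
            free[prev] += 1      # release the zone just left behind
        prev = zone
    return True

# Example: zone B is fully occupied, so the staged reservation fails.
print(staged_reservation_feasible(["A", "B", "C"],
                                  {"A": 1, "B": 0, "C": 1}))  # False
```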


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1359 ◽  
Author(s):  
Hyun-Kyo Lim ◽  
Ju-Bong Kim ◽  
Joo-Seong Heo ◽  
Youn-Hee Han

Reinforcement learning has recently been studied in various fields and used to optimally control IoT devices, supporting the expansion of Internet connectivity beyond standard devices. In this paper, we let multiple reinforcement learning agents learn optimal control policies on their own IoT devices of the same type but with slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. One could therefore apply independent reinforcement learning to each IoT device individually, but this is costly and time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own IoT device, shares its learning experience (i.e., the gradient of the loss function) with the others and transfers mature policy model parameters to other agents, which accelerates their learning. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme effectively facilitates the learning process for multiple IoT devices and that learning is faster when more agents are involved.
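As a rough sketch of the two sharing mechanisms the abstract names, gradient sharing and model transfer, the core updates might look as follows; the parameter-dictionary layout and learning rate are assumptions, and the paper's Actor–Critic PPO details are not reproduced here.

```python
import numpy as np

def federated_gradient_step(params, local_grads, lr=1e-3):
    """Gradient sharing: average the loss gradients contributed by each
    agent (one per IoT device) and apply a single shared update."""
    avg_grad = {k: np.mean([g[k] for g in local_grads], axis=0)
                for k in params}
    return {k: params[k] - lr * avg_grad[k] for k in params}

def transfer_policy(mature_params):
    """Model transfer: a newly joining agent starts from a copy of a
    mature agent's policy parameters instead of learning from scratch."""
    return {k: v.copy() for k, v in mature_params.items()}
```

Averaging gradients across devices with slightly different dynamics pools their experience into one shared policy, while the parameter transfer gives late-joining agents a warm start, which is consistent with the reported speed-up as more agents participate.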

