Context‐aware pub/sub control method using reinforcement learning

Author(s): Joohyun Kim ◽ Seohee Hong ◽ Sengphil Hong ◽ Jaehoon Kim
2021 ◽ Vol 47 ◽ pp. 101229
Author(s): Dan E. Kröhling ◽ Omar J.A. Chiotti ◽ Ernesto C. Martínez

Author(s): Qingyuan Zheng ◽ Duo Wang ◽ Zhang Chen ◽ Yiyong Sun ◽ Bin Liang

Single-track two-wheeled robots have become an important research topic in recent years, owing to their simple structure, energy savings and ability to run on narrow roads. However, the ramp jump remains a challenging task for such robots. In this study, we set out to realize a ramp jump with a single-track two-wheeled robot. We present a control method based on continuous-action reinforcement learning. We design a novel reward function, optimize the dimensions of the action space, and train the controller with the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the ramp jump task. The simulation results show that the control method is effective and has several advantages over control with a high-dimensional action space, reinforcement learning with a sparse reward function, and discrete-action reinforcement learning.
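The abstract contrasts a designed reward function with a sparse one but does not give either formula. The following is only a minimal sketch of that general distinction, not the authors' reward: the function names, the state quantities (roll angle, distance to the landing point), the weights, and the landing bonus are all hypothetical and chosen purely for illustration.

```python
def sparse_reward(landed_upright: bool) -> float:
    """Sparse signal: the agent is rewarded only if the jump ends upright."""
    return 1.0 if landed_upright else 0.0


def shaped_reward(
    roll_angle_rad: float,
    distance_to_goal_m: float,
    landed_upright: bool,
    w_roll: float = 1.0,      # hypothetical weight on the balance term
    w_dist: float = 0.1,      # hypothetical weight on the progress term
    landing_bonus: float = 10.0,  # hypothetical terminal bonus
) -> float:
    """Dense signal: every step is penalized for tilt and remaining distance,
    with a bonus added when the robot lands upright."""
    reward = -w_roll * abs(roll_angle_rad) - w_dist * distance_to_goal_m
    if landed_upright:
        reward += landing_bonus
    return reward
```

With a dense signal of this kind, a smaller roll angle yields a higher reward at every step, so the learner receives feedback well before the jump succeeds or fails. A sparse signal gives no such gradient until the episode ends.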


2021 ◽ Vol 143 (6)
Author(s): Eric R. Anderson ◽ Brian L. Steward

Hydraulic pressure ripple in a pump, a result of converting rotational power to fluid power, remains a problem in hydraulic system development because of the noise it generates. In this paper, we present simulation results from using an actor-critic reinforcement learning method for active noise control in a hydraulic system. The results demonstrate greater than 96%, 81%, and 61% pressure ripple reduction for the first, second, and third harmonics, respectively, in a single-operating-point test, along with the advantage of feedforward-like control for high-bandwidth response during dynamic changes in the operating point. They also demonstrate the disadvantage of long convergence times while the controller is learning the optimal control policy. Additionally, this work demonstrates an ancillary benefit: it eliminates the white-noise injection that the current state of the art uses for system identification.

