Vague Neural Network Based Reinforcement Learning Control System for Inverted Pendulum

Author(s):  
Yibiao Zhao ◽  
Siwei Luo ◽  
Liang Wang ◽  
Aidong Ma ◽  
Rui Fang

2021 ◽  
Vol 54 (3-4) ◽  
pp. 417-428
Author(s):  
Yanyan Dai ◽  
KiDong Lee ◽  
SukGyu Lee

For real applications, rotary inverted pendulum systems have long served as a basic model in nonlinear control. Without a deep understanding of control theory, it is difficult to control a rotary inverted pendulum platform using classical control engineering models, as discussed in Section 2.1. Therefore, instead of relying on classical control theory, this paper controls the platform by training and testing a reinforcement learning algorithm. Reinforcement learning (RL) has produced many recent achievements, but there is little research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test a deep reinforcement learning algorithm from simulation through to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) with prioritized experience replay, which requires no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it so that the pendulum moves smoothly. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay reduces the overestimation of Q values and decreases the training time. Finally, this paper presents experimental results comparing classical control theory with different reinforcement learning algorithms.
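The two ingredients named in this abstract, the Double DQN target and prioritized experience replay, can be summarized in a short sketch. The buffer size, the priority exponent, the epsilon-greedy schedule, and the `q_online`/`q_target` helpers below are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch, assuming q_online(s) and q_target(s) each return a vector of
# 21 Q-values for state s. Hyperparameters are illustrative, not the paper's.
import numpy as np

GAMMA = 0.99          # discount factor (assumed)
N_ACTIONS = 21        # 21 swing-up/balance actions, per the abstract


class PrioritizedReplay:
    """Proportional prioritized experience replay (simplified, no sum-tree)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0), self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.prios) / np.sum(self.prios)
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + 1e-6) ** self.alpha


def select_action(q_online, state, epsilon=0.1):
    """Epsilon-greedy selection over the 21 discretized pendulum actions."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q_online(state)))


def ddqn_targets(batch, q_online, q_target):
    """Double DQN target: the online net selects the action, the target net
    evaluates it, which is what reduces the overestimation bias of plain DQN."""
    targets, td_errors = [], []
    for (s, a, r, s_next, done) in batch:
        if done:
            y = r
        else:
            a_star = int(np.argmax(q_online(s_next)))    # action selection
            y = r + GAMMA * q_target(s_next)[a_star]     # action evaluation
        td_errors.append(y - q_online(s)[a])
        targets.append(y)
    return np.asarray(targets), np.asarray(td_errors)
```

The TD errors returned by `ddqn_targets` are fed back into `PrioritizedReplay.update`, so transitions with larger errors are replayed more often, which is the mechanism the abstract credits with shortening training time.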


1995 ◽  
Vol 13 (7) ◽  
pp. 1006-1019 ◽  
Author(s):  
Teruo Fujii ◽  
Tamaki Ura ◽  
Taku Sutoh ◽  
Kazuo Ishii

2013 ◽  
Vol 13 (02) ◽  
pp. 1350040
Author(s):  
EHSAN TAHAMI ◽  
AMIR HOMAYOUN JAFARI ◽  
ALI FALLAH

In this paper, the control of a planar three-link musculoskeletal arm by an evolutionary actor–critic reinforcement learning (RL) method during a reaching movement to a stationary target is presented. The arm model used in this study included three skeletal links (wrist, forearm, and upper arm), three joints (wrist, elbow, and shoulder, without redundancy), and six non-linear monoarticular muscles (with redundancy), which were based on the Hill model. The learning control system was composed of actor, critic, and genetic algorithm (GA) parts. Two single-layer neural networks were used, one for the actor and one for the critic. This learning control system applied six activation commands to the six monoarticular muscles at each instant of time. It also used reinforcement (reward) feedback for the learning process and for controlling the direction of arm movement. In addition, the GA was implemented to select the best learning rates for the actor–critic neural networks. The results showed that the mean square error (MSE) and the average episode time gradually decreased, and the average reward gradually increased, toward constant values as the control policy was learned. Furthermore, when learning was complete, optimal values of the learning rates had been selected.
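A compact way to see how these parts fit together is the sketch below: a single-layer actor maps the arm state to six muscle activations, a single-layer critic estimates the state value, and both are updated from the TD error. The state dimension, noise scale, and learning rates are illustrative assumptions (in the paper, the learning rates are the quantities selected by the GA), not the authors' code:

```python
# Minimal actor-critic sketch under assumed dimensions and update rules.
import numpy as np

STATE_DIM, N_MUSCLES, GAMMA = 6, 6, 0.98   # illustrative values


class ActorCritic:
    def __init__(self, lr_actor=0.01, lr_critic=0.05):
        # the two learning rates are what the GA would tune in the paper
        self.W_actor = np.zeros((N_MUSCLES, STATE_DIM))
        self.w_critic = np.zeros(STATE_DIM)
        self.lr_a, self.lr_c = lr_actor, lr_critic

    def act(self, state, sigma=0.1):
        """Six muscle activations in [0, 1] with exploration noise."""
        mean = 1.0 / (1.0 + np.exp(-self.W_actor @ state))
        noise = sigma * np.random.randn(N_MUSCLES)
        return np.clip(mean + noise, 0.0, 1.0), noise

    def update(self, state, noise, reward, next_state):
        """TD error drives both the critic step and the actor step."""
        td_error = reward + GAMMA * self.w_critic @ next_state - self.w_critic @ state
        self.w_critic += self.lr_c * td_error * state                   # critic
        self.W_actor += self.lr_a * td_error * np.outer(noise, state)   # actor
        return td_error


# example usage with an assumed arm state vector
ac = ActorCritic(lr_actor=0.01, lr_critic=0.05)
activations, noise = ac.act(np.zeros(STATE_DIM))
```

A GA wrapper would then treat each candidate pair of learning rates as a genome and score it by the reward accumulated over a batch of training episodes, which matches the role the abstract assigns to the GA part.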


1993 ◽  
Vol 5 (6) ◽  
pp. 542-547
Author(s):  
Yasuo Kurematsu ◽  
Takashi Murai ◽  
Takuji Maeda ◽  
Shinzo Kitamura ◽  
...  

The authors are studying autonomous walking trajectory generation for a biped locomotive robot using a system consisting of an inverted pendulum equation and neural networks. This paper uses the trajectory generation system to simulate and verify how the robot reacts to a change in its initial posture, a change in the initial weight coefficients of a multi-layered neural network, or the addition of disturbances during walking. The simulations showed that the initial posture of the robot mainly determined whether walking succeeded, as well as the resulting gait, and that some disturbances did not prevent the robot from walking.
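The abstract does not give the specific pendulum equation, but a common choice for biped trajectory generation is the linear inverted pendulum model, in which the centre of mass moves at constant height on a massless leg. The sketch below assumes that standard model rather than the authors' formulation; the height, time step, and initial conditions are illustrative:

```python
# Linear inverted pendulum model (an assumption, not the paper's equation):
# horizontal motion of the centre of mass at constant height z_c obeys
# x_ddot = (g / z_c) * x.
import numpy as np

G, Z_C, DT = 9.81, 0.8, 0.001   # gravity, pendulum height, time step (illustrative)


def simulate_step(x0, v0, steps=300):
    """Integrate one single-support phase of the linear inverted pendulum."""
    x, v, traj = x0, v0, []
    for _ in range(steps):
        a = (G / Z_C) * x      # the pendulum falls away from the support point
        v += a * DT
        x += v * DT
        traj.append(x)
    return np.asarray(traj)


# example usage: centre-of-mass path for an assumed initial posture and speed
com_path = simulate_step(x0=-0.02, v0=0.15)
```

In a trajectory-generation system of the kind described, a neural network would shape or correct such pendulum-based reference trajectories, which is why the initial posture and the initial network weights both matter in the simulations.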


2020 ◽  
Vol 2020 (4) ◽  
pp. 43-54
Author(s):  
S.V. Khoroshylov ◽  
M.O. Redka ◽  

The aim of the article is to approximate optimal relative control of an underactuated spacecraft using reinforcement learning and to study the influence of various factors on the quality of such a solution. In the course of this study, methods of theoretical mechanics, control theory, stability theory, machine learning, and computer modeling were used. The problem of in-plane spacecraft relative control using only control actions applied tangentially to the orbit is considered. This approach makes it possible to reduce the propellant consumption of the reactive actuators and to simplify the architecture of the control system. However, in some cases, methods of classical control theory do not allow one to obtain acceptable results. In this regard, the possibility of solving this problem by reinforcement learning methods has been investigated; these methods allow designers to find near-optimal control algorithms as a result of interactions of the control system with the plant, using a reinforcement signal that characterizes the quality of the control actions. The well-known quadratic criterion is used as the reinforcement signal, which makes it possible to take into account both the accuracy requirements and the control costs. The search for control actions based on reinforcement learning is carried out using the policy iteration algorithm, implemented with an actor–critic architecture. Various representations of the actor, which implements the control law, and the critic, which provides value function estimates, using neural network approximators are considered. It is shown that the accuracy of the optimal control approximation depends on a number of factors, namely an appropriate structure of the approximators, the neural network parameter updating method, and the learning algorithm parameters. The investigated approach makes it possible to solve the considered class of control problems for controllers of different structures. Moreover, the approach allows the control system to refine its control algorithms during spacecraft operation.
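As a concrete illustration of a quadratic reinforcement signal and one critic update inside a policy-iteration loop, the sketch below uses placeholder weighting matrices, a single tangential control input, and a simple quadratic feature vector for the critic; all of these are assumptions, not the values or network structures used in the article:

```python
# Minimal sketch of a quadratic reward and a linear-in-features critic update.
import numpy as np

Q = np.diag([1.0, 1.0, 0.1, 0.1])   # weights on relative position/velocity errors (assumed)
R = np.array([[0.01]])               # weight on the single tangential control input (assumed)
GAMMA = 0.99


def reinforcement(x, u):
    """Quadratic reinforcement signal: tracking-accuracy term plus control-cost term."""
    return -(x @ Q @ x + u @ R @ u)


def critic_update(w, x, u, x_next, lr=1e-3):
    """One TD step of a critic with value approximator V(x) = w . phi(x),
    a stand-in for the neural-network critics considered in the article."""
    phi = np.concatenate([x, x * x])
    phi_next = np.concatenate([x_next, x_next * x_next])
    td_error = reinforcement(x, u) + GAMMA * w @ phi_next - w @ phi
    return w + lr * td_error * phi, td_error


# example usage with an assumed relative state and tangential thrust command
w = np.zeros(8)
x, u = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.001])
w, delta = critic_update(w, x, u, x_next=0.99 * x)
```

In a full policy-iteration loop, the actor would then be adjusted toward actions the updated critic scores more highly, alternating evaluation and improvement as the abstract describes.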

