Neuroevolution approach for reinforcement learning task applied to control system of a cart pole system

Author(s):  
Marco Boaretto ◽  
Gabriel Chaves Becchi ◽  
Luiza Scapinello Aquino ◽  
Aderson Cleber Pifer ◽  
Helon Vicente Hultmann Ayala ◽  
...  
2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
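To make the hybrid valuation concrete, here is a minimal Python sketch that mixes model-free cached values with model-based planned values through a weight w, in the spirit of the two-stage paradigms this literature builds on. The task structure, the weight w, and the rates alpha and beta are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

# Illustrative hybrid learner: model-free (habit-like) cached values are
# combined with model-based (planned) values via a weight w. All names
# and the toy task structure are assumptions for illustration only.
rng = np.random.default_rng(0)
n_actions = 2
alpha, beta, w = 0.3, 5.0, 0.5        # learning rate, inverse temperature, MB weight

q_mf = np.zeros(n_actions)            # model-free cached action values
transition = np.array([[0.7, 0.3],    # assumed P(second-stage state | action)
                       [0.3, 0.7]])
state_value = np.zeros(2)             # learned value of each second-stage state

for _ in range(1000):
    q_mb = transition @ state_value            # plan: expected future value
    q = w * q_mb + (1 - w) * q_mf              # hybrid valuation
    p = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(n_actions, p=p)             # softmax choice
    s2 = rng.choice(2, p=transition[a])        # stochastic transition
    r = float(rng.random() < (0.8 if s2 == 0 else 0.2))
    q_mf[a] += alpha * (r - q_mf[a])           # model-free TD update
    state_value[s2] += alpha * (r - state_value[s2])
```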


Processes ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 487
Author(s):  
Fumitake Fujii ◽  
Akinori Kaneishi ◽  
Takafumi Nii ◽  
Ryu’ichiro Maenishi ◽  
Soma Tanaka

Proportional–integral–derivative (PID) control remains the primary choice for industrial process control problems. However, owing to the increased complexity and precision requirements of current industrial processes, a conventional PID controller may provide only unsatisfactory performance, or the determination of PID gains may become quite difficult. To address these issues, studies have suggested the use of reinforcement learning in combination with PID control laws. The present study aims to extend this idea to the control of a multiple-input multiple-output (MIMO) process that suffers from both physical coupling between inputs and a long input/output lag. We specifically target a thin film production process as an example of such a MIMO process and propose a self-tuning two-degree-of-freedom PI controller for the film thickness control problem. Theoretically, the self-tuning functionality of the proposed control system is based on the actor-critic reinforcement learning algorithm. We also propose a method to compensate for the input coupling. Numerical simulations are conducted under several likely scenarios to demonstrate the enhanced control performance relative to that of a conventional static gain PI controller.
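As a rough illustration of the self-tuning idea, the sketch below uses a simple actor-critic rule to adapt PI gains on a toy first-order plant: the actor's parameters are the gains themselves, explored with Gaussian noise and updated by the critic's TD error. The plant, reward, and step sizes are assumptions for illustration; the paper's two-degree-of-freedom structure, input-coupling compensation, and film-thickness dynamics are not reproduced.

```python
import numpy as np

# Illustrative actor-critic tuning of PI gains on a toy first-order plant.
rng = np.random.default_rng(1)
kp, ki = 0.5, 0.1                    # actor parameters: the PI gains
v = 0.0                              # critic: scalar value estimate (baseline)
alpha_a, alpha_c, sigma = 1e-3, 1e-2, 0.05
gamma, dt, ref = 0.95, 0.1, 1.0

y, integ = 0.0, 0.0
for _ in range(5000):
    e = ref - y
    integ += e * dt
    kp_try = kp + sigma * rng.standard_normal()   # Gaussian exploration
    ki_try = ki + sigma * rng.standard_normal()   # around the current gains
    u = kp_try * e + ki_try * integ               # PI control law
    y += dt * (-y + u)               # assumed plant: tau*dy/dt = -y + u
    r = -(ref - y) ** 2              # reward: negative squared tracking error
    delta = r + gamma * v - v        # TD error from the critic
    v += alpha_c * delta                                  # critic update
    kp += alpha_a * delta * (kp_try - kp) / sigma**2      # score-function
    ki += alpha_a * delta * (ki_try - ki) / sigma**2      # actor updates
```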


2021 ◽  
Vol 54 (3-4) ◽  
pp. 417-428
Author(s):  
Yanyan Dai ◽  
KiDong Lee ◽  
SukGyu Lee

The rotary inverted pendulum is a canonical benchmark for nonlinear control systems in real applications. Without a deep understanding of control theory, it is difficult to control a rotary inverted pendulum platform using classical control engineering models, as shown in Section 2.1. This paper therefore controls the platform by training and testing a reinforcement learning algorithm, without recourse to classical control theory. Although reinforcement learning (RL) has achieved many recent successes, there is little research on quickly testing high-frequency RL algorithms in real hardware environments. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test a deep reinforcement learning algorithm from simulation through to real hardware implementation. The agent is implemented with the Double Deep Q-Network (DDQN) algorithm with prioritized experience replay, which requires no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it while keeping the pendulum motion smooth. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay reduces the overestimation of Q-values and decreases the training time. Finally, we present experimental results comparing classical control with different reinforcement learning algorithms.
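The two ingredients the abstract names, the Double DQN target and proportional prioritized replay, can be sketched compactly. The following is a minimal illustration; the network architectures, the HIL interface, and the exact 21-action discretization are omitted, and `q_online_next`/`q_target_next` are stand-ins for the two networks' outputs.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    # Double DQN: the online net selects the next action, the target net
    # evaluates it, which curbs the max-operator overestimation of plain DQN.
    a_star = np.argmax(q_online_next, axis=1)
    q_eval = q_target_next[np.arange(len(a_star)), a_star]
    return rewards + gamma * (1.0 - dones) * q_eval

def priorities(td_errors, eps=1e-3, alpha=0.6):
    # Proportional prioritization: larger |TD error| -> sampled more often.
    return (np.abs(td_errors) + eps) ** alpha

# Toy batch of 4 transitions with 21 discrete actions, as in the paper.
rng = np.random.default_rng(2)
batch, n_actions = 4, 21
q_online_next = rng.random((batch, n_actions))
q_target_next = rng.random((batch, n_actions))
q_taken = rng.random(batch)                   # Q(s, a) for the stored actions
y = ddqn_targets(q_online_next, q_target_next,
                 rewards=np.ones(batch), dones=np.zeros(batch))
print(priorities(y - q_taken))                # updated priorities for the batch
```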


2019 ◽  
Author(s):  
Alexandra O. Cohen ◽  
Kate Nussenbaum ◽  
Hayley Dorfman ◽  
Samuel J. Gershman ◽  
Catherine A. Hartley

Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults' learning is modulated by beliefs about the causal structure of the environment, such that they update their value estimates to a lesser extent when outcomes can be attributed to hidden causes. The present study examined whether external causes similarly influence outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants' beliefs about hidden-agent intervention aligned well with the true probabilities of positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices of both adults (ages 18-25) and adolescents (ages 13-17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7-12) were best fit by a single-learning-rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that although children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about that structure to guide reinforcement learning in the same manner as adolescents and adults.
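A minimal sketch of the attribution-weighted updating such Bayesian models embody: the value update is scaled by the posterior probability that the outcome reflects the chosen option rather than a hidden agent. The prior p_agent, the agent's outcome model, and the choice rule below are illustrative assumptions, not the study's fitted model.

```python
import numpy as np

# Illustrative attribution-weighted value updating.
rng = np.random.default_rng(3)
alpha = 0.4
p_agent = 0.3                  # assumed prior that a hidden agent intervened
q = np.full(2, 0.5)            # option values, initialized uninformatively
p_reward = np.array([0.7, 0.3])

for _ in range(500):
    a = int(np.argmax(q + 0.1 * rng.standard_normal(2)))  # noisy greedy choice
    r = float(rng.random() < p_reward[a])
    # Posterior that the outcome reflects the option itself rather than the
    # hidden agent (Bayes rule with an assumed "helpful" agent that forces
    # a reward whenever it intervenes).
    lik_self = q[a] if r else (1 - q[a])
    lik_agent = 1.0 if r else 0.0
    denom = (1 - p_agent) * lik_self + p_agent * lik_agent
    p_self = (1 - p_agent) * lik_self / denom if denom > 0 else 1.0
    q[a] += alpha * p_self * (r - q[a])        # attribution-weighted update
```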


2021 ◽  
Vol 2113 (1) ◽  
pp. 012030
Author(s):  
Jing Li ◽  
Yanyang Liu ◽  
Xianguo Qing ◽  
Kai Xiao ◽  
Ying Zhang ◽  
...  

Abstract The nuclear reactor control system plays a crucial role in the operation of nuclear power plants. Coordinating reactor power control with steam generator level control has become one of the most important control problems in these systems. In this paper, we propose a mathematical model of the coordinated control system, transform it into a reinforcement learning problem, and develop a deep reinforcement learning controller based on the deep deterministic policy gradient (DDPG) algorithm to solve it. Simulation experiments show that the proposed algorithm achieves excellent control performance.
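For orientation, the core DDPG bookkeeping (critic targets computed from the target networks, plus Polyak soft target updates) can be sketched as below. Linear maps stand in for the neural networks, and the reactor model, state/action definitions, and reward are omitted; this is a generic sketch, not the paper's implementation.

```python
import numpy as np

def critic_target(r, s_next, done, actor_t, critic_t, gamma=0.99):
    # y = r + gamma * Q'(s', mu'(s')) for non-terminal transitions
    a_next = actor_t(s_next)
    return r + gamma * (1.0 - done) * critic_t(s_next, a_next)

def soft_update(target_params, online_params, tau=0.005):
    # Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'
    return tau * online_params + (1.0 - tau) * target_params

# Toy linear stand-ins: mu(s) = Wa @ s, Q(s, a) = wc . [s; a], with an
# assumed 4-dim state and 2 control inputs (e.g., power and level loops).
Wa = np.zeros((2, 4))
wc = np.zeros(6)
actor_t = lambda s: Wa @ s
critic_t = lambda s, a: wc @ np.concatenate([s, a])

y = critic_target(r=1.0, s_next=np.ones(4), done=0.0,
                  actor_t=actor_t, critic_t=critic_t)
Wa_target = soft_update(Wa.copy(), Wa)
```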

