A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Science
2018
Vol 362 (6419)
pp. 1140-1144
Author(s):
David Silver
Thomas Hubert
Julian Schrittwieser
Ioannis Antonoglou
Matthew Lai
...

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
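At the core of the self-play search described above is AlphaZero's PUCT action-selection rule, which balances the value estimates accumulated by search against the network's policy prior. The sketch below is a minimal illustration of that rule only, not the authors' implementation; the constant `c_puct` and the toy inputs are assumptions for demonstration.

```python
import math

def puct_select(priors, q_values, visit_counts, c_puct=1.5):
    """Select an action by the PUCT rule used in AlphaZero-style MCTS.

    priors: network policy P(s, a) for each action
    q_values: mean action values Q(s, a) accumulated by search
    visit_counts: N(s, a), how often each action has been explored
    """
    total_visits = sum(visit_counts)
    best_action, best_score = None, -float("inf")
    for a, (p, q, n) in enumerate(zip(priors, q_values, visit_counts)):
        # Exploration bonus: large for rarely visited actions with high
        # prior, shrinking as an action accumulates visits.
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        score = q + u
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```

With equal priors and values but visit counts of 10 versus 0, the rule picks the unvisited action; once all counts grow, the Q term dominates and search converges on the strongest move.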

2021
pp. 1-11
Author(s):  
Yang Yang

To improve the effectiveness of sports movement training, this paper builds a training model based on artificial intelligence technology, specifically a generative adversarial network. To combine model-based and model-free deep reinforcement learning, the model guides and constrains the deep reinforcement learning algorithm through both the reward value and the behavior strategy, and two cases are distinguished. In the first case, existing or manually established expert rules serve as model constraints, which is equivalent to online guidance by experts. In the second case, expert samples serve as model constraints, and an imitation learning method based on generative adversarial networks is introduced. Using expert samples as training data, the reward-value guidance mechanism is combined with the model-free algorithm through the generative adversarial network structure. Finally, the paper evaluates the model experimentally; the results show that the constructed model is effective for sports movement training.
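The second case above, expert samples as model constraints via adversarial imitation, can be sketched as a discriminator that labels expert (state, action) pairs 1 and policy-generated pairs 0, with its output converted into a surrogate reward for the model-free learner (the GAIL-style reward r = -log(1 - D)). The logistic discriminator, feature encoding, and learning rate below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_step(w, expert_batch, policy_batch, lr=0.1):
    """One logistic-regression update on the discriminator weights:
    expert (state, action) features are labeled 1, policy samples 0."""
    X = np.vstack([expert_batch, policy_batch])
    y = np.concatenate([np.ones(len(expert_batch)),
                        np.zeros(len(policy_batch))])
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)  # gradient of binary cross-entropy
    return w - lr * grad

def surrogate_reward(w, sa, eps=1e-8):
    """GAIL-style reward: the more expert-like a (state, action) pair
    looks to the discriminator, the higher the reward."""
    d = sigmoid(sa @ w)
    return -np.log(1.0 - d + eps)
```

After a few discriminator updates on separable toy features, expert-like pairs receive a higher surrogate reward than policy-like ones, which is the signal that guides the model-free algorithm.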


2021
Vol 54 (3-4)
pp. 417-428
Author(s):
Yanyan Dai
KiDong Lee
SukGyu Lee

In real applications, rotary inverted pendulum systems are a canonical benchmark for nonlinear control. Without a deep understanding of control theory, it is difficult to control a rotary inverted pendulum platform using the classic control-engineering models described in Section 2.1. This paper therefore controls the platform by training and testing a reinforcement learning algorithm instead of relying on classic control theory. Despite many recent achievements in reinforcement learning (RL), there has been little research on quickly testing high-frequency RL algorithms in a real hardware environment. In this paper, we propose a real-time hardware-in-the-loop (HIL) control system to train and test a deep reinforcement learning algorithm from simulation through to real hardware implementation. The agent is implemented with a Double Deep Q-Network (DDQN) with prioritized experience replay, requiring no deep understanding of classical control engineering. For the real experiment, we define 21 actions to swing up the rotary inverted pendulum and balance it smoothly. Compared with the Deep Q-Network (DQN), the DDQN with prioritized experience replay removes the overestimation of Q values and decreases training time. Finally, we present experimental results comparing classic control theory with different reinforcement learning algorithms.
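The two ingredients named in the abstract can be written down compactly: the Double DQN target lets the online network select the greedy next action while the target network evaluates it, which curbs the Q-value overestimation of plain DQN, and prioritized experience replay samples transitions in proportion to their TD error. A minimal sketch assuming a discrete action space; the function names and constants are illustrative, not the paper's code.

```python
import numpy as np

def ddqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online network SELECTS the greedy next
    action, the target network EVALUATES it. Plain DQN would instead
    take max(q_target_next), which systematically overestimates."""
    a_star = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[a_star])

def prioritized_probs(td_errors, alpha=0.6, eps=1e-6):
    """Prioritized experience replay sampling distribution:
    P(i) proportional to (|delta_i| + eps) ** alpha, so transitions
    with large TD error are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()
```

For example, with `q_online_next = [0.2, 0.8]` and `q_target_next = [0.5, 0.3]`, the online net picks action 1 but the target net values it at 0.3, so the target is r + 0.9 * 0.3 rather than the plain-DQN r + 0.9 * 0.5.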

