Evolving board evaluation functions for a complex strategy game

2021 ◽  
Author(s):  
Lisa Patricia Anthony

BMJ ◽  
2011 ◽  
Vol 343 (nov15 2) ◽  
pp. d7379-d7379
Author(s):  
R. O'Conor

2021 ◽  
Author(s):  
Alexander Dockhorn ◽  
Jorge Hurtado-Grueso ◽  
Dominik Jeurissen ◽  
Linjie Xu ◽  
Diego Perez-Liebana

2004 ◽  
Vol 7 (4) ◽  
pp. 404-410 ◽  
Author(s):  
Dominic J Barraclough ◽  
Michelle L Conroy ◽  
Daeyeol Lee

Author(s):  
Cong Fei ◽  
Bin Wang ◽  
Yuzheng Zhuang ◽  
Zongzhang Zhang ◽  
Jianye Hao ◽  
...  

Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning. However, the requirement of isolated single-modal demonstrations limits the scalability of the approach to real-world scenarios such as autonomous driving, which demands a proper understanding of human drivers' behavior. In this paper, we propose a novel multi-modal GAIL framework, named Triple-GAIL, which introduces an auxiliary selector so that skill selection and imitation are learned jointly from both expert demonstrations and continuously generated experiences, with the latter serving as data augmentation. We provide theoretical guarantees on convergence to optima for both the generator and the selector. Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL fits multi-modal behaviors closer to the demonstrators' and outperforms state-of-the-art methods.
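The surrogate reward underlying GAIL (and thus the framework the abstract describes) can be sketched in a few lines: a discriminator scores state-action pairs, and the policy is rewarded for fooling it. This is a minimal illustrative sketch, not the Triple-GAIL implementation; the logistic discriminator, feature shapes, and function names are assumptions.

```python
import numpy as np

def discriminator(sa, w):
    """Toy logistic discriminator over concatenated state-action features."""
    return 1.0 / (1.0 + np.exp(-sa @ w))

def gail_reward(sa, w):
    """Standard GAIL surrogate reward: -log(1 - D(s, a)).

    Higher when the discriminator mistakes policy samples for expert data.
    """
    d = discriminator(sa, w)
    return -np.log(1.0 - d + 1e-8)   # epsilon guards against log(0)

rng = np.random.default_rng(0)
w = rng.normal(size=4)               # hypothetical discriminator weights
sa = rng.normal(size=(5, 4))         # 5 state-action pairs, 4 features each
r = gail_reward(sa, w)               # one positive reward per pair
```

Triple-GAIL's contribution, per the abstract, is the auxiliary selector that conditions this objective on a latent skill so multiple behavior modes can be imitated jointly; that machinery is omitted here.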


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Huale Li ◽  
Rui Cao ◽  
Xuan Wang ◽  
Xiaohan Hou ◽  
Tao Qian ◽  
...  

In recent years, deep reinforcement learning (DRL) has achieved great success in many fields, especially in games, as shown by AlphaGo, AlphaZero, and AlphaStar. However, due to the reward sparsity problem, traditional DRL-based methods show limited performance in 3D games, which have a much higher-dimensional state space. To solve this problem, in this paper, we propose an intrinsic-based policy optimization (IBPO) algorithm for reward sparsity. In the IBPO, a novel intrinsic reward is integrated into the value network, which provides an additional reward in environments with sparse rewards, so as to accelerate training. In addition, to deal with the problem of value estimation bias, we further design three types of auxiliary tasks, which evaluate the state value and the action more accurately in 3D scenes. Finally, a framework of auxiliary intrinsic-based policy optimization (AIBPO) is proposed, which improves the performance of the IBPO. The experimental results show that the method deals with the reward sparsity problem effectively. Therefore, the proposed method may be applied to real-world scenarios, such as 3D navigation and autonomous driving, improving sample utilization and reducing the cost of interactive samples collected by real equipment.
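The general intrinsic-reward idea the abstract relies on can be sketched as follows: in a sparse-reward environment, an intrinsic bonus is added to the environment reward so the agent still receives a learning signal. This sketch uses a simple count-based novelty bonus purely for illustration; the IBPO paper's actual intrinsic reward is integrated into the value network, and the class and parameter names here are assumptions.

```python
from collections import Counter
import numpy as np

class IntrinsicRewardWrapper:
    """Adds a count-based novelty bonus to a sparse extrinsic reward."""

    def __init__(self, beta=0.1):
        self.beta = beta          # weight of the intrinsic bonus
        self.visits = Counter()   # per-state visit counts

    def shaped_reward(self, state, extrinsic):
        self.visits[state] += 1
        bonus = 1.0 / np.sqrt(self.visits[state])  # novelty decays with visits
        return extrinsic + self.beta * bonus
```

With a zero extrinsic reward, repeated visits to the same state yield a steadily shrinking shaped reward, so the agent is pushed toward unexplored states until genuine environment rewards take over.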

