Actor–critic-based decision-making method for the artificial intelligence commander in tactical wargames

Author(s):  
Junfeng Zhang ◽  
Qing Xue

In a tactical wargame, the decisions of the artificial intelligence (AI) commander are critical to the final combat result. Because of the fog of war, the AI commander faces unknown and hidden battlefield information and has an incomplete understanding of the situation, which makes it difficult to form appropriate tactical strategies. The traditional knowledge- and rule-based decision-making method lacks flexibility and autonomy, so making flexible, autonomous decisions in complex battlefield situations remains a difficult problem. This paper aims to solve the AI commander's decision-making problem using deep reinforcement learning (DRL). We develop a tactical wargame as the research environment; it contains a built-in scripted AI and supports a machine-versus-machine combat mode. On this basis, we design an end-to-end actor–critic framework for commander decision making in which a convolutional neural network represents the battlefield situation and reinforcement learning is used to try different tactical strategies. Finally, we carry out a combat experiment between a DRL-based agent and a rule-based agent in a jungle terrain scenario. The results show that the AI commander using the actor–critic method successfully learns how to obtain a higher score in the tactical wargame, and the DRL-based agent achieves a higher winning ratio than the rule-based agent.
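A minimal sketch of the actor–critic update underlying such a framework, assuming a discrete action space; the function name and hyperparameters are illustrative, not the authors' implementation:

```python
import numpy as np

def actor_critic_update(policy_logits, value, reward, next_value,
                        action, gamma=0.99, lr=0.01):
    """One illustrative advantage actor-critic step: the critic's TD
    error serves as the advantage that scales the actor's update."""
    # TD error = advantage estimate
    td_error = reward + gamma * next_value - value

    # Actor: raise log-probability of the taken action, scaled by td_error
    probs = np.exp(policy_logits - policy_logits.max())
    probs /= probs.sum()
    grad_logits = -probs
    grad_logits[action] += 1.0          # d log pi(a|s) / d logits
    new_logits = policy_logits + lr * td_error * grad_logits

    # Critic: move the value estimate toward the TD target
    new_value = value + lr * td_error
    return new_logits, new_value, td_error
```

With zero initial logits and a positive reward, the taken action's logit increases while the others decrease, and the critic's value estimate moves toward the observed return.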

2019 ◽  
Vol 1 (2) ◽  
pp. 74-84
Author(s):  
Evan Kusuma Susanto ◽  
Yosi Kristian

Asynchronous Advantage Actor-Critic (A3C) is a deep reinforcement learning algorithm developed by Google DeepMind. The algorithm can be used to build an artificial-intelligence architecture that masters many different kinds of games through trial and error, learning from the game's screen output and the score obtained from its actions, without human intervention. An A3C network consists of a Convolutional Neural Network (CNN) at the front, a Long Short-Term Memory network (LSTM) in the middle, and an actor–critic network at the back. The CNN summarizes the screen output by extracting its important features. The LSTM remembers previous game states. The actor–critic network determines the best action to take in a given situation. In the experiments conducted, this method proved quite effective and was able to beat novice players in the five games used as test cases.
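The CNN → LSTM → actor–critic pipeline described above can be sketched, shapes only, with toy stand-ins (no learned weights; dimensions such as 84×84 frames, a 256-unit hidden state, and 6 actions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(frame):
    # CNN stand-in: summarise an (84, 84) screen into a 256-d feature vector
    return rng.standard_normal(256)

def lstm_step(features, hidden):
    # LSTM stand-in: recurrent memory of previous game states
    return np.tanh(features + hidden)

def actor_critic_heads(hidden, n_actions=6):
    logits = hidden[:n_actions]   # actor head: one score per action
    value = hidden.mean()         # critic head: scalar state value
    return logits, value

hidden = np.zeros(256)
for frame in [rng.standard_normal((84, 84)) for _ in range(3)]:
    hidden = lstm_step(conv_features(frame), hidden)
logits, value = actor_critic_heads(hidden)
```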


2013 ◽  
Vol 380-384 ◽  
pp. 1354-1357
Author(s):  
Ya Ni Zhang

Complex decision-making is an important research component of artificial intelligence. Based on Bayesian techniques and decision theory, this work optimizes the traditional influence diagram (ID) model and improves its expressive power. It also replaces the joint utility function with a sum of individual utility functions, building a BP neural network to learn the utility-function structure of the ID. Experimental results show that the method is effective.
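The additive-utility assumption mentioned above can be illustrated as follows; the attributes and utility functions are hypothetical examples, not taken from the paper:

```python
# Sketch of replacing a joint utility with a sum of per-attribute
# utilities, each of which could be learned separately (e.g. by a
# small BP network). Attribute names are illustrative.
def u_cost(x):    return -x      # utility decreases with cost
def u_quality(x): return 2 * x   # utility grows with quality

def additive_utility(cost, quality):
    # U(cost, quality) ~ u_cost(cost) + u_quality(quality)
    return u_cost(cost) + u_quality(quality)

u = additive_utility(cost=3.0, quality=4.0)
```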


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Baolai Wang ◽  
Shengang Li ◽  
Xianzhong Gao ◽  
Tao Xie

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers' attention. However, the situation faced by a UAV swarm involves substantial uncertainty and dynamic variability, and the state and action spaces grow exponentially with the number of UAVs, making autonomous decision-making in the confrontation environment a difficult problem. In this paper, a multiagent reinforcement learning method with macro actions and human expertise is proposed for autonomous UAV decision-making. In the proposed approach, the UAV swarm is modeled as a large multiagent system (MAS) with each UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov decision process. Agents are trained on macro actions, which effectively mitigates sparse and delayed rewards as well as the large state and action spaces. The key to the method's success is the generation of macro actions that allow the high-level policy to find a near-optimal solution; we further leverage human expertise to design a set of good macro actions. Extensive empirical experiments in our constructed swarm confrontation environment show that our method outperforms the other algorithms.
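One way to picture the macro-action idea: a macro action is a fixed, human-designed sequence of primitive commands that the high-level policy selects as a single choice. The macro and primitive names below are illustrative, not the paper's actual action set:

```python
# Hedged sketch: macro actions as fixed sequences of primitive UAV
# commands, designed from human expertise (all names illustrative).
MACRO_ACTIONS = {
    "flank_left": ["turn_left", "accelerate", "turn_right", "fire"],
    "retreat":    ["turn_around", "accelerate", "accelerate"],
    "hold_fire":  ["decelerate", "scan"],
}

def expand(macro_sequence):
    """Translate the high-level policy's macro choices into the
    primitive action stream actually executed by a UAV agent."""
    return [p for m in macro_sequence for p in MACRO_ACTIONS[m]]

plan = expand(["flank_left", "retreat"])
```

Because the policy chooses among a handful of macros instead of long primitive sequences, rewards arrive after far fewer decisions, which is how the sparse-reward and large-action-space problems are eased.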


2021 ◽  
pp. 1-15
Author(s):  
Qinyu Mei ◽  
Ming Li

Aiming at the construction of a decision-making system for sports-assisted teaching and training, this article first presents a deep convolutional neural network model for sports-assisted teaching and training decisions. To meet athletes' needs for assisted physical exercise, a squat training robot is then built from a self-developed modular flexible-cable drive unit, and its control system is designed to assist athletes in squat training. First, the mechanics of the human squat are analyzed and the overall structure of the robot is determined. Second, the robot's force servo control strategy is designed, comprising flexible-cable traction-force planning, lateral-force compensation, and a passive force controller for a single flexible cable. To verify the training effect, a single-flexible-cable force control experiment and a human–machine squat training experiment were carried out. In the single-cable force control experiment, excess force was suppressed by more than 50%. In the squat experiment under a 200 N load, the standard deviation of the system loading force was 7.52 N and the dynamic accuracy exceeded 90.2%. The experimental results show that the robot has a reasonable configuration, a small footprint, a stable control system, and high loading accuracy, and can assist squat training in physical education.
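The excess-force suppression reported above can be sketched with a simple proportional correction loop; the gain, disturbance values, and function name are assumptions for illustration, not the authors' actual controller:

```python
# Hedged sketch of a force-control loop in the spirit of the
# single-cable controller described above: the commanded cable
# tension is corrected in proportion to the excess force
# (measured minus desired). Gains and names are illustrative.
def force_step(desired_n, measured_n, command_n, kp=0.5):
    excess = measured_n - desired_n   # excess (disturbance) force, N
    return command_n - kp * excess    # suppress it proportionally

command = 200.0                        # 200 N squat loading case
for measured in (215.0, 207.0, 203.0): # decaying disturbance readings
    command = force_step(200.0, measured, command)
```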


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy, having learned that its measurement is unreliable.
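The domain-randomization step can be illustrated as follows: per episode, the training environment samples a fresh perception-noise level so the policy cannot over-fit to any single error model. The noise range and Gaussian error model are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def perceive(true_velocity, noise_std):
    """Simulated perception: the measured velocity is the true value
    corrupted by Gaussian sensor noise (a naive error model)."""
    return true_velocity + rng.normal(0.0, noise_std)

def randomized_episode_noise(low=0.0, high=2.0):
    """Domain randomization: sample a fresh noise level per episode so
    the policy cannot over-commit to one error model."""
    return rng.uniform(low, high)

noise = randomized_episode_noise()      # drawn once per episode
measurement = perceive(10.0, noise)     # what the policy observes
```

Training against many such noise levels is what encourages the policy to down-weight unreliable measurements, consistent with the reduced reliance on velocity observed above.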


Biomolecules ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 264
Author(s):  
Kaisa Liimatainen ◽  
Riku Huttunen ◽  
Leena Latonen ◽  
Pekka Ruusuvuori

Identifying the localization of proteins, and of specific subpopulations associated with certain cellular compartments, is crucial for understanding protein function and interactions with other macromolecules. Fluorescence microscopy is a powerful method for assessing protein localization, with increasing demand for automated high-throughput analysis methods to complement the technical advances in high-throughput imaging. Here, we study the applicability of deep-neural-network-based artificial intelligence to the classification of protein localization in 13 cellular subcompartments. We use a convolutional neural network and a fully convolutional network with similar architectures for the classification task, aiming at accurate classification but, importantly, also at a comparison of the two networks. Our results show that both types of network perform well in protein-localization classification for the major cellular organelles. Yet, in this study, the fully convolutional network outperforms the convolutional neural network in classifying images with multiple simultaneous protein localizations. We find that the fully convolutional network, whose output visualizes the identified localizations, is a very useful tool for systematic protein-localization assessment.
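The key architectural contrast can be sketched in terms of output shapes alone (dummy zero outputs and a 64×64 image stand in for the trained networks; only the 13-class count comes from the study):

```python
import numpy as np

# Illustrative contrast between the two network types compared above:
# a CNN classifier collapses the image to one label vector, while a
# fully convolutional network (FCN) keeps spatial structure, yielding
# a per-pixel localization map (shapes only, no learned weights).
N_CLASSES = 13   # cellular subcompartments

def cnn_classify(image):
    # global pooling -> a single class-score vector for the whole image
    return np.zeros(N_CLASSES)

def fcn_classify(image):
    # no collapse to a vector: one class-score vector per pixel,
    # which is what visualizes where each localization occurs
    h, w = image.shape[:2]
    return np.zeros((h, w, N_CLASSES))

img = np.zeros((64, 64))
cnn_out = cnn_classify(img)
fcn_out = fcn_classify(img)
```

The per-pixel map is why an FCN can report several simultaneous localizations in one image, whereas a single class vector cannot say where each one occurs.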

