Improving Maneuver Strategy in Air Combat by Alternate Freeze Games with a Deep Reinforcement Learning Algorithm

2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Zhuang Wang ◽  
Hui Li ◽  
Haolin Wu ◽  
Zhaoxin Wu

In a one-on-one air combat game, the opponent’s maneuver strategy is usually not deterministic, so a variety of opponent strategies must be considered when designing our own maneuver strategy. In this paper, an alternate freeze game framework based on deep reinforcement learning is proposed to generate the maneuver strategy in an air combat pursuit. The maneuver strategy agents that guide the aircraft of both sides are designed for a one-on-one air combat scenario at a fixed flight level and constant velocity. Middleware that connects the agents to the air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, which increases training speed and improves the quality of the generated trajectories. Agents are trained by alternate freeze games with a deep reinforcement learning algorithm to deal with nonstationarity. A league system is adopted to avoid the red queen effect in the game where both sides implement adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat and that typical angle-fight tactics can be learnt by the deep reinforcement learning agents. When training against an opponent with an adaptive strategy, the winning rate can reach more than 50% and the losing rate can be reduced to less than 15%. In a competition against all opponents, the winning rate of the strategic agent selected by the league system is more than 44%, and the probability of not losing is about 75%.
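As a rough illustration of the alternate freeze idea, the sketch below trains one side while the other side's policy stays frozen, then swaps roles and keeps snapshots for a league; the environment interface, agent methods, and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of alternate freeze training: one side learns while the
# other side's policy stays frozen, then the roles are swapped.  The environment,
# agent interface, and hyperparameters are illustrative, not from the paper.
def alternate_freeze_training(env, red_agent, blue_agent,
                              n_generations=10, episodes_per_generation=5000):
    league = []                                  # snapshots kept for the league system
    for generation in range(n_generations):
        # Train red against a frozen copy of blue.
        frozen_blue = blue_agent.snapshot()
        for _ in range(episodes_per_generation):
            run_episode(env, learner=red_agent, frozen=frozen_blue)

        # Swap roles: train blue against a frozen copy of red.
        frozen_red = red_agent.snapshot()
        for _ in range(episodes_per_generation):
            run_episode(env, learner=blue_agent, frozen=frozen_red)

        league.append((red_agent.snapshot(), blue_agent.snapshot()))
    return league


def run_episode(env, learner, frozen):
    """One air-combat episode: the learner updates, the frozen opponent only acts."""
    state = env.reset()
    done = False
    while not done:
        a_learner = learner.act(state)
        a_frozen = frozen.act(state)             # no learning update for the frozen side
        next_state, reward, done = env.step(a_learner, a_frozen)
        learner.update(state, a_learner, reward, next_state, done)  # e.g. a DQN-style update
        state = next_state
```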

Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the way humans learn. When faced with a new task, humans naturally apply common sense and prior knowledge to derive an initial policy and to guide the subsequent learning process. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and intermediate guidance helps avoid unnecessary exploration. Taking this inspiration, we propose the knowledge guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge with RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
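A minimal sketch of how a rule-based prior could be combined with a trainable refine module, in the spirit described above; the layer sizes, the additive combination, and the `prior_rules` interface are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class KnowledgeGuidedPolicy(nn.Module):
    """Sketch of a prior-plus-refine policy.  The fuzzy rule controller is passed
    in as `prior_rules` (a callable mapping states to action preferences); the
    refine network and the additive combination are illustrative assumptions."""
    def __init__(self, state_dim, action_dim, prior_rules):
        super().__init__()
        self.prior_rules = prior_rules
        self.refine = nn.Sequential(                          # small network that corrects the prior
            nn.Linear(state_dim + action_dim, 64),
            nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state):
        prior = self.prior_rules(state)                       # suboptimal prior preferences
        correction = self.refine(torch.cat([state, prior], dim=-1))
        logits = prior + correction                           # refined action preferences
        return torch.distributions.Categorical(logits=logits)
```

The resulting action distribution can then be optimized with any policy-gradient method, consistent with the abstract's claim that the framework composes with existing policy-based algorithms.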


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 279 ◽  
Author(s):  
Xianbing Zhang ◽  
Guoqing Liu ◽  
Chaojie Yang ◽  
Jiang Wu

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is growing. Based on the characteristics of over-the-horizon air combat, this paper constructs an over-the-horizon air combat training environment, which includes aircraft modeling, air combat scene design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, using it as a heuristic signal to guide the search process, and combines heuristic exploration with random exploration. For the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is adopted to train the neural network model in the over-the-horizon air combat training environment. Through continuous interaction with the environment, self-learning of the air combat maneuver strategy is realized. The efficiency of the heuristic Q-Network method and the effectiveness of the air combat maneuver strategy are verified by simulation experiments.
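One plausible way to realize the described combination of heuristic and random exploration is to let expert experience bias greedy action selection while retaining an epsilon-greedy random component; the mixing rule and the `expert_bias` signal below are assumptions for illustration, not the paper's exact formulation.

```python
import random
import numpy as np

def heuristic_action(q_values, expert_bias, epsilon=0.1, eta=0.5):
    """Illustrative action selection mixing heuristic and random exploration.
    `q_values` are the network's action values; `expert_bias` is an assumed
    per-action bonus derived from expert experience."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))              # random exploration
    # Heuristic exploration: expert experience nudges the greedy choice.
    return int(np.argmax(np.asarray(q_values) + eta * np.asarray(expert_bias)))
```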


2017 ◽  
Vol 1 (1) ◽  
pp. 21-42 ◽  
Author(s):  
Anestis Fachantidis ◽  
Matthew Taylor ◽  
Ioannis Vlahavas

In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance, and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV), which relates variance to the corresponding mean, as a statistic for choosing policies that generate advice. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning problem and propose a novel reinforcement learning algorithm capable of learning when to advise and when not to. The proposed algorithm can advise even without knowledge of the student’s intended action and needs significantly less training time than previous learning approaches. Finally, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
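For concreteness, a small sketch of using the CV statistic to choose an advising policy from a set of candidate teachers; selecting the lowest-CV candidate is an illustrative rule, not necessarily the article's procedure.

```python
import numpy as np

def coefficient_of_variation(returns):
    """CV = standard deviation of a policy's episode returns divided by their mean."""
    returns = np.asarray(returns, dtype=float)
    return returns.std() / returns.mean()

def pick_teacher(candidate_returns):
    """Illustrative teacher selection: among candidate policies (name -> list of
    episode returns), pick the one with the lowest CV rather than the highest mean."""
    return min(candidate_returns,
               key=lambda name: coefficient_of_variation(candidate_returns[name]))

# Example: pick_teacher({"policy_a": [90, 95, 88], "policy_b": [120, 40, 130]})
```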


2014 ◽  
Vol 587-589 ◽  
pp. 2137-2140
Author(s):  
Xin Li ◽  
Feng Chen

Traffic emission is one of the main pollution sources of the urban atmospheric environment, and the traffic control scheme at an intersection has an important influence on vehicle emissions. Research on low-emission traffic signal control has therefore become one of the focuses of intelligent transportation. Typical current methods for controlling traffic emissions are based on optimizing the average delay and number of stops. However, it is extremely difficult to calculate the delay and the number of stops analytically when an initial queue is present at the intersection. To solve this problem, we propose a traffic emission control algorithm based on reinforcement learning. Simulation experiments were carried out using microscopic traffic simulation software. Compared with the Hideki emission control scheme, the experimental results show that the reinforcement learning algorithm is more effective: average vehicle emissions are reduced by 12.2% for a highly saturated intersection.
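A minimal sketch of an emission-aware tabular Q-learning controller of the kind described, with discretized states and a per-step reward equal to negative emissions; the state encoding, phase set, and hyperparameters are assumptions, not the paper's setup.

```python
import numpy as np

def train_signal_controller(env, n_states, n_phases,
                            episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Illustrative tabular Q-learning for emission-aware signal control."""
    Q = np.zeros((n_states, n_phases))
    for _ in range(episodes):
        state = env.reset()                       # e.g. discretized queue lengths
        done = False
        while not done:
            if np.random.rand() < epsilon:
                phase = np.random.randint(n_phases)
            else:
                phase = int(np.argmax(Q[state]))
            next_state, emissions, done = env.step(phase)
            reward = -emissions                   # penalize vehicle emissions
            Q[state, phase] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, phase])
            state = next_state
    return Q
```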


2009 ◽  
Vol 10 (4) ◽  
pp. 329-341 ◽  
Author(s):  
Aleksandras Vytautas Rutkauskas ◽  
Tomas Ramanauskas

In this paper we propose an artificial stock market model based on the interaction of heterogeneous agents whose forward-looking behaviour is driven by a reinforcement-learning algorithm combined with an evolutionary selection mechanism. We use the model to analyse market self-regulation abilities, market efficiency, and determinants of emergent properties of the financial market. Distinctive and novel features of the model include a strong emphasis on the economic content of individual decision-making, the application of the Q-learning algorithm to drive individual behaviour, and a rich market setup. In addition, a parallel version of the model is presented, which focuses on analysing current changes in the market and searching for newly emerged consistent patterns, and which has been used repeatedly in experiments on the search for optimal decisions in various capital markets.
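The following sketch shows a Q-learning trading agent of the general kind that could drive individual behaviour in such a market model; the state discretization, action set, and hyperparameters are assumptions for illustration, not the model described in the paper.

```python
import numpy as np

class QLearningTrader:
    """Illustrative Q-learning agent for an artificial-market setting: states are
    discretized market observations, actions are sell, hold, or buy."""
    ACTIONS = (-1, 0, 1)                          # sell, hold, buy

    def __init__(self, n_states, alpha=0.05, gamma=0.99, epsilon=0.05):
        self.Q = np.zeros((n_states, len(self.ACTIONS)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if np.random.rand() < self.epsilon:
            return np.random.randint(len(self.ACTIONS))
        return int(np.argmax(self.Q[state]))

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update; reward could be realized trading profit.
        target = reward + self.gamma * np.max(self.Q[next_state])
        self.Q[state, action] += self.alpha * (target - self.Q[state, action])
```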


2021 ◽  
Vol 11 (1) ◽  
pp. 147-154
Author(s):  
Dmitriy Stupnikov ◽  
Andrey Tolstyh ◽  
Sergey Malyukov ◽  
Aleksey Aksenov ◽  
Sergey Novikov

Reinforcement learning is a type of machine learning in which algorithms interact with a model of the environment in which the robotic system is to be used, making it possible to obtain relatively simple approximations of effective action sequences for achieving the set goal. Using reinforcement learning allows the model to be trained on server hardware, while the final system uses the already trained neural networks, whose response time depends directly on their topology. In the presented work, a static calculation of a prototype robotic manipulator for bench research of reinforcement learning systems has been carried out. The choice of design features and materials has been substantiated, and the main units and design features have been considered. The studies were carried out in the SolidWorks Simulation software. A prototype of a robotic manipulator with a sufficiently high safety margin was obtained. It is concluded that the main stress concentrator is the junction of the eyelet and the platform; however, the maximum stress value was 38.804 kgf/cm², which is insignificant. The maximum resulting displacement is concentrated in the upper part of the eyelet and shifts depending on the position of the manipulator arm. The maximum recorded displacement is 0.073 mm, which is negligible.


Author(s):  
Tsega Weldu Araya ◽  
Md Rashed Ibn Nawab ◽  
A. P. Yuan Ling

As technology grows, the volume of information and the density of work become demanding to manage. Machine-learning (ML) technology was developed to relieve this burden on human labor. Reinforcement learning (RL) is a recent advancement in ML research, and multi-agent reinforcement learning (MARL) is used to train multiple agents in a shared environment. Previous research studies focused on two-agent cooperation, and their training data were held in a two-dimensional array, i.e., a matrix. The limitation of this two-dimensional representation appears as the agents’ training data grow: it creates storage drawbacks and data redundancy. Our first aim in this research is to develop an algorithm that can represent MARL training data in a tensor. In MARL, multiple agents work together to achieve a joint task; to share the training records and data of numerous agents, we collect the agents’ cumulative experience in a tensor. Secondly, we investigate cooperation and competition between agents with local and global goals in MARL. Local goals concern the cooperation of agents in a group or team, for which we use a student–teacher training model; the global goal is the competition between two opposing teams to acquire the reward. Each learning agent has its own Q-table for storing its individual training data in the environment. As the number of learning agents grows, so do their training experiences in the Q-tables and the requirement to represent these data, which becomes the most challenging issue. We introduce a tensor to store these data and resolve the challenges of data representation in multi-agent settings. Here a tensor is expressed as a three-dimensional array, although in general it is an N-way array, which is useful for representing and accessing large amounts of data. Finally, we implement an algorithm for training three cooperative agents against an opposing team using a tensor-based framework with the Q-learning algorithm, and we provide an algorithm that can store the training records and data of multiple agents. The tensor representation achieves a smaller storage size than the matrix representation for the agents’ training records, and three-agent cooperation helps obtain the maximum optimal reward.
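A minimal sketch of the tensor-based storage idea: a single three-dimensional array indexed by (agent, state, action) replaces a separate Q matrix per agent; the shapes and the update rule are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

class TensorQTable:
    """Illustrative tensor-based Q storage for several agents: one 3-D array
    indexed by (agent, state, action) instead of one 2-D matrix per agent."""
    def __init__(self, n_agents, n_states, n_actions):
        self.Q = np.zeros((n_agents, n_states, n_actions))

    def best_action(self, agent, state):
        return int(np.argmax(self.Q[agent, state]))

    def update(self, agent, state, action, reward, next_state,
               alpha=0.1, gamma=0.9):
        # One-step Q-learning update for a single agent's slice of the tensor.
        target = reward + gamma * np.max(self.Q[agent, next_state])
        self.Q[agent, state, action] += alpha * (target - self.Q[agent, state, action])
```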


Author(s):  
Du Zhang ◽  
Meiliu Lu

One of the long-term research goals in machine learning is how to build never-ending learners. The state of the practice in the field of machine learning is still dominated by the one-time learner paradigm: some learning algorithm is applied to data sets to produce a certain model or target function, and then the learner is put away and the model or function is put to work. Such a learn-once-apply-next (or LOAN) approach may not be adequate for many real-world problems and is in sharp contrast with humans’ lifelong learning process. On the other hand, learning can often be brought about by overcoming inconsistent circumstances. This paper proposes a framework for perpetual learning agents that are capable of continuously refining or augmenting their knowledge by overcoming inconsistencies encountered during their problem-solving episodes. The never-ending nature of a perpetual learning agent is embodied in the framework as the agent’s continuous inconsistency-induced belief revision process. The framework hinges on the agent recognizing inconsistency in data, information, knowledge, or meta-knowledge, identifying the cause of the inconsistency, and revising or augmenting beliefs to explain, resolve, or accommodate it. The authors believe that inconsistency can serve as an important learning stimulus toward building perpetual learning agents that incrementally improve their performance over time.

