Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target

Author(s):  
Weifan Li ◽  
Yuanheng Zhu ◽  
Dongbin Zhao

In missile guidance, pursuit performance is seriously degraded by the uncertainty and randomness in target maneuverability, detection delay, and environmental noise. Many methods require accurately estimating the target's acceleration or the time-to-go in order to intercept a maneuvering target, which is difficult in an uncertain environment. In this paper, we propose an assisted deep reinforcement learning (ARL) algorithm to optimize a neural network-based missile guidance controller for head-on interception. Based on the relative velocity, distance, and angle, ARL can control the missile to intercept the maneuvering target and achieve a large terminal intercept angle. To reduce the influence of environmental uncertainty, ARL predicts the target's acceleration as an auxiliary supervised task. This supervised learning task improves the agent's ability to extract information from observations. To exploit the agent's good trajectories, ARL introduces Gaussian self-imitation learning, which pulls the mean of the action distribution toward the agent's good actions. Compared with vanilla self-imitation learning, Gaussian self-imitation learning improves exploration in continuous control. Simulation results validate that ARL outperforms traditional methods and the proximal policy optimization algorithm, achieving a higher hit rate and a larger terminal intercept angle in a simulation environment with noise, delay, and a maneuvering target.
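As a rough illustration of the two auxiliary objectives described in this abstract, the sketch below combines a policy-gradient loss with a supervised acceleration-prediction head and a Gaussian self-imitation term. It is a minimal PyTorch-style sketch under assumed network sizes, observation dimensions, and weighting coefficients, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a Gaussian policy with an
# auxiliary head that predicts the target's acceleration, plus a self-imitation
# term that pulls the action mean toward previously successful actions.
import torch
import torch.nn as nn

class GuidancePolicy(nn.Module):
    def __init__(self, obs_dim=6, act_dim=1, hidden=128):  # dimensions are assumptions
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)            # mean of the Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        self.accel_head = nn.Linear(hidden, 1)          # auxiliary: target acceleration

    def forward(self, obs):
        h = self.encoder(obs)
        return self.mu(h), self.log_std.exp(), self.accel_head(h)

def arl_loss(policy, obs, actions, advantages, target_accel, good_actions,
             aux_coef=0.5, sil_coef=0.1):
    """Policy-gradient loss + auxiliary supervised task + Gaussian self-imitation."""
    mu, std, accel_pred = policy(obs)
    dist = torch.distributions.Normal(mu, std)
    pg_loss = -(dist.log_prob(actions).sum(-1) * advantages).mean()
    aux_loss = nn.functional.mse_loss(accel_pred, target_accel)  # supervised auxiliary task
    sil_loss = nn.functional.mse_loss(mu, good_actions)          # mean pulled toward good actions
    return pg_loss + aux_coef * aux_loss + sil_coef * sil_loss
```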

2021 ◽  
Vol 20 ◽  
pp. 197-204
Author(s):  
Karina Litwynenko ◽  
Małgorzata Plechawska-Wójcik

Reinforcement learning algorithms are gaining popularity, and their advancement is made possible by the availability of tools to evaluate them. This paper examines the applicability of reinforcement learning algorithms on the Unity platform using the Unity ML-Agents Toolkit library. The purpose of the study was to compare two algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). The possibility of improving the learning results by combining these algorithms with Generative Adversarial Imitation Learning (GAIL) was also verified. The results showed that the PPO algorithm can perform better in uncomplicated environments with non-immediate rewards, while the additional use of GAIL can improve learning performance.


2020 ◽  
Vol 10 (16) ◽  
pp. 5722 ◽  
Author(s):  
Duy Quang Tran ◽  
Sang-Hoon Bae

Advanced deep reinforcement learning shows promise as an approach to addressing continuous control tasks, especially in mixed-autonomy traffic. In this study, we present a deep reinforcement-learning-based model that considers the effectiveness of leading autonomous vehicles in mixed-autonomy traffic at a non-signalized intersection. The model integrates the Flow framework, the Simulation of Urban Mobility (SUMO) simulator, and a reinforcement learning library. We also propose a set of proximal policy optimization hyperparameters to obtain reliable simulation performance. First, the leading autonomous vehicles at the non-signalized intersection are considered with varying autonomous vehicle penetration rates ranging from 10% to 100% in 10% increments. Second, the proximal policy optimization hyperparameters are applied to the multilayer perceptron policy in the leading autonomous vehicle experiment. Finally, the superiority of the proposed model is evaluated against all-human-driven-vehicle and leading-human-driven-vehicle experiments. We demonstrate that full-autonomy traffic can improve the average speed and delay time by 1.38 times and 2.55 times, respectively, compared with the all-human-driven-vehicle experiment. Our proposed method generates more positive effects as the autonomous vehicle penetration rate increases. Additionally, the leading autonomous vehicle experiment can be used to dissipate stop-and-go waves at a non-signalized intersection.
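To make the "set of proximal policy optimization hyperparameters" concrete, the dictionary below is an illustrative configuration in the style used by RLlib, the reinforcement learning library that the Flow framework couples with SUMO. The values are assumptions for the sketch, not the paper's tuned settings.

```python
# Illustrative PPO hyperparameter set (assumed values, not the paper's tuned
# configuration) in the dictionary style used by RLlib with the Flow framework.
ppo_config = {
    "gamma": 0.999,             # discount factor
    "lambda": 0.97,             # GAE parameter
    "lr": 5e-5,                 # learning rate
    "clip_param": 0.2,          # PPO clipping range
    "train_batch_size": 4000,   # timesteps collected per training iteration
    "sgd_minibatch_size": 128,
    "num_sgd_iter": 10,
    "model": {"fcnet_hiddens": [64, 64]},  # multilayer perceptron policy
    "horizon": 1500,            # maximum episode length in simulation steps
}
```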


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4331 ◽  
Author(s):  
Zhong Ma ◽  
Yuejiao Wang ◽  
Yidai Yang ◽  
Zhuping Wang ◽  
Lei Tang ◽  
...  

When a satellite performs complex tasks such as discarding a payload or capturing a non-cooperative target, it encounters sudden changes in its attitude and mass parameters, causing unstable flight and rolling of the satellite. In such circumstances, the changes in movement and mass characteristics are unpredictable. Traditional attitude control methods are therefore unable to stabilize the satellite, since they depend on the mass parameters of the controlled object. In this paper, we propose a reinforcement learning method to re-stabilize the attitude of a satellite under such circumstances. Specifically, we discretize the continuous control torque and build a neural network model that outputs the discretized control torque to control the satellite. A dynamics simulation environment of the satellite is built, and the deep Q-network (DQN) algorithm is then used to train the neural network in this simulation environment, with the reward defined by the stabilization of the satellite. Simulation experiments illustrate that, as training progresses, the neural network model gradually learns to re-stabilize the attitude of the satellite after an unknown disturbance. In contrast, a traditional PD (proportional-derivative) controller is unable to re-stabilize the satellite because of its dependence on the mass parameters. The proposed method adopts self-learning to control satellite attitude, shows considerable intelligence and a degree of universality, and has strong application potential for future intelligent control of satellites performing complex space tasks.
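The sketch below illustrates the discretization idea described above: each body-axis control torque is restricted to a small discrete set, and a Q-network scores every torque combination. State dimensions, torque levels, and network sizes are assumptions for illustration, not the authors' values.

```python
# Sketch of discretized torque control with a Q-network (illustrative only).
import itertools
import torch
import torch.nn as nn

TORQUE_LEVELS = [-0.1, 0.0, 0.1]  # N*m per axis (assumed levels)
ACTIONS = list(itertools.product(TORQUE_LEVELS, repeat=3))  # 27 discrete torque vectors

class QNetwork(nn.Module):
    def __init__(self, state_dim=7, hidden=256):  # e.g. quaternion + angular velocity
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(ACTIONS)),  # one Q-value per discrete torque vector
        )

    def forward(self, state):
        return self.net(state)

def select_torque(q_net, state, epsilon=0.1):
    """Epsilon-greedy action selection over the discretized torque set."""
    if torch.rand(1).item() < epsilon:
        idx = torch.randint(len(ACTIONS), (1,)).item()
    else:
        idx = q_net(state).argmax().item()
    return torch.tensor(ACTIONS[idx])
```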


Author(s):  
Xiaoxiao Guo ◽  
Shiyu Chang ◽  
Mo Yu ◽  
Gerald Tesauro ◽  
Murray Campbell

Existing imitation learning approaches often require that complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert's actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions from the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
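A conceptual sketch of the hybrid objective follows: an inverse model infers the action that moved the expert between consecutive states, and the inferred action supervises the policy alongside the usual RL loss. The names, shapes, and the plain MLP inverse model are assumptions; the paper's actual action-inference model is tensor-based.

```python
# Conceptual sketch (assumed shapes and names, not the authors' tensor model).
import torch
import torch.nn as nn

class InverseActionModel(nn.Module):
    """Predicts the action that moved the expert from state s_t to s_{t+1}."""
    def __init__(self, state_dim=8, num_actions=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))  # action logits

def hybrid_loss(policy_logits, rl_loss, inferred_action_logits, imitation_coef=0.5):
    """Combine the RL objective with imitation toward inferred expert actions."""
    inferred_actions = inferred_action_logits.argmax(dim=-1)
    il_loss = nn.functional.cross_entropy(policy_logits, inferred_actions)
    return rl_loss + imitation_coef * il_loss
```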


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2120
Author(s):  
Ying Ji ◽  
Jianhui Wang ◽  
Jiacan Xu ◽  
Donglin Li

The proliferation of distributed renewable energy resources (RESs) poses major challenges to the operation of microgrids due to uncertainty. Traditional online scheduling approaches relying on accurate forecasts become difficult to implement as uncertain RESs increase. Although several data-driven methods have been proposed recently to overcome this challenge, they generally suffer from a scalability issue due to their limited ability to optimize high-dimensional continuous control variables. To address these issues, we propose a data-driven online scheduling method for microgrid energy optimization based on continuous-control deep reinforcement learning (DRL). We formulate the online scheduling problem as a Markov decision process (MDP). The objective is to minimize the operating cost of the microgrid considering the uncertainty of RES generation, load demand, and electricity prices. To learn the optimal scheduling strategy, a gated recurrent unit (GRU)-based network is designed to extract temporal features of the uncertainty and generate the optimal scheduling decisions in an end-to-end manner. To optimize the policy with high-dimensional and continuous actions, proximal policy optimization (PPO) is employed to train the neural network-based policy in a data-driven fashion. The proposed method does not require any forecast of the uncertainty or prior knowledge of the physical model of the microgrid. Simulation results using realistic power system data from the California Independent System Operator (CAISO) demonstrate the effectiveness of the proposed method.
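The sketch below shows one way a GRU-based scheduling policy of the kind described above could be structured: a recurrent encoder summarizes the recent history of uncertain quantities, and a Gaussian head outputs continuous dispatch decisions that PPO can optimize. Dimensions and the meaning of the action components are assumptions, not the paper's design.

```python
# Minimal sketch of a GRU-based continuous scheduling policy (assumed dims).
import torch
import torch.nn as nn

class GRUSchedulingPolicy(nn.Module):
    def __init__(self, obs_dim=10, act_dim=3, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)        # e.g. storage/generator/grid set-points
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim) of past RES output, load, and prices
        _, h = self.gru(obs_history)
        mu = self.mu(h.squeeze(0))
        return torch.distributions.Normal(mu, self.log_std.exp())
```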


Author(s):  
Joshua Lye ◽  
Alisa Andrasek

This paper investigates the application of machine learning to the simulation of larger architectural aggregations formed through the recombination of discrete components. This is primarily explored by establishing hardcoded assembly and connection logics, which form the framework of architectural fitness conditions for machine learning models. The key machine learning models researched are a combination of the deep reinforcement learning algorithm proximal policy optimization (PPO) and Generative Adversarial Imitation Learning (GAIL) in the Unity ML-Agents Toolkit. The goal of applying these machine learning models is to train the agent behaviours (discrete components) to learn specific logics of connection, in order to achieve assembled architectural states that allow for spatial habitation through the process of simulation.


Author(s):  
Xiaoteng Ma ◽  
Xiaohang Tang ◽  
Li Xia ◽  
Jun Yang ◽  
Qianchuan Zhao

Most reinforcement learning algorithms optimize the discounted criterion, which helps accelerate convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks such as finance-related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. First, we develop a unified trust region theory covering both discounted and average criteria. With the average criterion, a novel performance bound within the trust region is derived using perturbation analysis (PA) theory. Second, we propose a practical algorithm named Average Policy Optimization (APO), which improves value estimation with a novel technique named the Average Value Constraint. To the best of our knowledge, our work is the first to study the trust region approach with the average criterion, and it complements the framework of reinforcement learning beyond the discounted criterion. Finally, experiments are conducted in the continuous control environment MuJoCo. In most tasks, APO performs better than discounted PPO, which demonstrates the effectiveness of our approach.
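For reference, the two criteria contrasted above can be written as follows; these are the standard textbook definitions, not necessarily the paper's exact notation.

```latex
% Discounted criterion
\eta_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],
\qquad 0 < \gamma < 1

% Long-run average criterion
\rho(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r(s_t, a_t)\right]
```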


2020 ◽  
Vol 10 (18) ◽  
pp. 6567
Author(s):  
Daseon Hong ◽  
Minjeong Kim ◽  
Sungsu Park

Reinforcement learning is generating considerable interest for building guidance laws and solving optimization problems that were previously difficult to solve. Since reinforcement learning-based guidance laws often show better robustness than previously optimized algorithms, several studies have been carried out on the subject. This paper presents a new approach to training a missile guidance law by reinforcement learning and introduces some of its notable characteristics. The novel missile guidance law shows better robustness to the controller model than proportional navigation guidance. The neural network in this paper has identical inputs to proportional navigation guidance, which makes the comparison fair and distinguishes this work from other research. The proposed guidance law is compared to proportional navigation guidance, which is widely regarded as a quasi-optimal missile guidance law. Our work aims to find effective missile training methods through reinforcement learning and to assess how much better the new method is. Additionally, with the derived policy, we examine which guidance law performs better and under which circumstances. A novel training methodology is proposed first, followed by the performance comparison results.
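To make the comparison set-up concrete, the sketch below pairs the classical proportional navigation command, a_c = N * V_c * λ̇, with a neural guidance law fed exactly the same inputs, as the abstract emphasizes. Function names, input choices, and network sizes are illustrative assumptions, not the paper's implementation.

```python
# Sketch: proportional navigation guidance (PNG) vs. a neural guidance law
# with identical inputs (closing velocity and line-of-sight rate). Illustrative only.
import torch
import torch.nn as nn

def png_acceleration(closing_velocity, los_rate, nav_constant=3.0):
    """Classical PNG command: a_c = N * V_c * lambda_dot."""
    return nav_constant * closing_velocity * los_rate

class NeuralGuidance(nn.Module):
    """Neural guidance law taking the same inputs as PNG."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # commanded lateral acceleration
        )

    def forward(self, closing_velocity, los_rate):
        x = torch.stack([closing_velocity, los_rate], dim=-1)
        return self.net(x)
```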


Author(s):  
Chenyang Zhao ◽  
Timothy M. Hospedales ◽  
Freek Stulp ◽  
Olivier Sigaud

Advances in hardware and learning for control are enabling robots to perform increasingly dexterous and dynamic control tasks. These skills typically require a prohibitive amount of exploration for reinforcement learning, and so are commonly achieved by imitation learning from manual demonstration. The costly, non-scalable nature of manual demonstration has motivated work on skill generalisation, e.g., through contextual policies and options. Despite good results, existing work along these lines is limited to generalising across variants of one skill, such as throwing an object to different locations. In this paper we go significantly further and investigate generalisation across qualitatively different classes of control skills. In particular, we introduce a class of neural network controllers that can realise four distinct skill classes: reaching, object throwing, casting, and ball-in-cup. By factorising the weights of the neural network, we are able to extract transferable latent skills that enable dramatic acceleration of learning in cross-task transfer. With a suitable curriculum, this allows us to learn challenging dexterous control tasks like ball-in-cup from scratch with pure reinforcement learning.
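One simple way to realise the weight factorisation idea described above is sketched below: each task's layer weights are composed from a shared basis of latent skill matrices, so only the per-task mixing coefficients need to be learned when transferring. The construction and dimensions are assumptions for illustration, not the authors' exact parameterisation.

```python
# Sketch of a factorized layer for transferable latent skills (illustrative).
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, num_latent_skills=4, num_tasks=4):
        super().__init__()
        # Shared basis: one weight matrix per latent skill.
        self.basis = nn.Parameter(torch.randn(num_latent_skills, out_dim, in_dim) * 0.01)
        # Task-specific mixing coefficients (the only per-task parameters).
        self.coeffs = nn.Parameter(torch.randn(num_tasks, num_latent_skills) * 0.1)
        self.bias = nn.Parameter(torch.zeros(num_tasks, out_dim))

    def forward(self, x, task_id):
        # Compose this task's weight matrix from the shared latent skills.
        weight = torch.einsum('k,koi->oi', self.coeffs[task_id], self.basis)
        return x @ weight.t() + self.bias[task_id]
```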

