A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5643
Author(s):  
Wenqiang Zu ◽  
Hongyu Yang ◽  
Renyu Liu ◽  
Yulong Ji

Guiding an aircraft to 4D waypoints at a certain heading is a multi-dimensional goal aircraft guidance problem. To improve performance on this problem, this paper proposes a multi-layer RL approach. The approach enables the autopilot in an ATC simulator to guide an aircraft to 4D waypoints at a specified latitude, longitude, altitude, heading, and arrival time. Specifically, the multi-layer RL structure is used to simplify the neural network structure and reduce the state dimensions. A shaped reward function that involves the potential function and Dubins path method is applied. Experimental and simulation results show that the proposed approach can significantly improve the convergence efficiency and trajectory performance. Furthermore, the results indicate possible application prospects in team aircraft guidance tasks, since the aircraft can directly approach a goal without waiting in a specific pattern, thereby overcoming the problem of current ATC simulators.
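
As a rough illustration of the potential-based reward shaping the abstract mentions, the sketch below adds a shaping term F = gamma * Phi(s') - Phi(s) to the environment reward. The potential used here (negative straight-line distance to the goal) and the names `potential` and `shaped_reward` are assumptions made for illustration; the abstract does not give the paper's actual potential function or how it is combined with the Dubins path method.

```python
import math


def potential(state, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


def shaped_reward(env_reward, state, next_state, goal, gamma=0.99):
    """Return the environment reward plus a potential-based shaping term."""
    shaping = gamma * potential(next_state, goal) - potential(state, goal)
    return env_reward + shaping


goal = (0.0, 0.0)
# A step that moves closer to the goal receives a positive shaping bonus.
toward = shaped_reward(0.0, (3.0, 4.0), (0.0, 4.0), goal)
# A step that moves away from the goal receives a negative one.
away = shaped_reward(0.0, (3.0, 4.0), (6.0, 8.0), goal)
```

Because the shaping term is a difference of potentials, it rewards progress toward the goal without adding a separate hand-tuned bonus for each state.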

2021 ◽  
Vol 18 (1) ◽  
pp. 172988142198954
Author(s):  
Zhuang Wang ◽  
Hui Li ◽  
Zhaoxin Wu ◽  
Haolin Wu

To enhance the performance of guiding an aircraft to a moving destination in a certain direction in three-dimensional continuous space, it is essential to develop an efficient intelligent algorithm. In this article, a pretrained proximal policy optimization (PPO) with reward shaping algorithm, which does not require an accurate model, is proposed to solve the guidance problem of manned aircraft and unmanned aerial vehicles. A continuous action reward function and a position reward function are presented, which increase the training speed and improve the quality of the generated trajectory. Using pretrained PPO, a new agent can be trained efficiently for a new task. A reinforcement learning framework is built, in which an agent can be trained to generate a reference trajectory or a series of guidance instructions. General simulation results show that the proposed method can significantly improve the training efficiency and trajectory performance. A carrier-based aircraft approach simulation is carried out to demonstrate the application value of the proposed approach.


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3664 ◽  
Author(s):  
Qichen Zhang ◽  
Meiqiang Zhu ◽  
Liang Zou ◽  
Ming Li ◽  
Yong Zhang

Deep reinforcement learning (DRL) has been successfully applied in mapless navigation. An important issue in DRL is designing a reward function for evaluating the actions of agents. However, designing a robust and suitable reward function depends greatly on the designer's experience and intuition. To address this concern, we employ reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on a matching network (MN). The MN-based reward function gains experience by pre-training on trajectories from different navigation tasks and accelerates the training of DRL in new tasks. The proposed reward function keeps the optimal strategy of DRL unchanged. The simulation results on two static maps show that DRL converges in fewer iterations with the learned reward function than state-of-the-art mapless navigation methods. The proposed method performs well in dynamic maps with partially moving obstacles. Even when test maps are different from training maps, the proposed strategy is able to complete the navigation tasks without additional training.


2013 ◽  
Vol 860-863 ◽  
pp. 2791-2795
Author(s):  
Qian Xiao ◽  
Yu Shan Jiang ◽  
Ru Zheng Cui

To address the large computational workload of the adaptive algorithm in wavelet-transform-based adaptive filters, which limits the filtering speed, this paper constructs a wavelet-based neural network adaptive filter. Because neural networks offer distributed storage and fast self-evolution, a Hopfield neural network is used to implement the LMS adaptive filtering algorithm in this filter so as to improve the speed of operation. The simulation results show that the new filter can achieve rapid real-time denoising.
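
For readers unfamiliar with the LMS algorithm named in the abstract, here is a minimal sketch of the plain LMS weight update. It is not the paper's Hopfield-network implementation and includes no wavelet stage; the function name `lms_filter`, the tap count, the step size `mu`, and the synthetic test signal are all assumptions chosen for illustration.

```python
import random


def lms_filter(x, d, num_taps=4, mu=0.05):
    """Plain LMS: adapt weights w so that the filter output tracks d."""
    w = [0.0] * num_taps
    errors = []
    for n in range(num_taps - 1, len(x)):
        # Most recent input sample first.
        window = x[n - num_taps + 1:n + 1][::-1]
        y = sum(wi * xi for wi, xi in zip(w, window))
        e = d[n] - y
        # Gradient-descent step on the instantaneous squared error.
        w = [wi + mu * e * xi for wi, xi in zip(w, window)]
        errors.append(e)
    return w, errors


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Desired signal: output of a made-up FIR system with taps [0.5, -0.3].
d = [0.5 * x[n] - (0.3 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w, errors = lms_filter(x, d)
```

In this noise-free setup the learned weights approach the taps of the made-up system, which is the system-identification view of what an LMS adaptive filter does.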


Author(s):  
Raheleh Jafari ◽  
Sina Razvarz ◽  
Alexander Gegov ◽  
Satyam Paul

To model fuzzy nonlinear systems, fuzzy equations with Z-number coefficients are used in this chapter. Modeling a fuzzy nonlinear system amounts to obtaining the Z-number coefficients of the fuzzy equations. In this work, a neural network approach is used to find these coefficients. Some examples with applications in mechanics are given. The simulation results demonstrate that the proposed neural network is effective for obtaining the Z-number coefficients of fuzzy equations.


2013 ◽  
Vol 579-580 ◽  
pp. 804-807
Author(s):  
Zhong Yao Wu ◽  
Tian Feng Zhao ◽  
Jian Bo Cao ◽  
Shi Ju E ◽  
Chun Xiao Chen

Dielectric elastomer is a kind of electroactive polymer material with excellent performance, and it has shown good prospects as an actuator material. Based on a study of the principle of electroactive polymers, a new type of cylindrical actuator was designed. Its 3-D model and 2-D dimensioned drawing were produced with UG software, and an animation simulation of the actuator was carried out. The simulation results verified the feasibility of the design scheme. Electroactive polymers will have broad application prospects in the field of actuators.


2015 ◽  
Vol 740 ◽  
pp. 871-874
Author(s):  
Hui Zhao ◽  
Li Rong Shi ◽  
Hong Jun Wang

Models for predicting the temperature of a sunlight greenhouse tend to have an overly large neural network structure, because there are too many input factors and the input factors are coupled in complex ways. To address this problem, this article takes the environmental factors that affect sunlight greenhouse temperature as the data sample. Principal component analysis of the data samples extracts three main factors, and the selected principal component values are used as the input variables of a BP neural network model. The Bayesian regularization algorithm is used to improve the BP neural network. The empirical results show that the modified BP neural network has a simpler network structure, a smoother fitting curve, and good generalization capability.


2008 ◽  
Vol 20 (2) ◽  
pp. 415-435 ◽  
Author(s):  
Ryosuke Hosaka ◽  
Osamu Araki ◽  
Tohru Ikeguchi

Spike-timing-dependent synaptic plasticity (STDP), which depends on the temporal difference between pre- and postsynaptic action potentials, is observed in the cortices and hippocampus. Although several theoretical and experimental studies have revealed its fundamental aspects, its functional role remains unclear. To examine how an input spatiotemporal spike pattern is altered by STDP, we observed the output spike patterns of a spiking neural network model with an asymmetrical STDP rule when the input spatiotemporal pattern is repeatedly applied. The spiking neural network comprises excitatory and inhibitory neurons that exhibit local interactions. Numerical experiments show that the spiking neural network generates a single global synchrony whose relative timing depends on the input spatiotemporal pattern and the neural network structure. This result implies that the spiking neural network learns the transformation from spatiotemporal to temporal information. In the literature, the origin of the synfire chain has not been sufficiently focused on. Our results indicate that spiking neural networks with STDP can ignite synfire chains in the cortices.
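
To make the idea of an asymmetrical STDP rule concrete, the sketch below uses the commonly cited exponential window, in which a presynaptic spike that precedes the postsynaptic spike strengthens the synapse and the reverse order weakens it. The function name `stdp_delta_w` and all amplitudes and time constants are assumed values for illustration; the abstract does not state the rule or parameters used in the paper.

```python
import math


def stdp_delta_w(dt_ms, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    All parameter values are illustrative assumptions.
    """
    if dt_ms > 0:
        # Pre before post: potentiation that decays with the time gap.
        return a_plus * math.exp(-dt_ms / tau_plus)
    if dt_ms < 0:
        # Post before pre: depression that decays with the time gap.
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0


potentiation = stdp_delta_w(10.0)
depression = stdp_delta_w(-10.0)
```

The asymmetry comes from the sign flip at zero time difference and, in this sketch, from the slightly larger depression amplitude.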


2011 ◽  
Vol 55-57 ◽  
pp. 407-412 ◽  
Author(s):  
Ye Yuan ◽  
Zhong Kai Yang ◽  
Qing Fu Li

This paper focuses on the end effect problem of the empirical mode decomposition (EMD) algorithm, which results in serious distortion in the EMD sifting process. A new method based on fuzzy inductive reasoning (FIR) is proposed to overcome the end effect. FIR has simple inference rules and strong predictive capability. The FIR-based method uses the sequence near the end as the input signal of the FIR model. The predicted value is obtained after fuzzification, qualitative modeling, qualitative simulation and deblurring. The simulation results show that the FIR-based method performs equivalently to the neural network based method.


Author(s):  
Rached Dhaouadi ◽  
Khaled Nouri

We present an application of artificial neural networks to the problem of controlling the speed of an elastic drive system. We derive a neural network structure to simulate the inverse dynamics of the system, then implement the direct inverse control scheme in a closed loop. The neural network learning is done on-line to adaptively control the speed to follow a stepwise changing reference. The experimental results with a two-mass-model analog board confirm the effectiveness of the proposed neurocontrol scheme.


2014 ◽  
Vol 937 ◽  
pp. 308-312
Author(s):  
Xi Hua Du ◽  
Xiao Hui Wang

Based on molecular topology information and the adjacency matrix, 38 electrical state indices were calculated for five-membered heterocyclic pyrimidine derivatives that act as inhibitors of thymidylic acid-based synthetase, to provide a theoretical basis for the molecular design of new drugs. Using variable regression, the best subset of structural parameters, E1, E2, E7, E16 and E31, was selected. When these five structural parameters were used as the input neurons of a BP neural network with a 5:3:1 structure, an ideal prediction model of biological activity was obtained. Its total correlation coefficient r and average relative error were 0.972 and 2.13%, respectively. The results showed that E1, E2, E7, E16 and E31 have a good non-linear relationship with the biological activity, and that the neural network predictions were better than those of the multiple regression method. Testing showed that the model has good robustness and predictive capability. This research may provide theoretical guidance for the development of new thymidylic acid-based synthetase inhibitors with high efficiency and low toxicity.
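
The two figures of merit quoted in the abstract can be computed as in the sketch below, assuming the correlation coefficient is the Pearson coefficient between observed and predicted activities and the average relative error is the mean of |predicted - observed| / |observed|. The abstract does not define either quantity, so both definitions, the function names, and the sample values are assumptions for illustration only.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_relative_error(observed, predicted):
    """Average of |predicted - observed| / |observed| (observed must be nonzero)."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


# Made-up activity values, not data from the paper.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.1, 5.9, 7.2, 7.9]
r = pearson_r(observed, predicted)
mre = mean_relative_error(observed, predicted)
```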

