A pretrained proximal policy optimization algorithm with reward shaping for aircraft guidance to a moving destination in three-dimensional continuous space

2021 · Vol 18 (1) · pp. 172988142198954
Author(s): Zhuang Wang, Hui Li, Zhaoxin Wu, Haolin Wu

To enhance the performance of guiding an aircraft to a moving destination in a certain direction in three-dimensional continuous space, it is essential to develop an efficient intelligent algorithm. In this article, a pretrained proximal policy optimization (PPO) algorithm with reward shaping, which does not require an accurate model, is proposed to solve the guidance problem for manned aircraft and unmanned aerial vehicles. A continuous action reward function and a position reward function are presented, which increase the training speed and improve the performance of the generated trajectory. Using pretrained PPO, a new agent can be trained efficiently for a new task. A reinforcement learning framework is built in which an agent can be trained to generate a reference trajectory or a series of guidance instructions. General simulation results show that the proposed method can significantly improve training efficiency and trajectory performance. A carrier-based aircraft approach simulation is carried out to demonstrate the application value of the proposed approach.
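The abstract does not give the two reward functions in closed form; purely as an illustration of the general shape such a design could take, here is a minimal Python sketch combining a position-progress term with a continuous-action smoothness term. All names, weights, and signatures are hypothetical, not the authors' implementation.

```python
import numpy as np

def shaped_reward(pos, prev_pos, dest, action, prev_action,
                  w_pos=1.0, w_act=0.1):
    """Hypothetical shaped reward for guidance to a moving destination:
    reward progress toward the destination's current position and
    penalize abrupt changes between consecutive continuous actions."""
    # Position reward: positive when the aircraft closes distance to the goal.
    progress = np.linalg.norm(prev_pos - dest) - np.linalg.norm(pos - dest)
    # Continuous-action reward: discourage jerky control inputs.
    smoothness = -np.linalg.norm(action - prev_action)
    return w_pos * progress + w_act * smoothness
```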

Sensors · 2021 · Vol 21 (16) · pp. 5643
Author(s): Wenqiang Zu, Hongyu Yang, Renyu Liu, Yulong Ji

Guiding an aircraft to 4D waypoints at a certain heading is a multi-dimensional goal aircraft guidance problem. To enhance the performance, the present study proposes a multi-layer reinforcement learning (RL) approach to solve this problem. The approach assists the autopilot in an air traffic control (ATC) simulator to guide an aircraft to 4D waypoints at a certain latitude, longitude, altitude, heading, and arrival time. Specifically, a multi-layer RL method is used to simplify the neural network structure and reduce the state dimensions, and a shaped reward function that involves a potential function and the Dubins path method is applied (see the sketch below). Experiments are conducted, and the simulation results reveal that the proposed method can significantly improve convergence efficiency and trajectory performance. Further, the results indicate possible application prospects in team aircraft guidance tasks, since an aircraft can directly approach a goal without waiting in a specific pattern, thereby overcoming a limitation of current ATC simulators.
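As a minimal sketch of the standard potential-based shaping the abstract alludes to, with a straight-line distance standing in for the Dubins path length (a hypothetical simplification, not the paper's exact formulation):

```python
import numpy as np

def potential(state_xyz, goal_xyz):
    """Hypothetical potential: negative distance to the goal. The paper
    involves the Dubins path method; a Dubins planner could replace the
    straight-line distance used here for simplicity."""
    return -np.linalg.norm(np.asarray(state_xyz) - np.asarray(goal_xyz))

def shaping_bonus(state_xyz, next_state_xyz, goal_xyz, gamma=0.99):
    # Classic potential-based shaping, F = gamma * phi(s') - phi(s),
    # which preserves the optimal policy (Ng et al., 1999).
    return gamma * potential(next_state_xyz, goal_xyz) - potential(state_xyz, goal_xyz)
```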


Logistics · 2021 · Vol 5 (1) · pp. 10
Author(s): Rachid Oucheikh, Tuwe Löfström, Ernst Ahlberg, Lars Carlsson

Loading and unloading rolling cargo in roll-on/roll-off ships are important and highly recurrent operations in maritime logistics. In this paper, we apply state-of-the-art deep reinforcement learning algorithms to automate these operations in a complex, real-world environment. The objective is to teach an autonomous tug master to manage rolling cargo and perform loading and unloading operations while avoiding collisions with static and dynamic obstacles along the way. The artificial intelligence agent representing the tug master is trained and evaluated in a challenging environment built on the Unity3D learning framework, ML-Agents, using proximal policy optimization. The agent is equipped with sensors for obstacle detection and receives real-time feedback from the environment through its reward function, allowing it to dynamically adapt its policies and navigation strategy. The performance evaluation shows that, with appropriate hyperparameters, the agent can successfully learn all required operations, including lane following, obstacle avoidance, and rolling cargo placement. This study also demonstrates the potential of intelligent autonomous systems to improve the performance and service quality of maritime transport.
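The paper does not spell out its reward terms; purely as an illustration of how the described feedback (collision avoidance, lane following, cargo placement) might be folded into one scalar signal, here is a hypothetical Python sketch (the actual agent lives in Unity ML-Agents):

```python
def tug_reward(collided, lane_offset, cargo_placed,
               w_lane=0.05, collision_penalty=-1.0, placement_bonus=1.0):
    """Hypothetical per-step reward for the autonomous tug master:
    penalize collisions, penalize lateral deviation from the lane
    centerline, and reward successful rolling-cargo placement."""
    reward = -w_lane * abs(lane_offset)   # stay close to the lane centerline
    if collided:
        reward += collision_penalty       # discourage hitting obstacles
    if cargo_placed:
        reward += placement_bonus         # terminal success signal
    return reward
```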


2021 · pp. 1-10
Author(s): Wei Zhou, Xing Jiang, Bingli Guo (Member, IEEE), Lingyu Meng

Currently, Quality-of-Service (QoS)-aware routing is one of the crucial challenges in Software-Defined Networking (SDN). QoS metrics, e.g., latency, packet loss ratio, and throughput, must be optimized to improve network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising solution to this challenge. We therefore propose an on-policy DRL mechanism, namely the PPO-based (Proximal Policy Optimization) QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM is suited to both continuous and discrete action spaces with high-dimensional input and output. OMNeT++ simulation results show that PQROM not only converges well but also offers better stability than OSPF, less training time and simpler hyper-parameter adjustment than Deep Deterministic Policy Gradient (DDPG), and less hardware consumption than Asynchronous Advantage Actor-Critic (A3C).
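PQROM's defining feature is that the same learning mechanism is re-targeted by editing the reward; a minimal sketch of such a re-customizable QoS reward follows (weights and names are hypothetical, not taken from the paper):

```python
def qos_reward(latency_ms, loss_ratio, throughput_mbps,
               w_lat=0.4, w_loss=0.4, w_thr=0.2):
    """Hypothetical re-customizable QoS reward: lower latency and loss
    and higher throughput increase the reward. Re-weighting the terms
    changes the optimization objective without touching the DRL agent."""
    return -w_lat * latency_ms - w_loss * loss_ratio + w_thr * throughput_mbps
```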


2019 · Vol 1 · pp. 1-2
Author(s): Jiafeng Shi, Jie Shen, Zdeněk Stachoň, Yawei Chen

Abstract. With the increasing number of large buildings and more frequent indoor activities, indoor location-based services have expanded. Because the internal passages of large public buildings are complicated and interlaced in three dimensions, it is difficult for users to quickly reach their destination, and the demand for indoor path visualization is growing. Isikdag (2013), Zhang Shaoping (2017), and Huang Kejia (2018) provided navigation services for users based on path planning algorithms. In terms of indoor path visualization, Nossum (2011) proposed a "Tubes" map design method, which superimposes the channel information of different floors on the same plane by simplifying the indoor corridors and rooms. Lorenz et al. (2013) focused on map perspective (2D/3D) and landmarks, and developed and investigated cartographic methods for effective route guidance in indoor environments. Hölscher et al. (2007) emphasized using landmark objects at the important decision points of a route in indoor map design. Existing studies have mainly visualized indoor paths on a two-dimensional plane, lacking an analysis of three-dimensional connectivity in indoor space, which greatly compromises the intuitiveness and interactivity of path visualization and makes it difficult to satisfy the wayfinding requirements of indoor multi-layer continuous space. To solve this problem, this paper studies the characteristics of the indoor environment and proposes a path visualization method. The following questions are addressed in this study: 1) What are the key characteristics of the indoor environment compared to outdoor space? 2) How can indoor paths be visualized to satisfy users' wayfinding needs?


2014 · Vol 38 (4) · pp. 1091-1100
Author(s): Tales de Campos Piedade, Vander de Freitas Melo, Luiz Cláudio de Paula Souza, Jeferson Dieckow

The graphical representation of spatial soil properties in a digital environment is complex because it requires converting data collected in discrete form onto a continuous surface. The objective of this study was to apply three-dimensional interpolation and visualization techniques to soil texture and fertility properties and to establish relationships with pedogenetic factors and processes in a slope area. The GRASS Geographic Information System was used to generate three-dimensional models, and ParaView software to visualize soil volumes. Samples of the A, AB, BA, and B horizons were collected in a regular 122-point grid in an area of 13 ha in Pinhais, PR, in southern Brazil. Geoprocessing and graphic computing techniques were effective in identifying and delimiting soil volumes of distinct ranges of fertility properties confined within the soil matrix. Both the three-dimensional interpolation and the visualization tool facilitated interpretation, in a continuous space (volumes), of the cause-effect relationships between soil texture and fertility properties and pedological factors and processes, such as higher clay contents following the drainage lines of the area. The flattest part, with more weathered soils (Oxisols), had the highest pH values and lower Al3+ concentrations. These techniques of data interpolation and visualization have great potential for use in diverse areas of soil science, such as identification of soil volumes occurring side-by-side but exhibiting different physical, chemical, and mineralogical conditions for plant root growth, and monitoring of plumes of organic and inorganic pollutants in soils and sediments, among other applications. The methodological details for interpolation and a three-dimensional view of soil data are presented here.
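The study's volumes were interpolated in GRASS GIS; as a generic stand-in for that step (not the authors' exact procedure), a minimal inverse-distance-weighted 3D interpolation can be sketched as follows:

```python
import numpy as np

def idw_3d(sample_xyz, sample_values, query_xyz, power=2.0, eps=1e-9):
    """Generic 3D inverse-distance-weighted interpolation of a soil
    property from scattered horizon samples onto arbitrary query points.
    sample_xyz: (N, 3); sample_values: (N,); query_xyz: (M, 3)."""
    d = np.linalg.norm(sample_xyz[None, :, :] - query_xyz[:, None, :], axis=2)
    w = 1.0 / (d ** power + eps)          # nearby samples dominate the estimate
    return (w * sample_values[None, :]).sum(axis=1) / w.sum(axis=1)
```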


2020 · Vol 12 (12) · pp. 1964
Author(s): Mengbin Rao, Ping Tang, Zheng Zhang

Since hyperspectral images (HSI) captured by different sensors often contain different numbers of bands, while most convolutional neural networks (CNN) require a fixed-size input, the generalization capability of deep CNNs to exploit heterogeneous input for better classification performance has become a research focus. For classification tasks with limited labeled samples, the training strategy of feeding CNNs with sample pairs instead of single samples has proven to be efficient. Following this strategy, we propose a Siamese CNN with a three-dimensional (3D) adaptive spatial-spectral pyramid pooling (ASSP) layer, called ASSP-SCNN, which takes as input 3D sample pairs of varying size and can easily be transferred to another HSI dataset regardless of the number of spectral bands. The 3D ASSP layer can also extract different levels of 3D information to improve the classification performance of the equipped CNN. To evaluate the classification and generalization performance of ASSP-SCNN, our experiments consist of two parts: experiments on ASSP-SCNN without pre-training and experiments on an ASSP-SCNN-based transfer learning framework. Experimental results on three HSI datasets demonstrate that both ASSP-SCNN without pre-training and transfer learning based on ASSP-SCNN achieve higher classification accuracies than several state-of-the-art CNN-based methods. Moreover, we compare the performance of ASSP-SCNN on different transfer learning tasks, which further verifies that ASSP-SCNN has a strong generalization capability.
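The exact ASSP design is in the paper; the following PyTorch sketch shows the underlying idea of 3D adaptive pyramid pooling, which maps inputs with varying band counts to a fixed-length feature vector (the pyramid levels here are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class AdaptivePyramidPool3d(nn.Module):
    """Sketch of 3D adaptive pyramid pooling: pool the (bands, h, w)
    feature volume to several fixed grid sizes and concatenate, so the
    output length is independent of the input's spectral/spatial size."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool3d(l) for l in levels])

    def forward(self, x):                 # x: (batch, channels, bands, h, w)
        flat = [pool(x).flatten(start_dim=1) for pool in self.pools]
        return torch.cat(flat, dim=1)     # fixed-length vector per sample
```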


Entropy · 2019 · Vol 21 (7) · pp. 674
Author(s): Boris Belousov, Jan Peters

An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment. Such interactive data gathering commonly leads to divergence towards dangerous or uninformative regions of the state space unless additional regularization measures are taken. Prior works proposed bounding the information loss, measured by the Kullback–Leibler (KL) divergence, at every policy improvement step to eliminate instability in the learning dynamics. In this paper, we consider a broader family of f-divergences, and more concretely α-divergences, which inherit the beneficial property of providing the policy improvement step in closed form while at the same time yielding a corresponding dual objective for policy evaluation. This entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures. In particular, common least-squares value function estimation coupled with advantage-weighted maximum likelihood policy improvement is shown to correspond to the Pearson χ²-divergence penalty. Other actor-critic pairs arise for various choices of the penalty-generating function f. On a concrete instantiation of our framework with the α-divergence, we carry out an asymptotic analysis of the solutions for different values of α and demonstrate the effects of the divergence-function choice on standard reinforcement learning problems.
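For reference, the divergence families mentioned above can be written, in one common parameterization (the paper's exact convention may differ), as:

```latex
D_f(p \,\|\, q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx,
\qquad
D_\alpha(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)}\left(\int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx - 1\right).
```

In this parameterization, the KL divergence is recovered in the limit α → 1, and the Pearson χ²-divergence (up to a constant factor) at α = 2.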


Author(s): Pei-Hua Huang, Osamu Hasegawa

This study presents an aerial robotics application of deep reinforcement learning that applies an asynchronous learning framework and trust region policy optimization to a simulated quad-rotor helicopter (quadcopter) environment. In particular, we optimized a control policy asynchronously through interaction with concurrent instances of the environment. The control system was benchmarked and extended with examples to tackle continuous state-action tasks for the quadcopter: hovering control and balancing an inverted pole. Performing these maneuvers required continuous actions for sensitive control of small acceleration changes of the quadcopter, thereby maximizing the scalar reward of the defined tasks. The simulation results demonstrated an enhancement of the learning speed and reliability for the tasks.
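As an illustration of the kind of scalar reward the hovering task maximizes (hypothetical weights and names, not the authors' definition):

```python
import numpy as np

def hover_reward(pos, setpoint, vel, w_dist=1.0, w_vel=0.1):
    """Hypothetical hovering reward: penalize distance from the hover
    setpoint and residual velocity, so small, sensitive acceleration
    corrections are favored over aggressive ones."""
    pos_err = np.linalg.norm(np.asarray(pos) - np.asarray(setpoint))
    return -w_dist * pos_err - w_vel * np.linalg.norm(vel)
```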

