Multi-UAV Navigation for Partially Observable Communication Coverage by Graph Reinforcement Learning

Author(s):  
Zhenhui Ye

<div>In this paper, we aim to design a deep reinforcement learning (DRL)-based control solution that navigates a swarm of unmanned aerial vehicles (UAVs) to fly around an unexplored target area and provide optimal communication coverage for the ground mobile users. In contrast to existing DRL-based solutions, which mainly solve the problem with global observation and centralized training, a practical and efficient Decentralized Training and Decentralized Execution (DTDE) framework is desirable for training and deploying each UAV in a distributed manner. To this end, we propose a novel DRL approach named Deep Recurrent Graph Network (DRGN), which uses a Graph Attention Network-based Flying Ad-hoc Network (GAT-FANET) to achieve inter-UAV communication and a Gated Recurrent Unit (GRU) to record historical information. We conducted extensive experiments to determine an appropriate structure for GAT-FANET and examine the performance of DRGN. The simulation results show that the proposed model outperforms four state-of-the-art DRL-based approaches and four heuristic baselines, and demonstrate the scalability, transferability, robustness, and interpretability of DRGN.</div>
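The architectural recipe the abstract describes, attention-weighted aggregation over the FANET graph followed by a recurrent unit that carries history, can be sketched roughly as below. This is a simplified NumPy illustration under assumed toy dimensions, not the authors' implementation; all weight names (`W`, `a`, `Wz`, `Wr`, `Wh`) and sizes are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(h, adj, W, a):
    """One simplified graph-attention layer: each UAV aggregates
    neighbor features weighted by learned attention scores."""
    z = h @ W                          # projected features, shape (N, F_out)
    out = np.zeros_like(z)
    for i in range(len(h)):
        nbrs = [j for j in range(len(h)) if adj[i, j]]
        scores = np.array([a @ np.concatenate([z[i], z[j]]) for j in nbrs])
        alpha = softmax(scores)        # attention over i's FANET neighbors
        out[i] = np.tanh(sum(w * z[j] for w, j in zip(alpha, nbrs)))
    return out

def gru_cell(x, hprev, Wz, Wr, Wh):
    """Minimal GRU cell recording one UAV's history."""
    xc = np.concatenate([x, hprev])
    zt = 1.0 / (1.0 + np.exp(-(Wz @ xc)))                 # update gate
    rt = 1.0 / (1.0 + np.exp(-(Wr @ xc)))                 # reset gate
    hhat = np.tanh(Wh @ np.concatenate([x, rt * hprev]))  # candidate state
    return (1.0 - zt) * hprev + zt * hhat

rng = np.random.default_rng(0)
N, F, Fp = 4, 6, 5                     # 4 UAVs, toy feature sizes
h = rng.normal(size=(N, F))            # per-UAV observations
adj = np.ones((N, N), dtype=bool)      # fully connected FANET for the demo
W = rng.normal(size=(F, Fp))
a = rng.normal(size=2 * Fp)
msg = gat_layer(h, adj, W, a)          # inter-UAV message passing

Wz = rng.normal(size=(Fp, 2 * Fp))
Wr = rng.normal(size=(Fp, 2 * Fp))
Wh = rng.normal(size=(Fp, 2 * Fp))
hidden = gru_cell(msg[0], np.zeros(Fp), Wz, Wr, Wh)  # UAV 0's memory update
print(msg.shape, hidden.shape)         # (4, 5) (5,)
```

In the actual model the GRU output would feed a policy head; here it simply shows how the attention output becomes the recurrent input.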

2021


Electronics, 2021, Vol. 10(9), p. 999
Author(s):  
Ahmad Taher Azar
Anis Koubaa
Nada Ali Mohamed
Habiba A. Ibrahim
Zahra Fathy Ibrahim
...  

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications, in both the civilian and military fields: infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations, to name a few. However, the use of UAVs in these applications requires a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms have been designed, targeting the guidance, navigation, and control (GNC) of UAVs. In this paper, we describe the state of the art of one subset of these algorithms: deep reinforcement learning (DRL) techniques. We provide a detailed description of these methods and deduce the current limitations in this area. We note that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training in computer-simulated environments, and we conclude that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.


Author(s):  
Rahul Desai
B P Patil

<p class="Abstract">This paper describes and evaluates the performance of various reinforcement learning algorithms against the shortest path algorithms that are widely used for routing packets through a network. Shortest path routing is the simplest policy: packets are routed along the path with the minimum number of hops. In high-traffic or high-mobility conditions, however, the shortest path gets flooded with a huge number of packets and congestion occurs, so the nominally shortest path no longer delivers packets fastest and the delay for packets to reach the destination increases. Reinforcement learning algorithms are adaptive: the path is selected based on the traffic present in the network in real time, so they aim to minimize the time for packets to reach the destination. Analysis on a 6-by-6 irregular grid and a sample ad hoc network shows that the performance parameters used for judging the network, packet delivery ratio and delay, achieve optimum results with the reinforcement learning algorithms.</p>
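The adaptive-routing idea described here, each node learning delivery-time estimates from locally observed delays, is classically captured by the Q-routing update of Boyan and Littman. The sketch below is a toy illustration of that update on a hypothetical three-node line network, not necessarily the exact algorithm evaluated in the paper; the topology, delays, and learning rate are all made up.

```python
def q_routing_update(Q, x, y, dest, delay, eta=0.5):
    """Q-routing style update: node x revises its estimated delivery
    time to `dest` via neighbor y, after observing the actual hop
    delay plus y's best remaining estimate."""
    best_from_y = min(Q[y][dest].values()) if Q[y][dest] else 0.0
    old = Q[x][dest][y]
    Q[x][dest][y] = old + eta * (delay + best_from_y - old)

# toy line network 0 -- 1 -- 2, routing packets toward destination node 2
neighbors = {0: [1], 1: [0, 2], 2: [1]}
Q = {x: {2: {y: 10.0 for y in neighbors[x]}} for x in neighbors}
Q[2] = {2: {}}   # at the destination, zero remaining delay

for _ in range(50):
    # a packet hops 0 -> 1 -> 2; each hop is observed to take 1 time unit
    q_routing_update(Q, 1, 2, 2, delay=1.0)
    q_routing_update(Q, 0, 1, 2, delay=1.0)

# estimates converge to the true remaining delays: 1 hop and 2 hops
print(round(Q[1][2][2], 2), round(Q[0][2][1], 2))  # 1.0 2.0
```

Under congestion, the observed `delay` for the shortest path would grow, pushing its Q-value above that of a longer but less loaded route, which is exactly the adaptivity the abstract credits to reinforcement learning.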


Author(s):  
John Aslanides
Jan Leike
Marcus Hutter

Many state-of-the-art reinforcement learning (RL) algorithms assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of experiments that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
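At the heart of the universal Bayesian agent is a mixture over candidate environment models, with posterior weights updated from observations. The minimal sketch below illustrates only that Bayesian-mixture core on a toy bit-prediction task; the three coin models and their names are invented for the example and bear no relation to the paper's environment classes.

```python
# three hypothetical environment models, each giving P(next bit = 1)
models = {"biased_0.9": 0.9, "fair": 0.5, "biased_0.1": 0.1}
post = {name: 1.0 / len(models) for name in models}  # uniform prior

observations = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
for bit in observations:
    # Bayes update: reweight each model by its likelihood of the observed bit
    for name, p1 in models.items():
        post[name] *= p1 if bit == 1 else (1.0 - p1)
    z = sum(post.values())
    post = {n: w / z for n, w in post.items()}       # renormalize

# the mixture predicts the next bit as the posterior-weighted average
p_next = sum(post[n] * models[n] for n in models)
print(max(post, key=post.get))  # biased_0.9 dominates after mostly-1 data
```

AIXI-style agents extend this idea from passive prediction to acting: they plan against the mixture, which is what makes exact computation intractable and approximations necessary.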


2021, Vol. 15
Author(s):  
Philipp Weidel
Renato Duarte
Abigail Morrison

Reinforcement learning is a paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. To partition an environment into discrete states, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields specified ad hoc by the researcher. This is problematic as a model for how an organism can learn appropriate behavioral sequences in unknown environments, as it fails to account for the unsupervised and self-organized nature of the required representations. Additionally, this approach presupposes knowledge on the part of the researcher on how the environment should be partitioned and represented and scales poorly with the size or complexity of the environment. To address these issues and gain insights into how the brain generates its own task-relevant mappings, we propose a learning architecture that combines unsupervised learning on the input projections with biologically motivated clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce clearly distinguishable activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
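The two-stage idea above, an unsupervised stage that self-organizes raw inputs into discrete representations, followed by reinforcement learning on those representations, can be illustrated in a rate-based, non-spiking analogue. The sketch below uses online k-means as a stand-in for the paper's self-organizing spiking layer and tabular Q-learning as the output stage; the toy 1-D environment, cluster count, and learning rates are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- unsupervised stage: online k-means self-organizes raw inputs
#     into discrete "cluster states" (no hand-specified place cells) ---
def nearest(centroids, x):
    return int(np.argmin(((centroids - x) ** 2).sum(axis=1)))

def kmeans_step(centroids, x, lr=0.05):
    k = nearest(centroids, x)
    centroids[k] += lr * (x - centroids[k])  # pull winning centroid toward input
    return k

# toy 1-D environment: position in [0, 1), reward for crossing 0.9
centroids = rng.uniform(0.0, 1.0, size=(5, 1))
Qtab = np.zeros((5, 2))                      # Q over cluster-states x {left, right}

pos = 0.5
for _ in range(5000):
    s = kmeans_step(centroids, np.array([pos]))
    a = int(rng.integers(0, 2))              # random behavior policy (off-policy)
    new_pos = min(max(pos + (0.05 if a else -0.05), 0.0), 0.999)
    r = 1.0 if new_pos > 0.9 else 0.0
    s2 = nearest(centroids, np.array([new_pos]))
    # --- reinforcement stage: tabular Q-learning on the cluster states ---
    Qtab[s, a] += 0.2 * (r + 0.9 * Qtab[s2].max() - Qtab[s, a])
    pos = 0.5 if r > 0.0 else new_pos        # reset after reaching the goal

print(Qtab.max())                            # positive: value was learned
```

The point of the sketch is the division of labor: the researcher never specifies how to partition the position axis; the clustering discovers a discretization, and the RL stage only ever sees cluster indices.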


Author(s):  
Penghui Wei
Wenji Mao
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the problem of noisy labels. To address these two issues, in this paper we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning to identify stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other's performance. Experimental results demonstrate that our proposed model TARM outperforms state-of-the-art approaches.


Author(s):  
Rahul Desai
B.P. Patil

This paper describes and evaluates the performance of various reinforcement learning algorithms against the shortest path algorithms that are widely used for routing packets throughout a network. Shortest path routing is the simplest policy: packets are routed along the path with the minimum number of hops. In high-traffic or high-mobility conditions, however, the shortest path gets flooded with a huge number of packets and congestion occurs, so the nominally shortest path no longer delivers packets fastest and the delay for packets to reach the destination increases. Reinforcement learning algorithms are adaptive: the path is selected based on the traffic present in the network in real time, so they aim to minimize the time for packets to reach the destination. Analysis on a 6-by-6 irregular grid and a sample ad hoc network shows that the performance parameters used for judging the network, such as packet delivery ratio and delay, achieve optimum results with the reinforcement learning algorithms.


Author(s):  
Srilakshmi R.
Jaya Bhaskar M.

The mobile ad-hoc network (MANET) is a trending field in the smart digital world and is widely used for communication sharing. Beyond communication, it offers numerous advanced capabilities, much like a personal computer. However, packet drops and a low throughput ratio remain serious issues. Several algorithms increase the throughput ratio by developing multipath routing, but in some cases multipath routing ends in routing overhead and takes more time to transfer the data because of the data load on a single path. To address this problem, this research develops a novel temporary ordered route energy migration (TOREM) scheme. Here, the migration approach balances the data load equally and enhances the communication channel, while the reference-node creation strategy reduces the routing overhead and the packet drop ratio. Finally, the proposed model is validated against recent existing works and achieves better results, minimizing packet drops and maximizing the throughput ratio.

