Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation

2021 ◽  
Author(s):  
Srivatsan Krishnan ◽  
Behzad Boroujerdian ◽  
William Fu ◽  
Aleksandra Faust ◽  
Vijay Janapa Reddi

We introduce Air Learning, an open-source simulator and gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle-avoidance tasks in three different environments, along with Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as energy consumed, endurance, and average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that trajectories on an embedded Raspberry Pi differ vastly from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (the onboard compute of the aerial robot). A latency randomly sampled from this distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (the discrepancy in the flight-time metric is reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the choice of onboard compute affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
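The artificial-delay idea can be sketched as follows; the class name, latency values, and tick length are illustrative stand-ins, not taken from the Air Learning codebase:

```python
import random

class LatencyWrapper:
    """Minimal sketch of hardware-in-the-loop latency injection: each
    training step samples an inference latency from a distribution
    measured on the target embedded platform, and the previous action
    is held for that long, narrowing the sim-to-hardware gap."""

    def __init__(self, latencies_ms, dt_ms=10, seed=0):
        self.latencies_ms = latencies_ms  # measured on-target latencies (ms)
        self.dt_ms = dt_ms                # simulator tick length (ms)
        self.rng = random.Random(seed)

    def delay_steps(self):
        # Convert one sampled inference latency into whole simulator
        # ticks; the agent's last action is repeated for this many steps.
        latency = self.rng.choice(self.latencies_ms)
        return max(1, round(latency / self.dt_ms))

# Hypothetical latency samples, e.g. policy inference on a Raspberry Pi.
wrapper = LatencyWrapper([30, 45, 38, 52, 41])
steps = [wrapper.delay_steps() for _ in range(5)]
```

A training loop would call `delay_steps()` once per action and advance the simulation that many ticks before the new action takes effect.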

2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Fanyu Zeng ◽  
Chen Wang

Vanilla policy gradient methods suffer from high variance, leading to unstable policies during training, whose performance fluctuates drastically between iterations. To address this issue, we analyze the policy optimization process of a navigation method based on deep reinforcement learning (DRL) that uses asynchronous gradient descent for optimization. A variant navigation method (asynchronous proximal policy optimization navigation, appoNav) is presented that can guarantee monotonic policy improvement during policy optimization. Our experiments are conducted in DeepMind Lab, and the results show that artificial agents using appoNav outperform the baseline algorithm.
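appoNav builds on proximal policy optimization, whose clipped surrogate objective is what bounds how far each update can move the policy; a minimal, generic sketch of that objective (not the paper's implementation) is:

```python
def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio     : pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage : estimated advantage of the taken action
    eps       : clip range limiting the per-update policy shift

    Taking the min of the clipped and unclipped terms is pessimistic:
    the objective never rewards moving the ratio outside [1-eps, 1+eps],
    which underpins the approximate monotonic-improvement behavior.
    """
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return min(unclipped, clipped)
```

For a positive advantage, a ratio of 1.5 is clipped to 1.2 (with the default `eps=0.2`), so the gradient through the ratio vanishes once the policy has moved far enough.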


Author(s):  
Zhenhuan Rao ◽  
Yuechen Wu ◽  
Zifei Yang ◽  
Wei Zhang ◽  
Shijian Lu ◽  
...  

2009 ◽  
Vol 29 (3) ◽  
pp. 87-90
Author(s):  
Chad Loseby ◽  
Peter Chapin ◽  
Carl Brandon

2020 ◽  
Vol 53 (2) ◽  
pp. 10810-10815
Author(s):  
Miguel S.E. Martins ◽  
Joaquim L. Viegas ◽  
Tiago Coito ◽  
Bernardo Marreiros Firme ◽  
João M.C. Sousa ◽  
...  

Aerospace ◽  
2021 ◽  
Vol 8 (6) ◽  
pp. 167
Author(s):  
Bartłomiej Brukarczyk ◽  
Dariusz Nowak ◽  
Piotr Kot ◽  
Tomasz Rogalski ◽  
Paweł Rzucidło

The paper presents automatic control of an aircraft in the longitudinal channel during automatic landing. The system presented in the paper has two crucial components: a vision system and an automatic landing system. The vision system processes images of dedicated on-ground markers, captured by an on-board video camera, to determine the glide path. The image processing algorithms were implemented on an embedded system and tested under laboratory conditions using the hardware-in-the-loop method. The output of the vision system served as one of the input signals to the automatic landing system, whose major components are control algorithms based on a fuzzy-logic expert system, created to imitate pilot actions while landing the aircraft. The two systems were connected to cooperate in controlling an aircraft model in a simulation environment. Selected test results showing control efficiency and precision are presented in the final section of the paper.
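A fuzzy-logic controller of the kind described can be sketched with triangular membership functions; the rule base, deviation ranges, and pitch outputs below are hypothetical illustrations, not the paper's actual expert system:

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b over the support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def pitch_command(glide_dev_deg):
    """Map glide-path deviation (degrees, from the vision system) to a
    pitch command, imitating pilot action: below the path -> pitch up,
    on the path -> hold, above the path -> pitch down."""
    rules = [  # (rule activation over deviation, output pitch in degrees)
        (triangular(glide_dev_deg, -4.0, -2.0, 0.0), +2.0),  # below path
        (triangular(glide_dev_deg, -1.0,  0.0, 1.0),  0.0),  # on path
        (triangular(glide_dev_deg,  0.0,  2.0, 4.0), -2.0),  # above path
    ]
    # Weighted-average (centroid-style) defuzzification.
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0
```

Intermediate deviations blend neighboring rules, giving the smooth, pilot-like response the expert-system approach aims for.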


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect an urban network in a mixed-traffic environment, and we suggest a set of hyperparameters for achieving better performance. Firstly, we feed this set of hyperparameters into our deep reinforcement learning agents. Secondly, we run the leading-autonomous-vehicle experiment on the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated against all-manual-vehicle and leading-manual-vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that fully automated traffic increased the average speed by a factor of 1.27 compared with the all-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, the leading autonomous vehicles could help to mitigate traffic congestion.
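The adaptive Kullback–Leibler variant compared here adjusts its penalty coefficient between updates; a common update rule, following the original PPO formulation rather than necessarily the authors' exact settings, can be sketched as:

```python
def update_kl_beta(beta, observed_kl, kl_target=0.01):
    """Adaptive KL penalty coefficient update (PPO-Penalty style).

    If the policy moved too little relative to the target KL, loosen
    the penalty; if it moved too much, tighten it. The factors 1.5 and
    2 follow the standard heuristic from the original PPO paper.
    """
    if observed_kl < kl_target / 1.5:
        return beta / 2.0   # policy barely moved: penalize less
    if observed_kl > kl_target * 1.5:
        return beta * 2.0   # policy moved too far: penalize more
    return beta             # within tolerance: keep beta unchanged
```

In contrast, the clipped-objective variant needs no such coefficient, which is one reason it is often easier to tune; the paper's comparison makes that trade-off concrete.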


Transmisi ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 117-122
Author(s):  
Sadr Lufti Mufreni ◽  
Esi Putri Silmina

Indonesia is an archipelagic nation with more than 13,000 islands. Its territory lies between the Indian and Pacific Oceans and along the Pacific Ring of Fire, so it has many active volcanoes. Given this geography, the potential for tsunamis and earthquakes is quite high. A good disaster-management plan is needed to reduce the risks, and disaster mitigation is one part of it. Disaster mitigation is a series of efforts to reduce disaster risk, whether through physical construction or through raising awareness and building the capacity to face disaster threats. Mitigation is needed to reduce the resulting impact, especially loss of life; one approach is an early-warning system. An early-warning system consists of three main components: sensors that obtain values from the environment, a controller that processes the received values, and an action taken based on the processing results. Building an effective system requires adequate communication. Messaging queues are used in industry for communication between software, hardware, and embedded systems. This research focuses on using ActiveMQ Artemis as a messaging-queue server for communication with the Internet of Things (IoT). An advantage of ActiveMQ Artemis is that it can run on a Raspberry Pi 3 with only minor modifications. The results show that ActiveMQ Artemis can be used for IoT communication in a simulated disaster-mitigation system.
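The three-component pipeline (sensor, controller, action) can be sketched with an in-process queue standing in for a broker queue such as an ActiveMQ Artemis address; the threshold, message values, and alert format below are purely illustrative:

```python
import queue

# In-process stand-in for a broker queue (e.g. an ActiveMQ Artemis
# address); in the real system, sensors would publish over the network.
sensor_queue = queue.Queue()

def run_pipeline(readings, threshold=7.0):
    """Sensors publish readings to the queue; the controller consumes
    and processes each value; an alert action fires when a reading
    crosses the (illustrative) hazard threshold."""
    for r in readings:
        sensor_queue.put(r)                # sensor component: publish
    alerts = []
    while not sensor_queue.empty():
        magnitude = sensor_queue.get()     # controller component: consume
        if magnitude >= threshold:         # action component: decide
            alerts.append(f"ALERT: magnitude {magnitude}")
    return alerts
```

Decoupling producers and consumers through the queue is what lets heterogeneous sensors, controllers, and embedded devices communicate without knowing about one another, which is the property the broker provides at system scale.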

