Autonomous Vehicular Landings on the Deck of an Unmanned Surface Vehicle using Deep Reinforcement Learning

Robotica ◽  
2019 ◽  
Vol 37 (11) ◽  
pp. 1867-1882 ◽  
Author(s):  
Riccardo Polvara ◽  
Sanjay Sharma ◽  
Jian Wan ◽  
Andrew Manning ◽  
Robert Sutton

Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is a minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as combining vanilla and double DQNs and using a partitioned buffer replay. Simulated studies prove the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
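To make the partitioned buffer replay idea concrete, the following is a minimal Python sketch of a replay buffer split by reward sign; the partition labels, capacities, and sampling fractions are illustrative assumptions, not the exact scheme used in the paper.

    import random
    from collections import deque

    class PartitionedReplayBuffer:
        # Keeps rare but informative transitions (e.g. successful landings or
        # crashes) from being drowned out by ordinary flight experience.
        def __init__(self, capacity_per_partition=50000):
            self.partitions = {
                "positive": deque(maxlen=capacity_per_partition),  # marker reached
                "negative": deque(maxlen=capacity_per_partition),  # crash / marker lost
                "neutral":  deque(maxlen=capacity_per_partition),  # everything else
            }

        def add(self, transition, reward):
            # Route the transition by the sign of its reward.
            key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
            self.partitions[key].append(transition)

        def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
            # Draw a fixed share of the batch from each partition.
            batch = []
            for frac, part in zip(fractions, self.partitions.values()):
                k = min(int(batch_size * frac), len(part))
                batch.extend(random.sample(part, k))
            random.shuffle(batch)
            return batch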

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is even greater for micro aerial vehicles (MAVs) due to their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor–critic reinforcement learning technique to train the aerial robot in the Gazebo simulator to perform a point-goal navigation task by directly mapping the MAV's noisy state and laser scan measurements to continuous motion commands. The resulting policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Intensive simulations and real-time experiments were conducted and compared against a nonlinear model predictive control technique to demonstrate generalization to new, unseen environments and robustness against localization noise. The results demonstrate the system's effectiveness in flying safely and reaching the desired goal points by planning smooth forward linear velocities and heading rates.
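As an illustration of the direct sensor-to-command mapping described above, here is a small PyTorch sketch of an actor network; the layer sizes, the 360-beam scan, and the four-dimensional state vector are assumptions made for the example, not the authors' architecture.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Maps a laser scan plus a low-dimensional state (e.g. velocity and
        # goal position in the body frame) to two continuous commands:
        # forward linear velocity and heading rate, both squashed to [-1, 1].
        def __init__(self, n_beams=360, n_state=4, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_beams + n_state, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2), nn.Tanh(),
            )

        def forward(self, scan, state):
            return self.net(torch.cat([scan, state], dim=-1))

    actor = Actor()
    scan = torch.rand(1, 360)                      # normalised range readings
    state = torch.tensor([[0.5, 0.1, 2.0, 0.3]])   # velocity + goal in body frame
    command = actor(scan, state)                   # [forward velocity, heading rate]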


Robotics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 8 ◽  
Author(s):  
Riccardo Polvara ◽  
Massimiliano Patacchiola ◽  
Marc Hanheide ◽  
Gerhard Neumann

The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses, for the first time, the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits a task into sequential sub-tasks, each assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger and represents a component of a high-level control policy that navigates the UAV towards the marker. Several technical solutions have been implemented, for example combining vanilla and double DQNs and introducing a partitioned buffer replay to address the problem of sample efficiency. One of the main contributions of this work is showing how an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with that of a state-of-the-art algorithm and with human pilots, while being quantitatively better in noisy conditions.
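A minimal sketch of the sequential dispatch idea follows; the trigger condition (the marker being roughly centred in the downward-facing camera view) is an assumption chosen for illustration, not the exact trigger used in the paper.

    class SequentialDQN:
        # Two sub-policies, one per flight phase: marker detection, then descent.
        # An internal trigger hands control from the first DQN to the second.
        def __init__(self, detection_policy, descent_policy):
            self.policies = [detection_policy, descent_policy]
            self.stage = 0

        def act(self, observation, marker_centred):
            if self.stage == 0 and marker_centred:
                self.stage = 1              # trigger: switch to the descent DQN
            return self.policies[self.stage](observation)

    # Usage with placeholder callables that would normally be trained DQNs.
    sdqn = SequentialDQN(lambda obs: "search_action", lambda obs: "descend_action")
    print(sdqn.act(None, marker_centred=False))   # -> search_action
    print(sdqn.act(None, marker_centred=True))    # -> descend_action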


Author(s):  
Zhen Li ◽  
Xin Chen ◽  
Mingyang Xie ◽  
Zhenhua Zhao

In this paper, an adaptive fault-tolerant attitude tracking controller based on reinforcement learning is developed for a flying-wing unmanned aerial vehicle subject to actuator faults and saturation. First, the attitude dynamic model is separated into slow and fast dynamic subsystems based on the principle of time-scale separation. Secondly, the backstepping technique is adopted to design the controller. To enforce the attitude-angle constraints, a control technique based on a barrier Lyapunov function is used to design the controller of the slow dynamic subsystem. For the optimization of the fast dynamic subsystem, this paper introduces an adaptive reinforcement learning control method in which a neural network is used to approximate the long-term performance index and the lumped fault dynamics. It is shown that this control algorithm satisfies the requirements of attitude tracking subject to the control constraints, and the stability of the system is proved using Lyapunov stability theory. The simulation results demonstrate that the developed fault-tolerant scheme is effective and yields a smoother control effect than a fault-tolerant controller based on sliding-mode theory.
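For reference, one common choice of barrier Lyapunov function used to enforce such angle constraints (not necessarily the exact function adopted in this paper) is

    V(e) = \frac{1}{2} \ln \frac{k_b^2}{k_b^2 - e^2}, \qquad |e| < k_b,

which grows without bound as the tracking error e approaches the constraint k_b; keeping V bounded along the closed-loop trajectories therefore guarantees that the attitude error never reaches the bound.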


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2617
Author(s):  
Catalin Dumitrescu ◽  
Petrica Ciotirnae ◽  
Constantin Vizitiu

When considering the concept of distributed intelligent control, three types of components can be defined: (i) fuzzy sensors, which provide a representation of measurements as fuzzy subsets; (ii) fuzzy actuators, which can act on the real world based on the fuzzy subsets they receive; and (iii) fuzzy inference components, which generate new fuzzy subsets from the fuzzy subsets previously produced. The purpose of this article is to define the elements of an interoperable technology, the Fuzzy Applied Cell Control soft-computing language, for the development of fuzzy components with distributed intelligence implemented on a DSP target. The cells in the network are configured using the operations of symbolic fusion, symbolic inference and fuzzy–real symbolic transformation, which are based on the concepts of fuzzy meaning and fuzzy description. The two applications presented in the article, agent-based modeling with fuzzy logic for simulating pedestrian crowds in panic decision-making situations and a fuzzy controller for a mobile robot, are both timely. The increasing occurrence of panic during mass events prompted the investigation of the impact of panic on crowd dynamics and the simulation of pedestrian flows in panic situations. Based on the research presented in the article, we propose a fuzzy-controller-based system for determining pedestrian flows and calculating the shortest evacuation distance in panic situations. Fuzzy logic, one of the representation techniques in artificial intelligence, is a well-known soft-computing method that allows the treatment of strong constraints caused by the inaccuracy of the data obtained from the robot's sensors. Based on this motivation, the second application proposed in the article creates an intelligent control technique based on Fuzzy Logic Control (FLC), a class of intelligent control systems that can be used as an alternative to traditional control techniques for mobile robots. This method makes it possible to emulate the experience of a human expert. The benefits of using a network of fuzzy components are not limited to those provided by distributed systems: fuzzy cells are simple to configure while also providing high-level functions such as fusion and decision-making.
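As a concrete illustration of an FLC for mobile-robot navigation, the following is a minimal Mamdani-style sketch in Python; the membership functions, the rule base and the weighted-average defuzzification are assumptions for the example, not the controller developed in the article.

    def fuzzy_steering(front_dist, goal_angle):
        # Inputs: frontal obstacle distance (m) and goal bearing (rad, +left).
        # Output: steering rate (rad/s).
        # Fuzzification: "near" falls linearly from 1 at 0 m to 0 at 1.5 m.
        near = max(0.0, min(1.0, (1.5 - front_dist) / 1.5))
        far = 1.0 - near
        left = max(0.0, min(1.0, goal_angle))     # degree of "goal to the left"
        right = max(0.0, min(1.0, -goal_angle))   # degree of "goal to the right"

        # Rule base: (firing strength, crisp consequent in rad/s).
        rules = [
            (near,            1.0),    # obstacle close       -> turn away sharply
            (min(far, left),  0.5),    # clear, goal to left  -> steer left
            (min(far, right), -0.5),   # clear, goal to right -> steer right
        ]

        # Weighted-average (centroid-like) defuzzification.
        num = sum(strength * output for strength, output in rules)
        den = sum(strength for strength, _ in rules) + 1e-9
        return num / den

    print(fuzzy_steering(front_dist=0.4, goal_angle=-0.8))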


2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AI has been developed with various techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning other correct answers for duplicated Gomoku board situations. However, with the tree search algorithm the accuracy drops, because the classification criteria are set manually. In this paper, we propose an improved reinforcement-learning-based high-level decision approach using a convolutional neural network (CNN). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded vectors. For cases in which the stone selected by the CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
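To illustrate the one-hot state representation and the fallback when the CNN's preferred move is not playable, here is a short NumPy sketch; the 15x15 board, the three-plane encoding and the masking rule are assumptions made for the example.

    import numpy as np

    def encode_board(board):
        # board: 15x15 array with 0 = empty, 1 = black, 2 = white.
        # Returns three one-hot planes (empty, black, white) as CNN input.
        planes = np.zeros((3, 15, 15), dtype=np.float32)
        for value in (0, 1, 2):
            planes[value] = (board == value)
        return planes

    def pick_move(scores, board):
        # If the CNN's top-scoring cell is already occupied, fall back to the
        # best legal (empty) alternative instead.
        masked = np.where(board.flatten() == 0, scores.flatten(), -np.inf)
        idx = int(np.argmax(masked))
        return divmod(idx, 15)          # (row, column) of the chosen move

    board = np.zeros((15, 15), dtype=np.int64)
    board[7, 7] = 1
    scores = np.random.rand(15, 15)
    scores[7, 7] = 10.0                 # network prefers an occupied cell...
    print(pick_move(scores, board))     # ...so the best empty cell is chosen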


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is infeasible for various reasons. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when only naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy, having learned that its measurement is unreliable.
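A minimal sketch of how perception errors and domain randomization could be injected into a training environment is shown below; the noise model (a per-episode Gaussian scale plus random dropout of observation entries) and the wrapper interface are illustrative assumptions, not the modelling used in WiseMove.

    import numpy as np

    class NoisyPerception:
        # Wraps an environment and corrupts the observations the policy sees.
        def __init__(self, env, max_noise_std=0.2, dropout_prob=0.05, seed=0):
            self.env = env
            self.max_noise_std = max_noise_std
            self.dropout_prob = dropout_prob
            self.rng = np.random.default_rng(seed)
            self.noise_std = 0.0

        def reset(self):
            # Domain randomization: draw a fresh noise level every episode.
            self.noise_std = self.rng.uniform(0.0, self.max_noise_std)
            return self._corrupt(self.env.reset())

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            return self._corrupt(obs), reward, done, info

        def _corrupt(self, obs):
            obs = obs + self.rng.normal(0.0, self.noise_std, size=obs.shape)
            mask = self.rng.random(obs.shape) > self.dropout_prob
            return obs * mask           # dropped entries mimic missed detections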

