Autonomous Vehicular Landings on the Deck of an Unmanned Surface Vehicle using Deep Reinforcement Learning

Robotica ◽  
2019 ◽  
Vol 37 (11) ◽  
pp. 1867-1882 ◽  
Author(s):  
Riccardo Polvara ◽  
Sanjay Sharma ◽  
Jian Wan ◽  
Andrew Manning ◽  
Robert Sutton

Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is a minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as combining vanilla and double DQNs and using a partitioned buffer replay. Simulated studies prove the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
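To make the partitioned buffer replay idea concrete, the following is a minimal Python sketch of a replay buffer split by reward sign; the partition labels, capacities, and sampling fractions are illustrative assumptions, not the exact scheme used in the paper.

    import random
    from collections import deque

    class PartitionedReplayBuffer:
        # Keeps rare but informative transitions (e.g. successful landings or
        # crashes) from being drowned out by ordinary flight experience.
        def __init__(self, capacity_per_partition=50000):
            self.partitions = {
                "positive": deque(maxlen=capacity_per_partition),  # marker reached
                "negative": deque(maxlen=capacity_per_partition),  # crash / marker lost
                "neutral":  deque(maxlen=capacity_per_partition),  # everything else
            }

        def add(self, transition, reward):
            # Route the transition by the sign of its reward.
            key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
            self.partitions[key].append(transition)

        def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
            # Draw a fixed share of the batch from each partition.
            batch = []
            for frac, part in zip(fractions, self.partitions.values()):
                k = min(int(batch_size * frac), len(part))
                batch.extend(random.sample(part, k))
            random.shuffle(batch)
            return batch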

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is even greater for micro aerial vehicles (MAVs) due to their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor–critic reinforcement learning technique to train the aerial robot in the Gazebo simulator to perform a point-goal navigation task by directly mapping the MAV's noisy state and laser scan measurements to continuous motion commands. The resulting policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Intensive simulations and real-time experiments were conducted and compared against a nonlinear model predictive control technique to demonstrate generalization to new, unseen environments and robustness against localization noise. The results demonstrate the system's effectiveness in flying safely and reaching the desired goal points by planning smooth forward linear velocities and heading rates.
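As an illustration of the direct sensor-to-command mapping described above, here is a small PyTorch sketch of an actor network; the layer sizes, the 360-beam scan, and the four-dimensional state vector are assumptions made for the example, not the authors' architecture.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Maps a laser scan plus a low-dimensional state (e.g. velocity and
        # goal position in the body frame) to two continuous commands:
        # forward linear velocity and heading rate, both squashed to [-1, 1].
        def __init__(self, n_beams=360, n_state=4, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_beams + n_state, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2), nn.Tanh(),
            )

        def forward(self, scan, state):
            return self.net(torch.cat([scan, state], dim=-1))

    actor = Actor()
    scan = torch.rand(1, 360)                      # normalised range readings
    state = torch.tensor([[0.5, 0.1, 2.0, 0.3]])   # velocity + goal in body frame
    command = actor(scan, state)                   # [forward velocity, heading rate]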


Robotics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 8 ◽  
Author(s):  
Riccardo Polvara ◽  
Massimiliano Patacchiola ◽  
Marc Hanheide ◽  
Gerhard Neumann

The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses, for the first time, the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for autonomous UAV landing. Our method is based on a divide-and-conquer paradigm that splits a task into sequential sub-tasks, each assigned to a Deep Q-Network (DQN), hence the name Sequential Deep Q-Network (SDQN). Each DQN in an SDQN is activated by an internal trigger and represents a component of a high-level control policy that navigates the UAV towards the marker. Several technical solutions have been implemented, for example combining vanilla and double DQNs and introducing a partitioned buffer replay to address the problem of sample efficiency. One of the main contributions of this work is showing how an SDQN trained in a simulator via domain randomization can effectively generalize to real-world scenarios of increasing complexity. The performance of SDQNs is comparable with that of a state-of-the-art algorithm and with human pilots, while being quantitatively better in noisy conditions.
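A minimal sketch of the sequential dispatch idea follows; the trigger condition (the marker being roughly centred in the downward-facing camera view) is an assumption chosen for illustration, not the exact trigger used in the paper.

    class SequentialDQN:
        # Two sub-policies, one per flight phase: marker detection, then descent.
        # An internal trigger hands control from the first DQN to the second.
        def __init__(self, detection_policy, descent_policy):
            self.policies = [detection_policy, descent_policy]
            self.stage = 0

        def act(self, observation, marker_centred):
            if self.stage == 0 and marker_centred:
                self.stage = 1              # trigger: switch to the descent DQN
            return self.policies[self.stage](observation)

    # Usage with placeholder callables that would normally be trained DQNs.
    sdqn = SequentialDQN(lambda obs: "search_action", lambda obs: "descend_action")
    print(sdqn.act(None, marker_centred=False))   # -> search_action
    print(sdqn.act(None, marker_centred=True))    # -> descend_action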


Author(s):  
Zhen Li ◽  
Xin Chen ◽  
Mingyang Xie ◽  
Zhenhua Zhao

In this paper, an adaptive fault-tolerant attitude tracking controller based on reinforcement learning is developed for a flying-wing unmanned aerial vehicle subject to actuator faults and saturation. First, the attitude dynamic model is separated into slow and fast dynamic subsystems based on the principle of time-scale separation. Secondly, the backstepping technique is adopted to design the controller. To enforce the attitude-angle constraints, a control technique based on a barrier Lyapunov function is used to design the controller of the slow dynamic subsystem. For the optimization of the fast dynamic subsystem, this paper introduces an adaptive reinforcement learning control method in which a neural network is used to approximate the long-term performance index and the lumped fault dynamics. It is shown that this control algorithm satisfies the requirements of attitude tracking subject to the control constraints, and the stability of the system is proved using Lyapunov stability theory. The simulation results demonstrate that the developed fault-tolerant scheme is effective and yields a smoother control effect than a fault-tolerant controller based on sliding-mode theory.
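For reference, one common choice of barrier Lyapunov function used to enforce such angle constraints (not necessarily the exact function adopted in this paper) is

    V(e) = \frac{1}{2} \ln \frac{k_b^2}{k_b^2 - e^2}, \qquad |e| < k_b,

which grows without bound as the tracking error e approaches the constraint k_b; keeping V bounded along the closed-loop trajectories therefore guarantees that the attitude error never reaches the bound.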


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2617
Author(s):  
Catalin Dumitrescu ◽  
Petrica Ciotirnae ◽  
Constantin Vizitiu

When considering the concept of distributed intelligent control, three types of components can be defined: (i) fuzzy sensors, which provide a representation of measurements as fuzzy subsets; (ii) fuzzy actuators, which can act on the real world based on the fuzzy subsets they receive; and (iii) fuzzy inference components, which generate new fuzzy subsets from the fuzzy subsets previously produced. The purpose of this article is to define the elements of an interoperable technology, the Fuzzy Applied Cell Control soft-computing language, for the development of fuzzy components with distributed intelligence implemented on a DSP target. The cells in the network are configured using the operations of symbolic fusion, symbolic inference and fuzzy–real symbolic transformation, which are based on the concepts of fuzzy meaning and fuzzy description. The two applications presented in the article, agent-based modeling with fuzzy logic for simulating pedestrian crowds in panic decision-making situations and a fuzzy controller for a mobile robot, are both timely. The increasing occurrence of panic during mass events prompted the investigation of the impact of panic on crowd dynamics and the simulation of pedestrian flows in panic situations. Based on the research presented in the article, we propose a fuzzy-controller-based system for determining pedestrian flows and calculating the shortest evacuation distance in panic situations. Fuzzy logic, one of the representation techniques in artificial intelligence, is a well-known soft-computing method that allows the treatment of strong constraints caused by the inaccuracy of the data obtained from the robot's sensors. Based on this motivation, the second application proposed in the article creates an intelligent control technique based on Fuzzy Logic Control (FLC), a class of intelligent control systems that can be used as an alternative to traditional control techniques for mobile robots. This method makes it possible to emulate the experience of a human expert. The benefits of using a network of fuzzy components are not limited to those provided by distributed systems: fuzzy cells are simple to configure while also providing high-level functions such as fusion and decision-making.
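As a concrete illustration of an FLC for mobile-robot navigation, the following is a minimal Mamdani-style sketch in Python; the membership functions, the rule base and the weighted-average defuzzification are assumptions for the example, not the controller developed in the article.

    def fuzzy_steering(front_dist, goal_angle):
        # Inputs: frontal obstacle distance (m) and goal bearing (rad, +left).
        # Output: steering rate (rad/s).
        # Fuzzification: "near" falls linearly from 1 at 0 m to 0 at 1.5 m.
        near = max(0.0, min(1.0, (1.5 - front_dist) / 1.5))
        far = 1.0 - near
        left = max(0.0, min(1.0, goal_angle))     # degree of "goal to the left"
        right = max(0.0, min(1.0, -goal_angle))   # degree of "goal to the right"

        # Rule base: (firing strength, crisp consequent in rad/s).
        rules = [
            (near,            1.0),    # obstacle close       -> turn away sharply
            (min(far, left),  0.5),    # clear, goal to left  -> steer left
            (min(far, right), -0.5),   # clear, goal to right -> steer right
        ]

        # Weighted-average (centroid-like) defuzzification.
        num = sum(strength * output for strength, output in rules)
        den = sum(strength for strength, _ in rules) + 1e-9
        return num / den

    print(fuzzy_steering(front_dist=0.4, goal_angle=-0.8))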


2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AI has been developed with various techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning other correct answers for duplicated Gomoku board situations. However, with the tree search algorithm the accuracy drops, because the classification criteria are set manually. In this paper, we propose an improved reinforcement-learning-based high-level decision approach using a convolutional neural network (CNN). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded vectors. For cases in which the stone selected by the CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
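To illustrate the one-hot state representation and the fallback when the CNN's preferred move is not playable, here is a short NumPy sketch; the 15x15 board, the three-plane encoding and the masking rule are assumptions made for the example.

    import numpy as np

    def encode_board(board):
        # board: 15x15 array with 0 = empty, 1 = black, 2 = white.
        # Returns three one-hot planes (empty, black, white) as CNN input.
        planes = np.zeros((3, 15, 15), dtype=np.float32)
        for value in (0, 1, 2):
            planes[value] = (board == value)
        return planes

    def pick_move(scores, board):
        # If the CNN's top-scoring cell is already occupied, fall back to the
        # best legal (empty) alternative instead.
        masked = np.where(board.flatten() == 0, scores.flatten(), -np.inf)
        idx = int(np.argmax(masked))
        return divmod(idx, 15)          # (row, column) of the chosen move

    board = np.zeros((15, 15), dtype=np.int64)
    board[7, 7] = 1
    scores = np.random.rand(15, 15)
    scores[7, 7] = 10.0                 # network prefers an occupied cell...
    print(pick_move(scores, board))     # ...so the best empty cell is chosen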


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is infeasible for various reasons. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when only naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy, having learned that its measurement is unreliable.
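A minimal sketch of how perception errors and domain randomization could be injected into a training environment is shown below; the noise model (a per-episode Gaussian scale plus random dropout of observation entries) and the wrapper interface are illustrative assumptions, not the modelling used in WiseMove.

    import numpy as np

    class NoisyPerception:
        # Wraps an environment and corrupts the observations the policy sees.
        def __init__(self, env, max_noise_std=0.2, dropout_prob=0.05, seed=0):
            self.env = env
            self.max_noise_std = max_noise_std
            self.dropout_prob = dropout_prob
            self.rng = np.random.default_rng(seed)
            self.noise_std = 0.0

        def reset(self):
            # Domain randomization: draw a fresh noise level every episode.
            self.noise_std = self.rng.uniform(0.0, self.max_noise_std)
            return self._corrupt(self.env.reset())

        def step(self, action):
            obs, reward, done, info = self.env.step(action)
            return self._corrupt(obs), reward, done, info

        def _corrupt(self, obs):
            obs = obs + self.rng.normal(0.0, self.noise_std, size=obs.shape)
            mask = self.rng.random(obs.shape) > self.dropout_prob
            return obs * mask           # dropped entries mimic missed detections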

