Deep Reinforcement Learning for End-to-End Local Motion Planning of Autonomous Aerial Robots in Unknown Outdoor Environments: Real-Time Flight Experiments

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is even more acute for micro aerial vehicles (MAVs) due to their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor–critic-based reinforcement learning technique to train the aerial robot in a Gazebo simulator to perform a point-goal navigation task by directly mapping the MAV's noisy state and laser scan measurements to continuous motion control. The obtained policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Extensive simulations and real-time experiments were conducted and compared against a nonlinear model predictive control technique to demonstrate generalization to new, unseen environments and robustness against localization noise. The obtained results demonstrate our system's effectiveness in flying safely and reaching the desired points by planning smooth forward linear velocities and heading rates.
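
As an illustrative sketch only (not the authors' architecture), the core idea of directly mapping a relative-goal state and laser ranges to continuous velocity commands with an actor–critic network can be written as follows; the network sizes, input layout, and variable names are assumptions for illustration.

```python
# Minimal actor-critic sketch (PyTorch): maps a relative-goal state and a laser
# scan to a bounded forward velocity and heading rate. Sizes and input layout
# are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_laser_beams=36, state_dim=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_laser_beams + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Actor outputs the mean of [forward velocity, heading rate]; tanh bounds the commands.
        self.actor_mean = nn.Sequential(nn.Linear(hidden, 2), nn.Tanh())
        self.log_std = nn.Parameter(torch.zeros(2))
        # Critic estimates the state value used to form the advantage.
        self.critic = nn.Linear(hidden, 1)

    def forward(self, laser_scan, goal_state):
        h = self.encoder(torch.cat([laser_scan, goal_state], dim=-1))
        dist = torch.distributions.Normal(self.actor_mean(h), self.log_std.exp())
        return dist, self.critic(h)

# Usage: sample a bounded velocity command from noisy observations.
model = ActorCritic()
scan = torch.rand(1, 36)   # normalized laser ranges
goal = torch.rand(1, 4)    # e.g. relative goal position, heading, current speed
dist, value = model(scan, goal)
action = dist.sample()     # [v_forward, yaw_rate], later scaled by the flight controller
```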

Robotica ◽  
2019 ◽  
Vol 37 (11) ◽  
pp. 1867-1882 ◽  
Author(s):  
Riccardo Polvara ◽  
Sanjay Sharma ◽  
Jian Wan ◽  
Andrew Manning ◽  
Robert Sutton

Summary: Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is a minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as the combination of vanilla and double DQNs and a partitioned replay buffer. Simulation studies demonstrated the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
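
One plausible reading of the partitioned replay idea is sketched below: transitions are stored in separate partitions and each minibatch draws a fixed share from every partition. The partitioning criterion (reward sign) and the sampling fractions are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of a partitioned experience replay buffer: each partition keeps its own
# bounded queue, and sampling mixes a fixed fraction from every partition so
# rare outcomes (e.g. successful landings) are not crowded out.
import random
from collections import deque

class PartitionedReplayBuffer:
    def __init__(self, capacity_per_partition=10_000):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral":  deque(maxlen=capacity_per_partition),
        }

    def add(self, state, action, reward, next_state, done):
        key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
        batch = []
        for key, frac in zip(("positive", "negative", "neutral"), fractions):
            pool = self.partitions[key]
            n = min(len(pool), int(batch_size * frac))
            batch.extend(random.sample(pool, n))
        random.shuffle(batch)
        return batch
```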


2019 ◽  
Vol 29 (02) ◽  
pp. 2050027
Author(s):  
Hassan Javed ◽  
Muhammad Bilal ◽  
Shahid Masud

Live digital video is a valuable source of information in security, broadcast, and industrial quality control applications. Motion jitter due to camera and platform instability is a common artefact in captured video that renders it less effective for subsequent computer vision tasks such as detection and tracking of objects, background modeling, mosaicking, etc. Algorithmically compensating for this motion jitter is hence a mandatory pre-processing step in many applications. This process, called video stabilization, requires estimation of global motion from consecutive video frames and is constrained by additional challenges such as preservation of intentional motion and of the native frame resolution. The problem is exacerbated in the presence of local motion of foreground objects, which must be compensated for robustly. As such, achieving real-time performance for this computationally intensive operation is a difficult task for embedded processors with limited computational and memory resources. In this work, the development of an optimized hardware–software co-design framework for video stabilization has been investigated. Efficient video stabilization depends on the identification of key points in the frame, which in turn requires dense feature calculation at the pixel level. This task has been identified as most suitable for offloading to pipelined hardware implemented in the FPGA fabric, due to the complex memory and computation operations involved. The subsequent steps of the stabilization algorithm operate on these sparse key points and have been found to be handled efficiently in software. The proposed hardware–software (HW–SW) co-design framework has been implemented on the Zedboard platform, which houses a Xilinx Zynq SoC equipped with an ARM Cortex-A9 processor. The proposed implementation can process a real-time video stream at 28 frames per second and is at least twice as fast as the corresponding software-only approach. Two different hardware accelerator designs have been implemented with different high-level synthesis tools following a rapid prototyping principle; they consume less than 50% of the logic resources available on the host FPGA while being at least 30% faster than contemporary designs.
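
The software-side portion of such a pipeline can be sketched as follows, assuming the sparse key points arrive from the hardware accelerator; the specific OpenCV calls, the rigid-motion model, and the moving-average smoothing window are illustrative choices, not the paper's implementation.

```python
# Sketch of software-side stabilization on sparse key points: track points into
# the current frame, estimate a robust global rigid motion, low-pass the
# accumulated camera path to keep intentional motion, and warp the frame.
import cv2
import numpy as np

def stabilize_step(prev_gray, cur_gray, prev_pts, trajectory, window=15):
    # trajectory is a list of cumulative [dx, dy, da], initialized as [np.zeros(3)].
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    good_prev = prev_pts[status.ravel() == 1]
    good_cur = cur_pts[status.ravel() == 1]

    # Robust global motion estimate (translation + rotation + uniform scale).
    m, _ = cv2.estimateAffinePartial2D(good_prev, good_cur, method=cv2.RANSAC)
    dx, dy = m[0, 2], m[1, 2]
    da = np.arctan2(m[1, 0], m[0, 0])
    trajectory.append(trajectory[-1] + np.array([dx, dy, da]))

    # Smooth the camera path so jitter is removed but intentional motion is kept.
    smoothed = np.array(trajectory[-window:]).mean(axis=0)
    correction = smoothed - trajectory[-1]

    sx, sy, sa = np.array([dx, dy, da]) + correction
    warp = np.array([[np.cos(sa), -np.sin(sa), sx],
                     [np.sin(sa),  np.cos(sa), sy]], dtype=np.float32)
    h, w = cur_gray.shape
    return cv2.warpAffine(cur_gray, warp, (w, h))
```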


2020 ◽  
Vol 6 (4) ◽  
pp. 43-54 ◽  
Author(s):  
Martin Klesen ◽  
Patrick Gebhard

In this paper, we report on the use of computer-generated affect to control the body and mind of cognitively modeled virtual characters. We use the computational model of affect ALMA, which is able to simulate three different affect types in real time. The computation of affect is based on a novel appraisal language; both the use of elements of this appraisal language and the simulation of the different affect types have been evaluated. Affect is used to control facial expressions, facial complexions, affective animations, posture, and idle behavior on the body layer, and the selection of dialogue strategies on the mind layer. To enable fine-grained control of these aspects, a Player Markup Language (PML) has been developed. The PML is player-independent and allows sophisticated control of character actions coordinated by high-level temporal constraints. An Action Encoder module maps the output of ALMA to PML actions using affect display rules. These actions drive the real-time rendering of affect, gesture, and speech parameters of virtual characters, which we call Virtual Humans.
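
A minimal sketch of what an affect-display-rule mapping inside such an Action Encoder could look like is given below; the rule table, tag names, and attributes are hypothetical placeholders and do not reflect the actual PML schema or ALMA's output format.

```python
# Illustrative Action Encoder sketch: affect display rules map a computed emotion
# and its intensity to body-layer actions, serialized as PML-style markup.
# All channel names, values, and tags here are hypothetical placeholders.
AFFECT_DISPLAY_RULES = {
    "joy":      {"facial_expression": "smile", "complexion": "blush",  "posture": "open"},
    "distress": {"facial_expression": "frown", "complexion": "pale",   "posture": "closed"},
    "anger":    {"facial_expression": "scowl", "complexion": "redden", "posture": "tense"},
}

def encode_affect(emotion: str, intensity: float) -> str:
    rule = AFFECT_DISPLAY_RULES.get(emotion, {})
    actions = "\n".join(
        f'  <action channel="{channel}" value="{value}" intensity="{intensity:.2f}"/>'
        for channel, value in rule.items()
    )
    return f'<pml character="VirtualHuman">\n{actions}\n</pml>'

print(encode_affect("joy", 0.8))
```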


Author(s):  
Ezebuugo Nwaonumah ◽  
Biswanath Samanta

Abstract: A study is presented on applying deep reinforcement learning (DRL) for visual navigation of wheeled mobile robots (WMRs), both in simulation and in real-time implementation under dynamic and unknown environments. The policy-gradient-based asynchronous advantage actor–critic (A3C) algorithm has been considered. RGB (red, green, and blue) and depth images have been used as inputs to the A3C algorithm to generate control commands for autonomous navigation of the WMR. The initial A3C network was generated and trained progressively in OpenAI Gym–Gazebo-based simulation environments within the robot operating system (ROS) framework for a popular target WMR, the Kobuki TurtleBot2. A pre-trained deep neural network, ResNet50, was used after further training with regrouped objects commonly found in a laboratory setting for target-driven visual navigation of the TurtleBot2 through DRL. The performance of A3C with multiple computation threads (4, 6, and 8) was simulated and compared in three simulation environments, and performance improved with the number of threads. The trained model of A3C with 8 threads was implemented with online learning using an Nvidia Jetson TX2 on board the TurtleBot2 for mapless navigation in different real-life environments. Details of the methodology, results of simulation, and real-time implementation through transfer learning are presented, along with recommendations for future work.
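
The asynchronous, multi-threaded part of A3C can be sketched as below; `make_env`, `make_model`, `collect_rollout`, `a3c_loss`, `global_model`, and `optimizer` are placeholder helpers and objects assumed for illustration, not the study's exact code.

```python
# Minimal A3C asynchrony sketch: worker threads keep a local copy of the network,
# compute gradients on their own rollouts, and apply them to a shared global
# network under a lock.
import threading

def worker(global_model, optimizer, lock, make_env, make_model, n_updates=1000):
    env, local_model = make_env(), make_model()
    for _ in range(n_updates):
        local_model.load_state_dict(global_model.state_dict())  # sync with global weights
        rollout = collect_rollout(env, local_model)              # RGB + depth observations
        loss = a3c_loss(rollout, local_model)                    # policy + value + entropy terms
        loss.backward()
        with lock:  # serialize asynchronous updates to the shared parameters
            for gp, lp in zip(global_model.parameters(), local_model.parameters()):
                gp.grad = lp.grad
            optimizer.step()
            optimizer.zero_grad()

lock = threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(global_model, optimizer, lock, make_env, make_model))
           for _ in range(8)]  # 8 computation threads, the best-performing setting reported
for t in threads:
    t.start()
```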


Author(s):  
Zhen Li ◽  
Xin Chen ◽  
Mingyang Xie ◽  
Zhenhua Zhao

In this paper, an adaptive fault-tolerant attitude tracking controller based on reinforcement learning is developed for a flying-wing unmanned aerial vehicle subject to actuator faults and saturation. First, the attitude dynamic model is separated into slow and fast dynamic subsystems based on the principle of time-scale separation. Second, a backstepping technique is adopted to design the controller. To enforce attitude angle constraints, a control technique based on a barrier Lyapunov function is used to design the controller of the slow dynamic subsystem. For the optimization of the fast dynamic subsystem, this paper introduces an adaptive reinforcement learning control method in which a neural network is used to approximate the long-term performance index and the lumped fault dynamics. It is shown that this control algorithm satisfies the requirements of attitude tracking subject to the control constraints, and the stability of the system is proved using Lyapunov stability theory. The simulation results demonstrate that the developed fault-tolerant scheme is effective and yields smoother control compared with a fault-tolerant controller based on sliding-mode theory.
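
As one concrete example of the constraint-handling idea (a standard form, not necessarily the exact function used in the paper), a symmetric barrier Lyapunov candidate for a tracking error $e$ with constraint bound $k_b$ is:

```latex
% Symmetric barrier Lyapunov function candidate (illustrative): V grows without
% bound as the tracking error approaches the constraint, so keeping V bounded
% along trajectories enforces |e| < k_b at all times.
V(e) = \frac{1}{2}\ln\frac{k_b^{2}}{k_b^{2}-e^{2}}, \qquad |e| < k_b,
\qquad V(e)\to\infty \ \text{as}\ |e|\to k_b .
```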


2011 ◽  
Vol 30 (2) ◽  
pp. 175-191 ◽  
Author(s):  
Matt Zucker ◽  
Nathan Ratliff ◽  
Martin Stolle ◽  
Joel Chestnutt ◽  
J. Andrew Bagnell ◽  
...  

We present a novel approach to legged locomotion over rough terrain that is thoroughly rooted in optimization. This approach relies on a hierarchy of fast, anytime algorithms to plan a set of footholds, along with the dynamic body motions required to execute them. Components within the planning framework coordinate to exchange plans, cost-to-go estimates, and ‘certificates’ that ensure the output of an abstract high-level planner can be realized by lower layers of the hierarchy. The burden of careful engineering of cost functions to achieve desired performance is substantially mitigated by a simple inverse optimal control technique. Robustness is achieved by real-time re-planning of the full trajectory, augmented by reflexes and feedback control. We demonstrate the successful application of our approach in guiding the LittleDog quadruped robot over a variety of types of rough terrain. Other novel aspects of our past research efforts include a variety of pioneering inverse optimal control techniques as well as a system for planning using arbitrary pre-recorded robot behavior.
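
The inverse optimal control step can be illustrated with a generic margin-style weight update in the spirit of learning a cost from demonstrations; the feature function, planner interface, and learning rate below are placeholders, not the system's actual components.

```python
# Sketch of a margin-based inverse optimal control update: the terrain cost is a
# linear combination of features, and the weights are nudged so that the
# expert-demonstrated path scores no worse than the planner's current best path.
import numpy as np

def ioc_update(weights, feature_fn, planner, expert_path, lr=0.01):
    planned_path = planner(weights)                      # plan with the current learned cost
    f_expert  = sum(feature_fn(s) for s in expert_path)  # accumulated feature counts
    f_planned = sum(feature_fn(s) for s in planned_path)
    # Subgradient step: lower the cost of expert-visited features, raise it on the
    # features the planner currently prefers.
    weights = weights - lr * (f_expert - f_planned)
    return np.maximum(weights, 0.0)                      # keep per-feature costs non-negative
```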


Author(s):  
Ritesh Noothigattu ◽  
Djallel Bouneffouf ◽  
Nicholas Mattei ◽  
Rachita Chandra ◽  
Piyush Madan ◽  
...  

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual-bandit-based orchestrator then picks between the two policies: constraint-based and environment-reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either the reward-maximizing or the constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two behaviors in complex ways.
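
A minimal sketch of such an orchestrator, written here as a small LinUCB-style contextual bandit over the two policies, is shown below; the context encoding, dimensions, and exploration parameter are illustrative assumptions rather than the paper's exact orchestrator.

```python
# Contextual bandit orchestrator sketch: at each step it scores the two arms
# (constraint-based policy vs. reward-based policy) from the current context and
# picks the better one, updating its estimates from the observed reward.
import numpy as np

class PolicyOrchestrator:
    def __init__(self, context_dim, alpha=1.0):
        self.arms = ["constrained_policy", "reward_policy"]
        self.A = [np.eye(context_dim) for _ in self.arms]   # per-arm design matrices
        self.b = [np.zeros(context_dim) for _ in self.arms]
        self.alpha = alpha                                   # exploration strength

    def choose(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Expected payoff plus an upper-confidence exploration bonus.
            scores.append(theta @ context + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))                        # index of the chosen policy

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# At each time step the agent acts with the selected policy and remains
# transparent about which one was used:
orch = PolicyOrchestrator(context_dim=8)
# idx = orch.choose(state_features); action = policies[idx].act(state)
```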

