Deep Reinforcement Learning for End-to-End Local Motion Planning of Autonomous Aerial Robots in Unknown Outdoor Environments: Real-Time Flight Experiments

Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is even more acute for micro aerial vehicles (MAVs) due to their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor–critic-based reinforcement learning technique to train the aerial robot in a Gazebo simulator to perform a point-goal navigation task by directly mapping the MAV's noisy state and laser scan measurements to continuous motion control. The obtained policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Extensive simulations and real-time experiments were conducted and compared against a nonlinear model predictive control technique to demonstrate generalization to new, unseen environments and robustness against localization noise. The obtained results demonstrate our system's effectiveness in flying safely and reaching the desired points by planning smooth forward linear velocities and heading rates.
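
As an illustrative sketch only (not the authors' architecture), the core idea of directly mapping a relative-goal state and laser ranges to continuous velocity commands with an actor–critic network can be written as follows; the network sizes, input layout, and variable names are assumptions for illustration.

```python
# Minimal actor-critic sketch (PyTorch): maps a relative-goal state and a laser
# scan to a bounded forward velocity and heading rate. Sizes and input layout
# are illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, n_laser_beams=36, state_dim=4, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_laser_beams + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Actor outputs the mean of [forward velocity, heading rate]; tanh bounds the commands.
        self.actor_mean = nn.Sequential(nn.Linear(hidden, 2), nn.Tanh())
        self.log_std = nn.Parameter(torch.zeros(2))
        # Critic estimates the state value used to form the advantage.
        self.critic = nn.Linear(hidden, 1)

    def forward(self, laser_scan, goal_state):
        h = self.encoder(torch.cat([laser_scan, goal_state], dim=-1))
        dist = torch.distributions.Normal(self.actor_mean(h), self.log_std.exp())
        return dist, self.critic(h)

# Usage: sample a bounded velocity command from noisy observations.
model = ActorCritic()
scan = torch.rand(1, 36)   # normalized laser ranges
goal = torch.rand(1, 4)    # e.g. relative goal position, heading, current speed
dist, value = model(scan, goal)
action = dist.sample()     # [v_forward, yaw_rate], later scaled by the flight controller
```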

Robotica ◽  
2019 ◽  
Vol 37 (11) ◽  
pp. 1867-1882 ◽  
Author(s):  
Riccardo Polvara ◽  
Sanjay Sharma ◽  
Jian Wan ◽  
Andrew Manning ◽  
Robert Sutton

Summary: Autonomous landing on the deck of a boat or an unmanned surface vehicle (USV) is a minimum requirement for increasing the autonomy of water monitoring missions. This paper introduces an end-to-end control technique based on deep reinforcement learning for landing an unmanned aerial vehicle on a visual marker located on the deck of a USV. The proposed solution consists of a hierarchy of Deep Q-Networks (DQNs) used as high-level navigation policies that address the two phases of the flight: marker detection and the descending manoeuvre. A few technical improvements are proposed to stabilize the learning process, such as the combination of vanilla and double DQNs and a partitioned replay buffer. Simulation studies demonstrated the robustness of the proposed algorithm against different perturbations acting on the marine vessel. The performance obtained is comparable with that of a state-of-the-art method based on template matching.
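
One plausible reading of the partitioned replay idea is sketched below: transitions are stored in separate partitions and each minibatch draws a fixed share from every partition. The partitioning criterion (reward sign) and the sampling fractions are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of a partitioned experience replay buffer: each partition keeps its own
# bounded queue, and sampling mixes a fixed fraction from every partition so
# rare outcomes (e.g. successful landings) are not crowded out.
import random
from collections import deque

class PartitionedReplayBuffer:
    def __init__(self, capacity_per_partition=10_000):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral":  deque(maxlen=capacity_per_partition),
        }

    def add(self, state, action, reward, next_state, done):
        key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size, fractions=(0.25, 0.25, 0.5)):
        batch = []
        for key, frac in zip(("positive", "negative", "neutral"), fractions):
            pool = self.partitions[key]
            n = min(len(pool), int(batch_size * frac))
            batch.extend(random.sample(pool, n))
        random.shuffle(batch)
        return batch
```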


2019 ◽  
Vol 29 (02) ◽  
pp. 2050027
Author(s):  
Hassan Javed ◽  
Muhammad Bilal ◽  
Shahid Masud

Live digital video is a valuable source of information in security, broadcast, and industrial quality control applications. Motion jitter due to camera and platform instability is a common artefact in captured video that renders it less effective for subsequent computer vision tasks such as detection and tracking of objects, background modeling, mosaicking, etc. Algorithmically compensating for this motion jitter is hence a mandatory pre-processing step in many applications. This process, called video stabilization, requires estimation of global motion from consecutive video frames and is constrained by additional challenges such as preservation of intentional motion and of the native frame resolution. The problem is exacerbated in the presence of local motion of foreground objects, which must be compensated for robustly. As such, achieving real-time performance for this computationally intensive operation is a difficult task for embedded processors with limited computational and memory resources. In this work, the development of an optimized hardware–software co-design framework for video stabilization has been investigated. Efficient video stabilization depends on the identification of key points in the frame, which in turn requires dense feature calculation at the pixel level. This task has been identified as most suitable for offloading to pipelined hardware implemented in the FPGA fabric, due to the complex memory and computation operations involved. The subsequent steps of the stabilization algorithm operate on these sparse key points and have been found to be handled efficiently in software. The proposed hardware–software (HW–SW) co-design framework has been implemented on the Zedboard platform, which houses a Xilinx Zynq SoC equipped with an ARM Cortex-A9 processor. The proposed implementation can process a real-time video stream at 28 frames per second and is at least twice as fast as the corresponding software-only approach. Two different hardware accelerator designs have been implemented with different high-level synthesis tools following a rapid prototyping principle; they consume less than 50% of the logic resources available on the host FPGA while being at least 30% faster than contemporary designs.
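
The software-side portion of such a pipeline can be sketched as follows, assuming the sparse key points arrive from the hardware accelerator; the specific OpenCV calls, the rigid-motion model, and the moving-average smoothing window are illustrative choices, not the paper's implementation.

```python
# Sketch of software-side stabilization on sparse key points: track points into
# the current frame, estimate a robust global rigid motion, low-pass the
# accumulated camera path to keep intentional motion, and warp the frame.
import cv2
import numpy as np

def stabilize_step(prev_gray, cur_gray, prev_pts, trajectory, window=15):
    # trajectory is a list of cumulative [dx, dy, da], initialized as [np.zeros(3)].
    cur_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    good_prev = prev_pts[status.ravel() == 1]
    good_cur = cur_pts[status.ravel() == 1]

    # Robust global motion estimate (translation + rotation + uniform scale).
    m, _ = cv2.estimateAffinePartial2D(good_prev, good_cur, method=cv2.RANSAC)
    dx, dy = m[0, 2], m[1, 2]
    da = np.arctan2(m[1, 0], m[0, 0])
    trajectory.append(trajectory[-1] + np.array([dx, dy, da]))

    # Smooth the camera path so jitter is removed but intentional motion is kept.
    smoothed = np.array(trajectory[-window:]).mean(axis=0)
    correction = smoothed - trajectory[-1]

    sx, sy, sa = np.array([dx, dy, da]) + correction
    warp = np.array([[np.cos(sa), -np.sin(sa), sx],
                     [np.sin(sa),  np.cos(sa), sy]], dtype=np.float32)
    h, w = cur_gray.shape
    return cv2.warpAffine(cur_gray, warp, (w, h))
```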


2020 ◽  
Vol 6 (4) ◽  
pp. 43-54 ◽  
Author(s):  
Martin Klesen ◽  
Patrick Gebhard

In this paper, we report on the use of computer-generated affect to control the body and mind of cognitively modeled virtual characters. We use the computational model of affect ALMA, which is able to simulate three different affect types in real time. The computation of affect is based on a novel appraisal language; both the use of elements of this appraisal language and the simulation of the different affect types have been evaluated. Affect is used to control facial expressions, facial complexions, affective animations, posture, and idle behavior on the body layer, and the selection of dialogue strategies on the mind layer. To enable fine-grained control of these aspects, a Player Markup Language (PML) has been developed. The PML is player-independent and allows sophisticated control of character actions coordinated by high-level temporal constraints. An Action Encoder module maps the output of ALMA to PML actions using affect display rules. These actions drive the real-time rendering of affect, gesture, and speech parameters of virtual characters, which we call Virtual Humans.
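
A minimal sketch of what an affect-display-rule mapping inside such an Action Encoder could look like is given below; the rule table, tag names, and attributes are hypothetical placeholders and do not reflect the actual PML schema or ALMA's output format.

```python
# Illustrative Action Encoder sketch: affect display rules map a computed emotion
# and its intensity to body-layer actions, serialized as PML-style markup.
# All channel names, values, and tags here are hypothetical placeholders.
AFFECT_DISPLAY_RULES = {
    "joy":      {"facial_expression": "smile", "complexion": "blush",  "posture": "open"},
    "distress": {"facial_expression": "frown", "complexion": "pale",   "posture": "closed"},
    "anger":    {"facial_expression": "scowl", "complexion": "redden", "posture": "tense"},
}

def encode_affect(emotion: str, intensity: float) -> str:
    rule = AFFECT_DISPLAY_RULES.get(emotion, {})
    actions = "\n".join(
        f'  <action channel="{channel}" value="{value}" intensity="{intensity:.2f}"/>'
        for channel, value in rule.items()
    )
    return f'<pml character="VirtualHuman">\n{actions}\n</pml>'

print(encode_affect("joy", 0.8))
```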


Author(s):  
Ezebuugo Nwaonumah ◽  
Biswanath Samanta

Abstract: A study is presented on applying deep reinforcement learning (DRL) for visual navigation of wheeled mobile robots (WMRs), both in simulation and in real-time implementation under dynamic and unknown environments. The policy-gradient-based asynchronous advantage actor–critic (A3C) algorithm has been considered. RGB (red, green, and blue) and depth images have been used as inputs to the A3C algorithm to generate control commands for autonomous navigation of the WMR. The initial A3C network was generated and trained progressively in OpenAI Gym–Gazebo-based simulation environments within the robot operating system (ROS) framework for a popular target WMR, the Kobuki TurtleBot2. A pre-trained deep neural network, ResNet50, was used after further training with regrouped objects commonly found in a laboratory setting for target-driven visual navigation of the TurtleBot2 through DRL. The performance of A3C with multiple computation threads (4, 6, and 8) was simulated and compared in three simulation environments, and performance improved with the number of threads. The trained model of A3C with 8 threads was implemented with online learning using an Nvidia Jetson TX2 on board the TurtleBot2 for mapless navigation in different real-life environments. Details of the methodology, results of simulation, and real-time implementation through transfer learning are presented, along with recommendations for future work.
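
The asynchronous, multi-threaded part of A3C can be sketched as below; `make_env`, `make_model`, `collect_rollout`, `a3c_loss`, `global_model`, and `optimizer` are placeholder helpers and objects assumed for illustration, not the study's exact code.

```python
# Minimal A3C asynchrony sketch: worker threads keep a local copy of the network,
# compute gradients on their own rollouts, and apply them to a shared global
# network under a lock.
import threading

def worker(global_model, optimizer, lock, make_env, make_model, n_updates=1000):
    env, local_model = make_env(), make_model()
    for _ in range(n_updates):
        local_model.load_state_dict(global_model.state_dict())  # sync with global weights
        rollout = collect_rollout(env, local_model)              # RGB + depth observations
        loss = a3c_loss(rollout, local_model)                    # policy + value + entropy terms
        loss.backward()
        with lock:  # serialize asynchronous updates to the shared parameters
            for gp, lp in zip(global_model.parameters(), local_model.parameters()):
                gp.grad = lp.grad
            optimizer.step()
            optimizer.zero_grad()

lock = threading.Lock()
threads = [threading.Thread(target=worker,
                            args=(global_model, optimizer, lock, make_env, make_model))
           for _ in range(8)]  # 8 computation threads, the best-performing setting reported
for t in threads:
    t.start()
```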


Author(s):  
Zhen Li ◽  
Xin Chen ◽  
Mingyang Xie ◽  
Zhenhua Zhao

In this paper, an adaptive fault-tolerant attitude tracking controller based on reinforcement learning is developed for a flying-wing unmanned aerial vehicle subject to actuator faults and saturation. First, the attitude dynamic model is separated into slow and fast dynamic subsystems based on the principle of time-scale separation. Second, a backstepping technique is adopted to design the controller. To enforce attitude angle constraints, a control technique based on a barrier Lyapunov function is used to design the controller of the slow dynamic subsystem. For the optimization of the fast dynamic subsystem, this paper introduces an adaptive reinforcement learning control method in which a neural network is used to approximate the long-term performance index and the lumped fault dynamics. It is shown that this control algorithm satisfies the requirements of attitude tracking subject to the control constraints, and the stability of the system is proved using Lyapunov stability theory. The simulation results demonstrate that the developed fault-tolerant scheme is effective and yields smoother control compared with a fault-tolerant controller based on sliding-mode theory.
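
As one concrete example of the constraint-handling idea (a standard form, not necessarily the exact function used in the paper), a symmetric barrier Lyapunov candidate for a tracking error $e$ with constraint bound $k_b$ is:

```latex
% Symmetric barrier Lyapunov function candidate (illustrative): V grows without
% bound as the tracking error approaches the constraint, so keeping V bounded
% along trajectories enforces |e| < k_b at all times.
V(e) = \frac{1}{2}\ln\frac{k_b^{2}}{k_b^{2}-e^{2}}, \qquad |e| < k_b,
\qquad V(e)\to\infty \ \text{as}\ |e|\to k_b .
```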


2011 ◽  
Vol 30 (2) ◽  
pp. 175-191 ◽  
Author(s):  
Matt Zucker ◽  
Nathan Ratliff ◽  
Martin Stolle ◽  
Joel Chestnutt ◽  
J. Andrew Bagnell ◽  
...  

We present a novel approach to legged locomotion over rough terrain that is thoroughly rooted in optimization. This approach relies on a hierarchy of fast, anytime algorithms to plan a set of footholds, along with the dynamic body motions required to execute them. Components within the planning framework coordinate to exchange plans, cost-to-go estimates, and ‘certificates’ that ensure the output of an abstract high-level planner can be realized by lower layers of the hierarchy. The burden of careful engineering of cost functions to achieve desired performance is substantially mitigated by a simple inverse optimal control technique. Robustness is achieved by real-time re-planning of the full trajectory, augmented by reflexes and feedback control. We demonstrate the successful application of our approach in guiding the LittleDog quadruped robot over a variety of types of rough terrain. Other novel aspects of our past research efforts include a variety of pioneering inverse optimal control techniques as well as a system for planning using arbitrary pre-recorded robot behavior.
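
The inverse optimal control step can be illustrated with a generic margin-style weight update in the spirit of learning a cost from demonstrations; the feature function, planner interface, and learning rate below are placeholders, not the system's actual components.

```python
# Sketch of a margin-based inverse optimal control update: the terrain cost is a
# linear combination of features, and the weights are nudged so that the
# expert-demonstrated path scores no worse than the planner's current best path.
import numpy as np

def ioc_update(weights, feature_fn, planner, expert_path, lr=0.01):
    planned_path = planner(weights)                      # plan with the current learned cost
    f_expert  = sum(feature_fn(s) for s in expert_path)  # accumulated feature counts
    f_planned = sum(feature_fn(s) for s in planned_path)
    # Subgradient step: lower the cost of expert-visited features, raise it on the
    # features the planner currently prefers.
    weights = weights - lr * (f_expert - f_planned)
    return np.maximum(weights, 0.0)                      # keep per-feature costs non-negative
```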


Author(s):  
Ritesh Noothigattu ◽  
Djallel Bouneffouf ◽  
Nicholas Mattei ◽  
Rachita Chandra ◽  
Piyush Madan ◽  
...  

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual-bandit-based orchestrator then picks between the two policies: constraint-based and environment-reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either the reward-maximizing or the constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two behaviors in complex ways.
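
A minimal sketch of such an orchestrator, written here as a small LinUCB-style contextual bandit over the two policies, is shown below; the context encoding, dimensions, and exploration parameter are illustrative assumptions rather than the paper's exact orchestrator.

```python
# Contextual bandit orchestrator sketch: at each step it scores the two arms
# (constraint-based policy vs. reward-based policy) from the current context and
# picks the better one, updating its estimates from the observed reward.
import numpy as np

class PolicyOrchestrator:
    def __init__(self, context_dim, alpha=1.0):
        self.arms = ["constrained_policy", "reward_policy"]
        self.A = [np.eye(context_dim) for _ in self.arms]   # per-arm design matrices
        self.b = [np.zeros(context_dim) for _ in self.arms]
        self.alpha = alpha                                   # exploration strength

    def choose(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Expected payoff plus an upper-confidence exploration bonus.
            scores.append(theta @ context + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))                        # index of the chosen policy

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# At each time step the agent acts with the selected policy and remains
# transparent about which one was used:
orch = PolicyOrchestrator(context_dim=8)
# idx = orch.choose(state_features); action = policies[idx].act(state)
```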

