Exploring the Task Cooperation in Multi-goal Visual Navigation

Learning to adapt to a series of different goals in visual navigation is challenging. In this work, we present a model-embedded actor-critic architecture for the multi-goal visual navigation task. To enhance the task cooperation in multi-goal learning, we introduce two new designs to the reinforcement learning scheme: inverse dynamics model (InvDM) and multi-goal co-learning (MgCl). Specifically, InvDM is proposed to capture the navigation-relevant association between state and goal, and provide additional training signals to relieve the sparse reward issue. MgCl aims at improving the sample efficiency and supports the agent to learn from unintentional positive experiences. Extensive results on the interactive platform AI2-THOR demonstrate that the proposed method converges faster than state-of-the-art methods while producing more direct routes to navigate to the goal. The video demonstration is available at: https://youtube.com/channel/UCtpTMOsctt3yPzXqe_JMD3w/videos.

Download Full-text

Unbiased Model-Agnostic Metalearning Algorithm for Learning Target-Driven Visual Navigation Policy

Computational Intelligence and Neuroscience ◽

10.1155/2021/5620751 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Tianfang Xue ◽

Haibin Yu

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Visual Navigation ◽

Initial Model ◽

Generalization Capability ◽

Learning Methods ◽

Inequality Measures ◽

Great Progress ◽

Classification Tasks ◽

Navigation Experience

As deep reinforcement learning methods have made great progress in the visual navigation field, metalearning-based algorithms are gaining more attention since they greatly improve the expansibility of moving agents. According to metatraining mechanism, typically an initial model is trained as a metalearner by existing navigation tasks and becomes well performed in new scenes through relatively few recursive trials. However, if a metalearner is overtrained on the former tasks, it may hardly achieve generalization on navigating in unfamiliar environments as the initial model turns out to be quite biased towards former ambient configuration. In order to train an impartial navigation model and enhance its generalization capability, we propose an Unbiased Model-Agnostic Metalearning (UMAML) algorithm towards target-driven visual navigation. Inspired by entropy-based methods, maximizing the uncertainty over output labels in classification tasks, we adopt inequality measures used in Economics as a concise metric to calculate the loss deviation across unfamiliar tasks. With succinctly minimizing the inequality of task losses, an unbiased navigation model without overperforming in particular scene types can be learnt based on Model-Agnostic Metalearning mechanism. The exploring agent complies with a more balanced update rule, able to gather navigation experience from training environments. Several experiments have been conducted, and results demonstrate that our approach outperforms other state-of-the-art metalearning navigation methods in generalization ability.

Download Full-text

Human–agent transfer from observations

The Knowledge Engineering Review ◽

10.1017/s0269888920000387 ◽

2020 ◽

Vol 36 ◽

Author(s):

Bikramjit Banerjee ◽

Sneha Racharla

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Recent Work ◽

Inverse Dynamics ◽

Decision Function ◽

Dynamics Model ◽

Human Agent ◽

Baseline Algorithm ◽

Continuous Actions ◽

Human Demonstration

Abstract Learning from human demonstration (LfD), among many speedup techniques for reinforcement learning (RL), has seen many successful applications. We consider one LfD technique called human–agent transfer (HAT), where a model of the human demonstrator’s decision function is induced via supervised learning and used as an initial bias for RL. Some recent work in LfD has investigated learning from observations only, that is, when only the demonstrator’s states (and not its actions) are available to the learner. Since the demonstrator’s actions are treated as labels for HAT, supervised learning becomes untenable in their absence. We adapt the idea of learning an inverse dynamics model from the data acquired by the learner’s interactions with the environment and deploy it to fill in the missing actions of the demonstrator. The resulting version of HAT—called state-only HAT (SoHAT)—is experimentally shown to preserve some advantages of HAT in benchmark domains with both discrete and continuous actions. This paper also establishes principled modifications of an existing baseline algorithm—called A3C—to create its HAT and SoHAT variants that are used in our experiments.

Download Full-text

Learning Reward Function with Matching Network for Mapless Navigation

Sensors ◽

10.3390/s20133664 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3664 ◽

Cited By ~ 1

Author(s):

Qichen Zhang ◽

Meiqiang Zhu ◽

Liang Zou ◽

Ming Li ◽

Yong Zhang

Keyword(s):

Reinforcement Learning ◽

Optimal Strategy ◽

State Of The Art ◽

The State ◽

Matching Network ◽

Additional Training ◽

Reward Function ◽

Moving Obstacles ◽

Reward Shaping ◽

Simulation Results

Deep reinforcement learning (DRL) has been successfully applied in mapless navigation. An important issue in DRL is to design a reward function for evaluating actions of agents. However, designing a robust and suitable reward function greatly depends on the designer’s experience and intuition. To address this concern, we consider employing reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on matching network (MN). The MN-based reward function is able to gain the experience by pre-training through trajectories on different navigation tasks and accelerate the training speed of DRL in new tasks. The proposed reward function keeps the optimal strategy of DRL unchanged. The simulation results on two static maps show that the DRL converge with less iterations via the learned reward function than the state-of-the-art mapless navigation methods. The proposed method performs well in dynamic maps with partially moving obstacles. Even when test maps are different from training maps, the proposed strategy is able to complete the navigation tasks without additional training.

Download Full-text

Multi-modality Sensor Data Classification with Selective Attention

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/432 ◽

2018 ◽

Cited By ~ 7

Author(s):

Xiang Zhang ◽

Lina Yao ◽

Chaoran Huang ◽

Sen Wang ◽

Mingkui Tan ◽

...

Keyword(s):

Reinforcement Learning ◽

Selective Attention ◽

State Of The Art ◽

Sensor Data ◽

Classification Task ◽

Wearable Sensor ◽

Domain Specific ◽

Extra Information ◽

Learning Scheme ◽

Wide Range

Multimodel wearable sensor data classificationplays an important role in ubiquitous computingand has a wide range of applications in variousscenarios from healthcare to entertainment. How-ever, most of the existing work in this field em-ploys domain-specific approaches and is thus inef-fective in complex situations where multi-modalitysensor data is collected. Moreover, the wearablesensor data is less informative than the conven-tional data such as texts or images. In this paper,to improve the adaptability of such classificationmethods across different application contexts, weturn this classification task into a game and applya deep reinforcement learning scheme to dynami-cally deal with complex situations. We also intro-duce a selective attention mechanism into the rein-forcement learning scheme to focus on the crucialdimensions of the data. This mechanism helps tocapture extra information from the signal, and canthus significantly improve the discriminative powerof the classifier. We carry out several experimentson three wearable sensor datasets, and demonstratecompetitive performance of the proposed approachcompared to several state-of-the-art baselines.

Download Full-text

Reactive Balance Control for Legged Robots under Visco-Elastic Contacts

Applied Sciences ◽

10.3390/app11010353 ◽

2020 ◽

Vol 11 (1) ◽

pp. 353

Author(s):

Thomas Flayols ◽

Andrea Del Prete ◽

Majid Khadiv ◽

Nicolas Mansard ◽

Ludovic Righetti

Keyword(s):

Inverse Dynamics ◽

State Of The Art ◽

Balance Control ◽

Contact Stiffness ◽

Poor Performance ◽

Admittance Control ◽

Inverse Dynamics Control ◽

Rigid Contact ◽

Reactive Balance ◽

Shed Light

Contacts between robots and environment are often assumed to be rigid for control purposes. This assumption can lead to poor performance when contacts are soft and/or underdamped. However, the problem of balancing on soft contacts has not received much attention in the literature. This paper presents two novel approaches to control a legged robot balancing on visco-elastic contacts, and compares them to other two state-of-the-art methods. Our simulation results show that performance heavily depends on the contact stiffness and the noises/uncertainties introduced in the simulation. Briefly, the two novel controllers performed best for soft/medium contacts, whereas “inverse-dynamics control under rigid-contact assumptions” was the best one for stiff contacts. Admittance control was instead the most robust, but suffered in terms of performance. These results shed light on this challenging problem, while pointing out interesting directions for future investigation.

Download Full-text

Visual Navigation With Multiple Goals Based on Deep Reinforcement Learning

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3057424 ◽

2021 ◽

pp. 1-11

Author(s):

Zhenhuan Rao ◽

Yuechen Wu ◽

Zifei Yang ◽

Wei Zhang ◽

Shijian Lu ◽

...

Keyword(s):

Reinforcement Learning ◽

Visual Navigation ◽

Multiple Goals

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation

Machine Learning ◽

10.1007/s10994-021-06006-6 ◽

2021 ◽

Author(s):

Srivatsan Krishnan ◽

Behzad Boroujerdian ◽

William Fu ◽

Aleksandra Faust ◽

Vijay Janapa Reddi

Keyword(s):

Reinforcement Learning ◽

Embedded System ◽

Broad Class ◽

Visual Navigation ◽

Raspberry Pi ◽

Latency Distribution ◽

Hardware In The Loop ◽

Resource Constrained ◽

Aerial Robot ◽

Policy Optimization

AbstractWe introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies’ performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to $$40\%$$ 40 % longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute’s choice affects the aerial robot’s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.

Download Full-text

Drone Deep Reinforcement Learning: A Review

Electronics ◽

10.3390/electronics10090999 ◽

2021 ◽

Vol 10 (9) ◽

pp. 999

Author(s):

Ahmad Taher Azar ◽

Anis Koubaa ◽

Nada Ali Mohamed ◽

Habiba A. Ibrahim ◽

Zahra Fathy Ibrahim ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Real Life ◽

Environment Monitoring ◽

Simulated Environments ◽

Infrastructure Inspection ◽

Remote Sensing Mapping ◽

And Control ◽

The Military ◽

Uav Navigation

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications. These applications belong to the civilian and the military fields. To name a few; infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications needs a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms were designed. These algorithms targeted the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: the deep reinforcement learning (DRL) techniques. We made a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.

Download Full-text

Smartphone-Based Indoor Visual Navigation with Leader-Follower Mode

ACM Transactions on Sensor Networks ◽

10.1145/3448417 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-22

Author(s):

Jingao Xu ◽

Erqun Dong ◽

Qiang Ma ◽

Chenshu Wu ◽

Zheng Yang

Keyword(s):

Real Time ◽

Environmental Changes ◽

State Of The Art ◽

Visual Navigation ◽

Indoor Navigation ◽

Location Services ◽

Localization And Mapping ◽

Leaders And Followers ◽

Indoor Navigation System ◽

Free Pair

Existing indoor navigation solutions usually require pre-deployed comprehensive location services with precise indoor maps and, more importantly, all rely on dedicatedly installed or existing infrastructure. In this article, we present Pair-Navi, an infrastructure-free indoor navigation system that circumvents all these requirements by reusing a previous traveler’s (i.e., leader) trace experience to navigate future users (i.e., followers) in a Peer-to-Peer mode. Our system leverages the advances of visual simultaneous localization and mapping ( SLAM ) on commercial smartphones. Visual SLAM systems, however, are vulnerable to environmental dynamics in the precision and robustness and involve intensive computation that prohibits real-time applications. To combat environmental changes, we propose to cull non-rigid contexts and keep only the static and rigid contents in use. To enable real-time navigation on mobiles, we decouple and reorganize the highly coupled SLAM modules for leaders and followers. We implement Pair-Navi on commodity smartphones and validate its performance in three diverse buildings and two standard datasets (TUM and KITTI). Our results show that Pair-Navi achieves an immediate navigation success rate of 98.6%, which maintains as 83.4% even after 2 weeks since the leaders’ traces were collected, outperforming the state-of-the-art solutions by >50%. Being truly infrastructure-free, Pair-Navi sheds lights on practical indoor navigations for mobile users.

Download Full-text