Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning

Author(s):  
Qiang Fang ◽  
Xin Xu ◽  
Xitong Wang ◽  
Yujun Zeng
2021 ◽  
Vol 11 (1) ◽  
pp. 104-113
Author(s):  
Walead Kaled Seaman ◽  
Sırma Yavuz

Compared with traditional motion planners, deep reinforcement learning (DRL) has been applied more and more widely to the sequential behavior control of mobile robots in indoor environments. Two issues of DRL remain less addressed: (1) the inability to generalize to new goals, and (2) data inefficiency, i.e., the model requires many (often costly) episodes of trial and error. In this paper, we address these two issues and apply the proposed model to target-driven visual navigation, which must generalize to new goals. To tackle the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To tackle the second issue, we simulate indoor 3D scenes with the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework allows agents to take actions and interact with objects, so we can efficiently collect a large number of training samples for sequential decision making under the RL framework. In particular, we use the behavioral cloning approach, which enables the agent to mimic an expert (or mentor) policy without a reward function while remaining stable and generalizing across targets.
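A minimal sketch of such a goal-conditioned actor-critic, assuming PyTorch; the layer sizes, feature inputs, and the `bc_loss` helper are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class GoalConditionedActorCritic(nn.Module):
    """Actor-critic whose policy depends on both the current
    observation and the navigation goal (sketch)."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_actions=4):
        super().__init__()
        # Fuse state and goal embeddings so one network serves many goals
        self.fuse = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden_dim, num_actions)  # actor
        self.value_head = nn.Linear(hidden_dim, 1)              # critic

    def forward(self, state_feat, goal_feat):
        h = self.fuse(torch.cat([state_feat, goal_feat], dim=-1))
        return self.policy_head(h), self.value_head(h)

def bc_loss(model, state_feat, goal_feat, expert_actions):
    """Behavioral cloning: supervised loss against expert actions,
    requiring no reward function."""
    logits, _ = model(state_feat, goal_feat)
    return nn.functional.cross_entropy(logits, expert_actions)
```

Because the goal enters the policy as an input rather than being baked into the weights, reaching a new target only requires a new goal embedding, not retraining.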


2021 ◽  
Vol 6 (1) ◽  
pp. 175-182 ◽  
Author(s):  
Qiaoyun Wu ◽  
Xiaoxi Gong ◽  
Kai Xu ◽  
Dinesh Manocha ◽  
Jingxuan Dong ◽  
...  

Author(s):  
Zhenhuan Rao ◽  
Yuechen Wu ◽  
Zifei Yang ◽  
Wei Zhang ◽  
Shijian Lu ◽  
...  

2021 ◽  
Author(s):  
Srivatsan Krishnan ◽  
Behzad Boroujerdian ◽  
William Fu ◽  
Aleksandra Faust ◽  
Vijay Janapa Reddi

Abstract We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies' performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to 40% longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute's choice affects the aerial robot's performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. The source code is available at: https://github.com/harvard-edge/AirLearning.
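To make the hardware-in-the-loop idea concrete, here is a minimal sketch of injecting measured latencies into the training loop; this is illustrative Python, not Air Learning's actual API, and `latency_samples_s` stands in for inference latencies profiled on the target platform:

```python
import random
import time

class LatencyAwareEnv:
    """Wraps a gym-style env and delays each action by a latency drawn
    from a hardware-in-the-loop measurement distribution (sketch)."""

    def __init__(self, env, latency_samples_s):
        self.env = env
        # Measured policy-inference latencies (seconds) on the target robot
        self.latency_samples_s = latency_samples_s

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Artificial delay mimics onboard-compute inference time, so the
        # policy is trained against target-platform dynamics. In a
        # non-real-time simulator, one would instead advance the
        # simulation clock by the sampled delay.
        time.sleep(random.choice(self.latency_samples_s))
        return self.env.step(action)
```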


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Abstract Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch—when the learner's goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
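A rough sketch of the prioritized goal-sampling idea, assuming disagreement is measured as how little probability the learner assigns to the expert's action; all names here are hypothetical, not the authors' implementation:

```python
import numpy as np

def sample_goal(goals, expert_actions, policy_probs, temperature=1.0):
    """Prioritize goals where the learner disagrees most with the expert.

    expert_actions[i]: expert's action index for goal i (from demonstrations)
    policy_probs[i]:   learner's action distribution for goal i
    """
    # Disagreement = 1 - probability the policy assigns to the expert action
    disagreement = np.array(
        [1.0 - policy_probs[i][expert_actions[i]] for i in range(len(goals))]
    )
    # Softmax weighting: high-disagreement goals are sampled more often
    weights = np.exp(disagreement / temperature)
    weights /= weights.sum()
    return goals[np.random.choice(len(goals), p=weights)]
```

Concentrating demonstration queries on such goals is what lets a very small demonstration set cover the hard-to-learn regions of the state space.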


Author(s):  
Søren Ager Meldgaard ◽  
Jonas Köhler ◽  
Henrik Lund Mortensen ◽  
Mads-Peter Verner Christiansen ◽  
Frank Noé ◽  
...  

Abstract Chemical space is routinely explored by machine learning methods to discover interesting molecules before time-consuming experimental synthesis is attempted. However, these methods often rely on a graph representation, ignoring the 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in Cartesian coordinates, allowing for quantum chemical prediction of their stability. To improve sample efficiency, we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable to all stoichiometries. We then deploy multiple copies of the model, conditioned on a specific stoichiometry, in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.
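The imitation-then-RL recipe could be sketched as follows; the model interface, the environment, and the REINFORCE-style update are assumptions made for illustration, not the authors' exact method:

```python
import torch
import torch.nn.functional as F

def pretrain_imitation(model, demo_loader, optimizer):
    """Imitation phase: predict the next atom (type and position) along
    trajectories that rebuild known molecules from a database."""
    for partial_mol, next_type, next_pos in demo_loader:
        type_logits, pos_pred = model(partial_mol)
        loss = (F.cross_entropy(type_logits, next_type)
                + F.mse_loss(pos_pred, next_pos))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def finetune_rl(model, env, optimizer, episodes=1000):
    """RL phase: the terminal reward reflects the quantum-chemical
    stability (e.g., negative energy) of the completed molecule."""
    for _ in range(episodes):
        state = env.reset()
        log_probs = []
        done = False
        reward = 0.0
        while not done:
            action, log_p = model.sample_action(state)
            state, reward, done = env.step(action)
            log_probs.append(log_p)
        # REINFORCE-style update using the terminal reward
        loss = -reward * torch.stack(log_probs).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```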


2020 ◽  
Vol 34 (02) ◽  
pp. 2128-2135
Author(s):  
Yang Liu ◽  
Qi Liu ◽  
Hongke Zhao ◽  
Zhen Pan ◽  
Chuanren Liu

In recent years, considerable efforts have been devoted to developing AI techniques for finance research and applications. For instance, AI techniques (e.g., machine learning) can help traders in quantitative trading (QT) by automating two tasks: market condition recognition and trading strategy execution. However, existing methods in QT face challenges such as representing noisy high-frequency financial data and finding the balance between exploration and exploitation of the trading agent with AI techniques. To address these challenges, we propose an adaptive trading model, namely iRDPG, to automatically develop QT strategies by an intelligent trading agent. Our model is enhanced by deep reinforcement learning (DRL) and imitation learning techniques. Specifically, considering the noisy financial data, we formulate the QT process as a Partially Observable Markov Decision Process (POMDP). Also, we introduce imitation learning to leverage classical trading strategies, which helps balance exploration and exploitation. For better simulation, we train our trading agent in the real financial market using minute-frequency data. Experimental results demonstrate that our model can extract robust market features and adapt to different markets.
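As an illustration of the POMDP formulation and the imitation term, here is a hedged PyTorch sketch; the recurrent belief encoder and the blended loss are assumptions, and iRDPG's actual recurrent deterministic policy gradient differs in detail (this stand-in uses discrete actions):

```python
import torch
import torch.nn as nn

class RecurrentTradingPolicy(nn.Module):
    """Recurrent policy for a POMDP: the GRU state summarizes the noisy
    observation history into a belief over market conditions (sketch)."""

    def __init__(self, obs_dim=16, hidden_dim=64, num_actions=3):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)  # long / flat / short

    def forward(self, obs_seq, h0=None):
        out, h = self.gru(obs_seq, h0)
        return self.head(out), h

def combined_loss(policy_logits, rl_loss, expert_actions, bc_weight=0.5):
    """Blend the RL objective with imitation of a classical strategy
    (e.g., a rule-based signal) to guide exploration early in training."""
    bc = nn.functional.cross_entropy(
        policy_logits.flatten(0, 1), expert_actions.flatten())
    return rl_loss + bc_weight * bc
```

Annealing `bc_weight` toward zero over training would let the agent first imitate the classical strategy, then explore beyond it.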

