Image Visual Sensor Used in Health-Care Navigation in Indoor Scenes Using Deep Reinforcement Learning (DRL) and Control Sensor Robot for Patients Data Health Information

Compared with traditional motion planners, deep reinforcement learning (DRL) has been applied more and more widely to the sequential behavior control of mobile robots in indoor environments. Two issues of deep reinforcement learning are addressed here: the inability to generalize to new sets of goals, and data inefficiency, that is, the model requires many (often costly) trial-and-error episodes. Deep learning can impact a few key areas of medicine, and we explore how to build end-to-end systems; our discussion of computer vision focuses largely on medical imaging. In this paper, we address these two issues and apply the proposed model to target-driven visual navigation, with the aim of generalizing to new goals. To tackle the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To tackle the second issue, we simulate indoor environments with the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. The framework allows agents to take actions and interact with objects. Hence, we are able to collect a large number of training samples efficiently, with sequential decision making based on the RL framework. Healthcare and medicine stand to benefit immensely from deep learning because of the sheer volume of data being generated. In particular, we used the behavioral cloning approach, which enables the agent to imitate an expert (or mentor) policy without using a reward function and which generalizes across targets.
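The abstract above describes an actor-critic model whose policy depends on both the goal and the current state, trained in part by behavioral cloning. The following is a minimal sketch of that idea, not the authors' network: the layer sizes, the concatenation of state and goal embeddings, and the names `init_params`, `actor_critic` and `bc_loss` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(state_dim, goal_dim, hidden_dim, n_actions):
    """Random weights for a one-hidden-layer network (hypothetical sizes)."""
    in_dim = state_dim + goal_dim
    return {
        "W_h": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        "W_pi": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b_pi": np.zeros(n_actions),
        "W_v": rng.normal(0.0, 0.1, (hidden_dim, 1)),
        "b_v": np.zeros(1),
    }

def actor_critic(params, state, goal):
    """Return action probabilities (actor) and a state value (critic).

    Both heads see the goal as well as the state, so one set of weights
    can in principle serve many navigation targets.
    """
    x = np.concatenate([state, goal])
    h = np.maximum(0.0, x @ params["W_h"] + params["b_h"])  # ReLU layer
    logits = h @ params["W_pi"] + params["b_pi"]
    logits = logits - logits.max()                          # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    value = float((h @ params["W_v"] + params["b_v"])[0])
    return probs, value

def bc_loss(probs, expert_action):
    """Behavioral cloning loss: negative log-likelihood of the expert action."""
    return float(-np.log(probs[expert_action] + 1e-12))

params = init_params(state_dim=8, goal_dim=8, hidden_dim=16, n_actions=4)
state = rng.normal(size=8)   # stand-in for an embedded camera observation
goal = rng.normal(size=8)    # stand-in for an embedded target image
probs, value = actor_critic(params, state, goal)
loss = bc_loss(probs, expert_action=2)
```

Note that the cloning loss needs only the expert's chosen action, not a reward signal, which is the property the abstract attributes to behavioral cloning.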

Target‐driven visual navigation in indoor scenes using reinforcement learning and imitation learning

CAAI Transactions on Intelligence Technology ◽

10.1049/cit2.12043 ◽

2021 ◽

Author(s):

Qiang Fang ◽

Xin Xu ◽

Xitong Wang ◽

Yujun Zeng

Keyword(s):

Reinforcement Learning ◽

Visual Navigation ◽

Imitation Learning ◽

Indoor Scenes

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017749 ◽

2019 ◽

Vol 33 ◽

pp. 7749-7758

Author(s):

Daniel S. Brown ◽

Scott Niekum

Keyword(s):

Reinforcement Learning ◽

Set Cover ◽

Sequential Decision ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Set Cover Problem ◽

Efficient Approximation Algorithm ◽

Minimum Number ◽

Teaching Problem ◽

Novel Applications

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the reward equivalence class of the demonstrator. We extend previous work on algorithmic teaching for sequential decision-making tasks by showing a reduction to the set cover problem which enables an efficient approximation algorithm for determining the set of maximally informative demonstrations. We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach.
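The reduction to set cover mentioned above can be illustrated with the standard greedy approximation for that problem. This is a generic sketch rather than the paper's algorithm: it assumes each candidate demonstration has already been mapped to the set of reward constraints it pins down, and the demonstration names and constraint ids are hypothetical.

```python
from typing import Dict, Hashable, List, Set

def greedy_set_cover(universe: Set[Hashable],
                     candidates: Dict[str, Set[Hashable]]) -> List[str]:
    """Pick candidates greedily until every element of `universe` is covered.

    At each step the candidate covering the most still-uncovered elements
    is chosen, which is the classic logarithmic-factor approximation.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical demonstrations, each labelled with the constraint ids it implies.
demos = {
    "demo_a": {1, 2, 3},
    "demo_b": {3, 4},
    "demo_c": {4, 5},
    "demo_d": {1, 5},
}
selected = greedy_set_cover({1, 2, 3, 4, 5}, demos)
print(selected)  # ['demo_a', 'demo_c']
```

Here two demonstrations suffice to cover all five constraints, so the other two would add no information for the learner.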

Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/326 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yang Gao ◽

Christian M. Meyer ◽

Mohsen Mesgar ◽

Iryna Gurevych

Keyword(s):

Reinforcement Learning ◽

Learning To Rank ◽

Poor Performance ◽

Parameter Tuning ◽

Test Time ◽

Sequential Decision ◽

Time Data ◽

Training Time ◽

Search Spaces ◽

Reward Function

Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.

On the use of the policy gradient and Hessian in inverse reinforcement learning

Intelligenza Artificiale ◽

10.3233/ia-180011 ◽

2020 ◽

Vol 14 (1) ◽

pp. 117-150

Author(s):

Alberto Maria Metelli ◽

Matteo Pirotta ◽

Marcello Restelli

Keyword(s):

Reinforcement Learning ◽

Sequential Decision ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Model Free ◽

Learning Speed ◽

Policy Gradient ◽

Continuous Domains ◽

Learning Policies ◽

Finite Domains

Reinforcement Learning (RL) is an effective approach to solve sequential decision making problems when the environment is equipped with a reward function to evaluate the agent’s actions. However, there are several domains in which a reward function is not available and difficult to estimate. When samples of expert agents are available, Inverse Reinforcement Learning (IRL) allows recovering a reward function that explains the demonstrated behavior. Most of the classic IRL methods, in addition to expert’s demonstrations, require sampling the environment to evaluate each reward function, that, in turn, is built starting from a set of engineered features. This paper is about a novel model-free IRL approach that does not require to specify a function space where to search for the expert’s reward function. Leveraging on the fact that the policy gradient needs to be zero for an optimal policy, the algorithm generates an approximation space for the reward function, in which a reward is singled out employing a second-order criterion. After introducing our approach for finite domains, we extend it to continuous ones. The empirical results, on both finite and continuous domains, show that the reward function recovered by our algorithm allows learning policies that outperform those obtained with the true reward function, in terms of learning speed.

Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review

Robotics ◽

10.3390/robotics10010022 ◽

2021 ◽

Vol 10 (1) ◽

pp. 22

Author(s):

Rongrong Liu ◽

Florent Nageotte ◽

Philippe Zanne ◽

Michel de Mathelin ◽

Birgitta Dresp-Langley

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Robotic Manipulation ◽

Significant Progress ◽

Expert Performance ◽

Optimal Behavior ◽

Real World Applications ◽

Critical Issues ◽

And Control ◽

Manipulation And Control

Deep learning has provided new ways of manipulating, processing and analyzing data. It sometimes may achieve results comparable to, or surpassing human expert performance, and has become a source of inspiration in the era of artificial intelligence. Another subfield of machine learning named reinforcement learning, tries to find an optimal behavior strategy through interactions with the environment. Combining deep learning and reinforcement learning permits resolving critical issues relative to the dimensionality and scalability of data in tasks with sparse reward signals, such as robotic manipulation and control tasks, that neither method permits resolving when applied on its own. In this paper, we present recent significant progress of deep reinforcement learning algorithms, which try to tackle the problems for the application in the domain of robotic manipulation control, such as sample efficiency and generalization. Despite these continuous improvements, currently, the challenges of learning robust and versatile manipulation skills for robots with deep reinforcement learning are still far from being resolved for real-world applications.

Deep-learning jets with uncertainties and more

SciPost Physics ◽

10.21468/scipostphys.8.1.006 ◽

2020 ◽

Vol 8 (1) ◽

Cited By ~ 8

Author(s):

Sven Bollweg ◽

Manuel Haussmann ◽

Gregor Kasieczka ◽

Michel Luchmann ◽

Tilman Plehn ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Bayesian Networks ◽

Network Performance ◽

Energy Scale ◽

Error Band ◽

Training Samples ◽

Pile Up ◽

Stability Issues ◽

And Control

Bayesian neural networks allow us to keep track of uncertainties, for example in top tagging, by learning a tagger output together with an error band. We illustrate the main features of Bayesian versions of established deep-learning taggers. We show how they capture statistical uncertainties from finite training samples, systematics related to the jet energy scale, and stability issues through pile-up. Altogether, Bayesian networks offer many new handles to understand and control deep learning at the LHC without introducing a visible prior effect and without compromising the network performance.

Skill-based curiosity for intrinsically motivated reinforcement learning

Machine Learning ◽

10.1007/s10994-019-05845-8 ◽

2019 ◽

Vol 109 (3) ◽

pp. 493-512 ◽

Cited By ~ 2

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Skill Learning ◽

High Dimensional ◽

Sequential Decision ◽

Learning Methods ◽

Reward Function ◽

Intrinsic Reward ◽

Reinforcement Learning Models ◽

Data Efficiency

Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function, called curiosity, to enable the agent to explore its environment in the quest of new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need to directly predict the future, and can perform in sequential decision scenarios. We formulate the curiosity as the ability of the agent to predict its own knowledge about the task. We base the prediction on the idea of skill learning to incentivize the discovery of new skills, and guide exploration towards promising solutions. To further improve data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We present a variety of sparse reward tasks in MiniGrid, MuJoCo, and Atari games. We compare the performance of an augmented agent that uses our curiosity reward to state-of-the-art learners. Experimental evaluation exhibits higher performance compared to reinforcement learning models that only learn by maximizing extrinsic rewards.

Ensemble Consensus-based Representation Deep Reinforcement Learning for Hybrid FSO/RF Communication Systems

10.36227/techrxiv.15109434 ◽

2021 ◽

Author(s):

Shagufta Henna

Keyword(s):

Reinforcement Learning ◽

Communication Systems ◽

Representation Learning ◽

System Capacity ◽

Rf System ◽

State Action ◽

Rf Systems ◽

Reward Function ◽

And Control ◽

Learned Features

A hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve the system capacity by realizing the complementary benefits of both the links. The dynamics of network conditions, such as fog, dust, and sand storms, compound the link switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching of hybrid FSO/RF systems. Specifically, in this work, we focus on actor-critic called Actor/Critic-FSO/RF and Deep-Q network (DQN) called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulences. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift this problem to ensemble consensus-based representation learning for deep reinforcement called DQNEnsemble-FSO/RF. The proposed novel DQNEnsemble-FSO/RF DRL approach uses consensus learned feature representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF’s consensus learned features switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.

Ensemble Consensus-based Representation Deep Reinforcement Learning for Hybrid FSO/RF Communication Systems

10.36227/techrxiv.15109434.v1 ◽

2021 ◽

Author(s):

Shagufta Henna

Keyword(s):

Reinforcement Learning ◽

Communication Systems ◽

Representation Learning ◽

System Capacity ◽

Rf System ◽

State Action ◽

Rf Systems ◽

Reward Function ◽

And Control ◽

Learned Features

Fast reinforcement learning with generalized policy updates

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1907370117 ◽

2020 ◽

Vol 117 (48) ◽

pp. 30079-30087 ◽

Cited By ~ 2

Author(s):

André Barreto ◽

Shaobo Hou ◽

Diana Borsa ◽

David Silver ◽

Doina Precup

Keyword(s):

Reinforcement Learning ◽

Divide And Conquer ◽

Problem Decomposition ◽

Sequential Decision ◽

Learning Problem ◽

Reward Function ◽

Speed Up ◽

Complex Decision ◽

Reward Functions ◽

Do So

The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized version of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
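The two steps described above, fitting a new reward as a linear combination of earlier reward functions and then reusing the earlier solutions, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the sampled reward values, the array `q_prev` of previously computed action values, and the single-state setting are all made up for the example.

```python
import numpy as np

# Rewards of two previously solved tasks on four sampled transitions
# (rows: samples, columns: tasks). Values are invented for illustration.
prev_rewards = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
])
# Reward of the new task on the same transitions (here exactly 2*r1 - r2).
new_reward = np.array([2.0, -1.0, 1.0, 4.0])

# Step 1: ordinary least squares gives the combination weights w.
w, *_ = np.linalg.lstsq(prev_rewards, new_reward, rcond=None)

# Step 2: for a fixed policy, action values are linear in the reward, so the
# value of each old policy on the new task is the same weighted sum of its
# values on the old tasks. q_prev[i, j, a] is the (assumed known) value of
# policy i under reward j for action a in the current state.
q_prev = np.array([
    [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],   # policy 0
    [[0.2, 0.1, 0.9], [0.3, 0.8, 0.1]],   # policy 1
])
q_new = np.einsum("j,ija->ia", w, q_prev)

# Generalized policy improvement: act greedily with respect to the best
# old policy for each action.
action = int(np.argmax(q_new.max(axis=0)))
print(np.round(w, 3), action)
```

In this toy case the regression recovers the weights 2 and -1 exactly, and no new value function has to be learned before an action can be chosen.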
