Inverse Reinforcement Learning Intra-operative Path Planning for Steerable Needle

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.

Download Full-text

UAV Coverage Path Planning under Varying Power Constraints using Deep Reinforcement Learning

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9340934 ◽

2020 ◽

Author(s):

Mirco Theile ◽

Harald Bayerlein ◽

Richard Nai ◽

David Gesbert ◽

Marco Caccamo

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Coverage Path Planning ◽

Power Constraints

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

AbstractIn real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

Download Full-text

UAV online path planning technology based on deep reinforcement learning

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9327752 ◽

2020 ◽

Author(s):

Jiaxuan Fan ◽

Zhenya Wang ◽

Jinlei Ren ◽

Ying Lu ◽

Yiheng Liu

Keyword(s):

Reinforcement Learning ◽

Path Planning

Download Full-text

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9340876 ◽

2020 ◽

Author(s):

Zuxin Liu ◽

Baiming Chen ◽

Hongyi Zhou ◽

Guru Koushik ◽

Martial Hebert ◽

...

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Dynamic Environments ◽

Multi Agent

Download Full-text

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Complex & Intelligent Systems ◽

10.1007/s40747-021-00366-1 ◽

2021 ◽

Author(s):

Jie Zhong ◽

Tao Wang ◽

Lianglun Cheng

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Free Path ◽

Inverse Kinematics ◽

Multiple Dimensions ◽

Continuous State ◽

Planning Algorithm ◽

Convergence Performance ◽

Path Planner ◽

Action Spaces

AbstractIn actual welding scenarios, an effective path planner is needed to find a collision-free path in the configuration space for the welding manipulator with obstacles around. However, as a state-of-the-art method, the sampling-based planner only satisfies the probability completeness and its computational complexity is sensitive with state dimension. In this paper, we propose a path planner for welding manipulators based on deep reinforcement learning for solving path planning problems in high-dimensional continuous state and action spaces. Compared with the sampling-based method, it is more robust and is less sensitive with state dimension. In detail, to improve the learning efficiency, we introduce the inverse kinematics module to provide prior knowledge while a gain module is also designed to avoid the local optimal policy, we integrate them into the training algorithm. To evaluate our proposed planning algorithm in multiple dimensions, we conducted multiple sets of path planning experiments for welding manipulators. The results show that our method not only improves the convergence performance but also is superior in terms of optimality and robustness of planning compared with most other planning algorithms.

Download Full-text

Inverse Reinforcement Learning Intra-operative Path Planning for Steerable Needle

Socially Adaptive Path Planning in Human Environments Using Inverse Reinforcement Learning

Rover-IRL: Inverse Reinforcement Learning With Soft Value Iteration Networks for Planetary Rover Path Planning

Ensemble Inverse Reinforcement Learning from Semi-Expert Agents

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration

UAV Coverage Path Planning under Varying Power Constraints using Deep Reinforcement Learning

Inverse reinforcement learning in contextual MDPs

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

UAV online path planning technology based on deep reinforcement learning

MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments

Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics

Export Citation Format