Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

Abstract We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Object Affordance Driven Inverse Reinforcement Learning Through Conceptual Abstraction and Advice

Paladyn Journal of Behavioral Robotics ◽

10.1515/pjbr-2018-0021 ◽

2018 ◽

Vol 9 (1) ◽

pp. 277-294 ◽

Cited By ~ 1

Author(s):

Rupam Bhattacharyya ◽

Shyamanta M. Hazarika

Keyword(s):

Reinforcement Learning ◽

High Dimensional ◽

Inverse Reinforcement Learning ◽

Intent Recognition ◽

Reward Function ◽

Object Affordances ◽

Learning Agent ◽

Markov Decision ◽

Observed Behaviour ◽

Object Affordance

Abstract Within human Intent Recognition (IR), a popular approach to learning from demonstration is Inverse Reinforcement Learning (IRL). IRL extracts an unknown reward function from samples of observed behaviour. Traditional IRL systems require large datasets to recover the underlying reward function. Object affordances have been used for IR. Existing literature on recognizing intents through object affordances falls short of utilizing their true potential. In this paper, we seek to develop an IRL system which drives human intent recognition along with the capability to handle high-dimensional demonstrations, exploiting the capability of object affordances. An architecture for recognizing human intent is presented which consists of an extended Maximum Likelihood Inverse Reinforcement Learning agent. Inclusion of a Symbolic Conceptual Abstraction Engine (SCAE) along with an advisor allows the agent to work on a Conceptually Abstracted Markov Decision Process. The agent recovers an object-affordance-based reward function from high-dimensional demonstrations. This function drives a Human Intent Recognizer through identification of probable intents. Performance of the resulting system on the standard CAD-120 dataset shows encouraging results.

Bayesian Nonparametric Inverse Reinforcement Learning for Switched Markov Decision Processes

2014 13th International Conference on Machine Learning and Applications ◽

10.1109/icmla.2014.105 ◽

2014 ◽

Cited By ~ 5

Author(s):

Amit Surana ◽

Kunal Srivastava

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Bayesian Nonparametric ◽

Inverse Reinforcement Learning ◽

Markov Decision

Efficient PAC Reinforcement Learning in Regular Decision Processes

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/279 ◽

2021 ◽

Author(s):

Alessandro Ronca ◽

Giuseppe De Giacomo

Keyword(s):

Reinforcement Learning ◽

Markov Decision Process ◽

Polynomial Time ◽

Optimal Policy ◽

Decision Process ◽

Transition Function ◽

Decision Processes ◽

Reward Function ◽

Markov Decision ◽

Reward Functions

Recently regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and it reasonably captures the difficulty of a regular decision process.
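
The idea of a reward that depends on the whole history "regularly" can be illustrated with a small finite transducer. The sketch below is a hypothetical toy, not taken from the paper: the state names, the input symbols (`key`, `door`, `other`) and the reward values are all assumptions made for illustration, and it covers only the reward side, not the transition function.

```python
# Toy reward transducer (Mealy machine): (state, input) -> (next_state, reward).
# All states, symbols and reward values are hypothetical illustration values.
TRANSITIONS = {
    ("no_key", "key"): ("has_key", 0.0),
    ("no_key", "door"): ("no_key", 0.0),
    ("no_key", "other"): ("no_key", 0.0),
    ("has_key", "key"): ("has_key", 0.0),
    ("has_key", "door"): ("has_key", 1.0),
    ("has_key", "other"): ("has_key", 0.0),
}


def rewards_for(history, start="no_key"):
    """Return the reward emitted at each step of an input history."""
    state = start
    rewards = []
    for symbol in history:
        state, reward = TRANSITIONS[(state, symbol)]
        rewards.append(reward)
    return rewards


# Both histories end with "door", but only one saw "key" earlier.
with_key = rewards_for(["door", "key", "door"])
without_key = rewards_for(["door", "other", "door"])
```

In this toy, the final `door` input is rewarded only when `key` occurred earlier, so the reward is not a function of the current input alone, yet a two-state machine is enough to track the relevant part of the history.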

An inverse reinforcement learning algorithm for semi-Markov decision processes

2017 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2017.8280816 ◽

2017 ◽

Cited By ~ 1

Author(s):

Chuanfang Tan ◽

Yanjie Li ◽

Yuhu Cheng

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Learning Algorithm ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Markov Decision ◽

Reinforcement Learning Algorithm

An Overview of Inverse Reinforcement Learning Techniques

Intelligent Environments 2021 - Ambient Intelligence and Smart Environments ◽

10.3233/aise210097 ◽

2021 ◽

Author(s):

Syed Ihtesham Hussain Shah ◽

Giuseppe De Pietro

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Decision Process ◽

Autonomous Agents ◽

Theoretical Background ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Learning Techniques ◽

Markov Decision ◽

Potential Use

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter these problems, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to give an overview and theoretical background of IRL in the field of Machine Learning (ML) and Artificial Intelligence (AI). We present a brief comparison between different variants of IRL.

Reinforcement Learning with a Corrupted Reward Channel

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/656 ◽

2017 ◽

Cited By ~ 9

Author(s):

Tom Everitt ◽

Victoria Krakovna ◽

Laurent Orseau ◽

Shane Legg

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Decision Problem ◽

Inverse Reinforcement Learning ◽

Markov Decision Problem ◽

Software Bugs ◽

Reward Function ◽

Learning Agent ◽

Markov Decision ◽

Simplifying Assumptions

No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
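
The example in this abstract, and the idea of blunting optimisation with randomisation, can be sketched numerically. The following is a minimal toy under stated assumptions: the four states, their reward values, and the 0.75 fraction are hypothetical numbers chosen for illustration, and the helper functions are not the paper's algorithm.

```python
import random

# Hypothetical toy values: s3 has a small true reward, but a sensory
# error makes its observed reward look maximal.
true_reward = {"s0": 0.6, "s1": 0.7, "s2": 0.5, "s3": 0.1}
observed_reward = dict(true_reward)
observed_reward["s3"] = 1.0


def greedy_state(observed):
    """Pick the state with the highest observed reward."""
    return max(observed, key=observed.get)


def top_fraction_states(observed, fraction):
    """Return the top `fraction` of states ranked by observed reward."""
    ranked = sorted(observed, key=observed.get, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return ranked[:k]


def expected_true_reward(states, true):
    """Average true reward when choosing uniformly among `states`."""
    return sum(true[s] for s in states) / len(states)


greedy = greedy_state(observed_reward)
candidates = top_fraction_states(observed_reward, 0.75)
randomised_value = expected_true_reward(candidates, true_reward)

# One uniform draw from the candidate set (seeded for repeatability).
chosen = random.Random(0).choice(candidates)
```

With these toy numbers, the greedy agent always lands on the corrupt state `s3`, while choosing uniformly among the top three observed states gives a higher expected true reward. Whether randomisation helps in general depends on assumptions about how many states are corrupt.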

Forward and Inverse Reinforcement Learning Based on Linearly Solvable Markov Decision Processes

The Brain & Neural Networks ◽

10.3902/jnns.23.2 ◽

2016 ◽

Vol 23 (1) ◽

pp. 2-13

Author(s):

Eiji Uchibe

Keyword(s):

Reinforcement Learning ◽

Markov Decision Processes ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Markov Decision

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

Abstract In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

Deep Inverse Reinforcement Learning for Reward Function Identification in Bidding Models

IEEE Transactions on Power Systems ◽

10.1109/tpwrs.2021.3076296 ◽

2021 ◽

pp. 1-1

Author(s):

Hongye Guo ◽

Qixin Chen ◽

Qing Xia ◽

Chongqing Kang

Keyword(s):

Reinforcement Learning ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Function Identification ◽

Bidding Models

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

2020 59th IEEE Conference on Decision and Control (CDC) ◽

10.1109/cdc42340.2020.9303982 ◽

2020 ◽

Author(s):

Yu Wang ◽

Nima Roohi ◽

Matthew West ◽

Mahesh Viswanathan ◽

Geir E. Dullerud

Keyword(s):

Reinforcement Learning ◽

Model Checking ◽

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision
