Dynamic multiobjective optimization driven by inverse reinforcement learning

Ensemble Inverse Reinforcement Learning from Semi-Expert Agents

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.137.667 ◽

2017 ◽

Vol 137 (4) ◽

pp. 667-673

Author(s):

Shinji Tomita ◽

Fumiya Hamatsu ◽

Tomoki Hamagami

Keyword(s):

Reinforcement Learning ◽

Inverse Reinforcement Learning

Download Full-text

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/891 ◽

2019 ◽

Author(s):

Ritesh Noothigattu ◽

Djallel Bouneffouf ◽

Nicholas Mattei ◽

Rachita Chandra ◽

Piyush Madan ◽

...

Keyword(s):

Reinforcement Learning ◽

Ethical Values ◽

Large Role ◽

Learning To Learn ◽

Inverse Reinforcement Learning ◽

Time Step ◽

Novel Approach

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

AbstractIn real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

Download Full-text

Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms

2020 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci47803.2020.9308391 ◽

2020 ◽

Author(s):

Aaron J. Snoswell ◽

Surya P. N. Singh ◽

Nan Ye

Keyword(s):

Reinforcement Learning ◽

Maximum Entropy ◽

Inverse Reinforcement Learning

Download Full-text

Scalable Inverse Reinforcement Learning Through Multifidelity Bayesian Optimization

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2021.3051012 ◽

2021 ◽

pp. 1-8

Author(s):

Mahdi Imani ◽

Seyede Fatemeh Ghoreishi

Keyword(s):

Reinforcement Learning ◽

Bayesian Optimization ◽

Inverse Reinforcement Learning

Download Full-text

Deep Inverse Reinforcement Learning for Reward Function Identification in Bidding Models

IEEE Transactions on Power Systems ◽

10.1109/tpwrs.2021.3076296 ◽

2021 ◽

pp. 1-1

Author(s):

Hongye Guo ◽

Qixin Chen ◽

Qing Xia ◽

Chongqing Kang

Keyword(s):

Reinforcement Learning ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Function Identification ◽

Bidding Models

Download Full-text

Deep Inverse Reinforcement Learning for Behavior Prediction in Autonomous Driving: Accurate Forecasts of Vehicle Motion

IEEE Signal Processing Magazine ◽

10.1109/msp.2020.2988287 ◽

2021 ◽

Vol 38 (1) ◽

pp. 87-96

Author(s):

Tharindu Fernando ◽

Simon Denman ◽

Sridha Sridharan ◽

Clinton Fookes

Keyword(s):

Reinforcement Learning ◽

Autonomous Driving ◽

Behavior Prediction ◽

Inverse Reinforcement Learning ◽

Vehicle Motion

Download Full-text

Travel Time-Dependent Maximum Entropy Inverse Reinforcement Learning for Seabird Trajectory Prediction

2017 4th IAPR Asian Conference on Pattern Recognition (ACPR) ◽

10.1109/acpr.2017.20 ◽

2017 ◽

Cited By ~ 1

Author(s):

Tsubasa Hirakawa ◽

Takayoshi Yamashita ◽

Ken Yoda ◽

Toru Tamaki ◽

Hironobu Fujiyoshi

Keyword(s):

Reinforcement Learning ◽

Travel Time ◽

Maximum Entropy ◽

Time Dependent ◽

Trajectory Prediction ◽

Inverse Reinforcement Learning

Download Full-text

Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals

2011 IEEE/RSJ International Conference on Intelligent Robots and Systems ◽

10.1109/iros.2011.6048804 ◽

2011 ◽

Cited By ~ 1

Author(s):

N. Aghasadeghi ◽

T. Bretl

Keyword(s):

Reinforcement Learning ◽

Maximum Entropy ◽

Path Integrals ◽

Inverse Reinforcement Learning ◽

State Spaces ◽

Continuous State

Download Full-text