Research on autonomous collision avoidance of merchant ship based on inverse reinforcement learning

To learn the collision avoidance policy that human experts follow when controlling merchant ships, a finite-state Markov decision process model for ship collision avoidance is proposed, based on an analysis of the collision avoidance mechanism. An inverse reinforcement learning (IRL) method based on cross entropy and projection is then proposed to recover the optimal policy from expert demonstrations. Collision avoidance simulations in different ship encounter situations show that the policy obtained by the proposed IRL method closely recovers the policies of two kinds of human experts, which indicates that the method can effectively learn human experts' policies for ship collision avoidance.
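The abstract does not spell out its "projection" component. As a rough illustration only, the sketch below assumes it resembles the projection step of classical apprenticeship learning, where a running estimate of feature expectations is projected toward the expert's. The function names and the two-feature example are hypothetical and are not taken from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_step(mu_expert, mu_bar, mu_new):
    """One projection update (assumed form, after classical apprenticeship learning).

    mu_expert: expert feature expectations
    mu_bar:    current projected estimate
    mu_new:    feature expectations of the most recently computed policy
    Returns the updated estimate, the reward weights w = mu_expert - mu_bar,
    and the margin ||w|| that such methods typically use as a stopping test.
    """
    direction = [n - b for n, b in zip(mu_new, mu_bar)]
    denom = dot(direction, direction)
    if denom == 0.0:
        step = 0.0  # the new policy adds no information
    else:
        gap = [e - b for e, b in zip(mu_expert, mu_bar)]
        step = dot(direction, gap) / denom
    # Orthogonal projection of mu_expert onto the line through mu_bar and mu_new.
    mu_bar_next = [b + step * d for b, d in zip(mu_bar, direction)]
    w = [e - b for e, b in zip(mu_expert, mu_bar_next)]
    margin = dot(w, w) ** 0.5
    return mu_bar_next, w, margin


if __name__ == "__main__":
    # Hypothetical two-feature example.
    print(projection_step([1.0, 1.0], [0.0, 0.0], [2.0, 0.0]))
```

In a full loop, the weights `w` would define a reward, a new policy would be computed for that reward, and the step would repeat until the margin falls below a tolerance.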


An Overview of Inverse Reinforcement Learning Techniques

Intelligent Environments 2021 - Ambient Intelligence and Smart Environments, 2021. DOI: 10.3233/aise210097

Author(s): Syed Ihtesham Hussain Shah, Giuseppe De Pietro

Keyword(s): Decision Making, Reinforcement Learning, Decision Process, Autonomous Agents, Theoretical Background, Inverse Reinforcement Learning, Reward Function, Learning Techniques, Markov Decision, Potential Use

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To address this difficulty, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents that are capable of modeling others without compromising task performance. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We present a brief comparison of different variants of IRL in this article.
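To make "learning a reward from expert demonstrations" concrete, here is a minimal sketch of a quantity that many IRL formulations start from: the discounted feature expectations of the demonstrations, paired with a reward that is linear in state features. The feature map `phi`, the discount value, and the toy trajectories are assumptions made for illustration; the overview itself does not prescribe them.

```python
def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature counts over a set of expert trajectories.

    trajectories: list of state sequences demonstrated by the expert
    phi:          hypothetical feature map, state -> list of floats
    gamma:        discount factor of the MDP
    """
    n_features = len(phi(trajectories[0][0]))
    mu = [0.0] * n_features
    for trajectory in trajectories:
        for t, state in enumerate(trajectory):
            features = phi(state)
            for i in range(n_features):
                mu[i] += (gamma ** t) * features[i]
    return [m / len(trajectories) for m in mu]


def linear_reward(w, phi, state):
    """Reward assumed to be linear in the features: R(s) = w . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w, phi(state)))


def one_hot(state):
    """Toy feature map for a two-state MDP (states 0 and 1)."""
    return [1.0 if state == i else 0.0 for i in range(2)]


if __name__ == "__main__":
    demos = [[0, 1], [0, 0]]
    print(feature_expectations(demos, one_hot, gamma=0.5))
    print(linear_reward([0.2, -1.0], one_hot, 1))
```

Under the linear-reward assumption, the expected return of any policy is the inner product of `w` with that policy's feature expectations, which is why matching the expert's feature expectations is a common IRL objective.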


Robust Scheduling based on Daily Activity Learning by using Markov Decision Process and Inverse Reinforcement Learning

KIISE Transactions on Computing Practices, 2017, Vol 23 (10), pp. 599-604. DOI: 10.5626/ktcp.2017.23.10.599

Author(s): Sang-Woo Lee, Dong-Hyun Kwak, Kyoung-Woon On, Yujung Heo, Wooyoung Kang, ...

Keyword(s): Reinforcement Learning, Markov Decision Process, Decision Process, Daily Activity, Inverse Reinforcement Learning, Robust Scheduling, Markov Decision, Learning By Using


A Generic Markov Decision Process Model and Reinforcement Learning Method for Scheduling Agile Earth Observation Satellites

IEEE Transactions on Systems Man and Cybernetics Systems, 2020, pp. 1-12. DOI: 10.1109/tsmc.2020.3020732

Author(s): Yongming He, Lining Xing, Yingwu Chen, Witold Pedrycz, Ling Wang, ...

Keyword(s): Reinforcement Learning, Markov Decision Process, Decision Process, Process Model, Earth Observation, Learning Method, Markov Decision, Earth Observation Satellites


Inverse reinforcement learning in contextual MDPs

Machine Learning, 2021. DOI: 10.1007/s10994-021-05984-x

Author(s): Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

Keyword(s): Reinforcement Learning, Optimization Problem, Decision Processes, Inverse Reinforcement Learning, Convex Optimization Problem, Reward Function, Dynamic Treatment Regime, Markov Decision, Dynamic Treatment, Recorded Data

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
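The abstract describes a convex but non-differentiable objective over a reward mapping and a subgradient-based solver, without giving formulas. The sketch below is one possible reading under loudly stated assumptions: the mapping from context to reward weights is taken to be linear (weights = c^T W), and the inner maximization over policies is replaced by a maximum over a small, fixed list of candidate feature expectations. Both simplifications, and all names, are hypothetical and do not reproduce the paper's algorithm.

```python
def context_weights(context, W):
    """Assumed linear reward mapping: reward weights = c^T W (W is a list of rows)."""
    n_features = len(W[0])
    return [sum(context[i] * W[i][j] for i in range(len(context)))
            for j in range(n_features)]


def loss_and_subgradient(context, W, mu_expert, candidate_mus):
    """Loss max_k w . (mu_k - mu_expert) and one subgradient with respect to W.

    A maximum of functions that are linear in W is convex but not differentiable
    everywhere; the gradient of any maximizing term is a valid subgradient.
    """
    w = context_weights(context, W)
    gaps = [[m - e for m, e in zip(mu, mu_expert)] for mu in candidate_mus]
    values = [sum(wi * gi for wi, gi in zip(w, gap)) for gap in gaps]
    best = max(range(len(values)), key=values.__getitem__)
    # Outer product of the context with the maximizing gap.
    subgradient = [[ci * gj for gj in gaps[best]] for ci in context]
    return values[best], subgradient


def subgradient_step(W, subgradient, learning_rate):
    """Plain subgradient descent update on the mapping W."""
    return [[W[i][j] - learning_rate * subgradient[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    loss, g = loss_and_subgradient([1.0, 0.0], W, [1.0, 0.0],
                                   [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
    print(loss, g, subgradient_step(W, g, 0.5))
```

Once such a mapping is fitted, `context_weights` can be evaluated on a context that never appeared in the demonstrations, which is the sense of zero-shot transfer used in the abstract.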


Ship Collision Avoidance Utilizing the Cross-Entropy Method for Collision Risk Assessment

IEEE Transactions on Intelligent Transportation Systems, 2021, pp. 1-14. DOI: 10.1109/tits.2021.3101007

Author(s): Trym Tengesdal, Tor A. Johansen, Edmund F. Brekke

Keyword(s): Risk Assessment, Collision Avoidance, Entropy Method, Cross Entropy, Collision Risk, Cross Entropy Method, Ship Collision, The Cross


An IoT based Smart Irrigation Management System using Reinforcement Learning modeled through a Markov Decision Process

2021. DOI: 10.1109/ds-rt52167.2021.9576130

Author(s): Luis Miguel Samaniego Campoverde, Mauro Tropea, Floriano De Rango

Keyword(s): Reinforcement Learning, Markov Decision Process, Decision Process, Management System, Irrigation Management, Markov Decision


A Fast Markov Decision Process-Based Algorithm for Collision Avoidance in Urban Air Mobility

IEEE Transactions on Intelligent Transportation Systems, 2022, pp. 1-14. DOI: 10.1109/tits.2022.3140724

Author(s): Josh Bertram, Peng Wei, Joseph Zambreno

Keyword(s): Collision Avoidance, Markov Decision Process, Decision Process, Urban Air, Markov Decision


Use of a Markov decision process model for treatment selection in an asymptomatic disease with consideration of risk sensitivity

Socio-Economic Planning Sciences, 2013, Vol 47 (3), pp. 172-182. DOI: 10.1016/j.seps.2012.09.003. Cited By ~ 5

Author(s): Vera Tilson, David A. Tilson

Keyword(s): Markov Decision Process, Decision Process, Process Model, Treatment Selection, Risk Sensitivity, Markov Decision, Asymptomatic Disease


Cloud Load Balancing and Reinforcement Learning

Advances in Business Information Systems and Analytics - Cloud Computing Technologies for Green Enterprises, 2018, pp. 266-291. DOI: 10.4018/978-1-5225-3038-1.ch011

Author(s): Abdelghafour Harraz, Mostapha Zbakh

Keyword(s): Artificial Intelligence, Reinforcement Learning, Load Balancing, Decision Process, Cloud System, Human Intervention, Q Learning, State Action, Learning Techniques, Markov Decision

Artificial Intelligence makes it possible to create engines that explore and learn environments and then derive policies that control them in real time with no human intervention. Through its Reinforcement Learning techniques, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q Learning, to name a few, it can be applied to systems that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing in order to dispatch load dynamically to a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
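For readers unfamiliar with the two update rules named above, the following is a minimal tabular sketch of Q Learning and SARSA. The state labels ("idle", "busy"), the action labels ("vm0", "vm1"), and the reward value are hypothetical stand-ins for a load-dispatch setting; the chapter's own state and reward design is not given in the abstract.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


if __name__ == "__main__":
    actions = ["vm0", "vm1"]  # hypothetical: which machine receives the next request
    Q = defaultdict(float)    # unseen (state, action) pairs start at 0.0
    Q[("busy", "vm1")] = 2.0
    q_learning_update(Q, "idle", "vm0", 1.0, "busy", actions, alpha=0.5, gamma=0.5)
    print(Q[("idle", "vm0")])
```

The only difference between the two rules is the bootstrap term: Q Learning uses the maximum over next actions, while SARSA uses the value of the next action chosen by the current policy.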
