Incremental Learning of Planning Actions in Model-Based Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/443 ◽

2019 ◽

Author(s):

Jun Hao Alvin Ng ◽

Ronald P. A. Petrick

Keyword(s):

Reinforcement Learning ◽

Intrinsic Motivation ◽

Incremental Learning ◽

Domain Model ◽

Experimental Results ◽

The Novel ◽

Model Based ◽

Domain Models ◽

Novel Concept ◽

Planning Problems

The soundness and optimality of a plan depend on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode, which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.


Model-Based Multi-Objective Reinforcement Learning by a Reward Occurrence Probability Vector

Advanced Robotics and Intelligent Automation in Manufacturing - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-1382-8.ch010 ◽

2020 ◽

pp. 269-295

Author(s):

Tomohiro Yamaguchi ◽

Shota Nagahama ◽

Yoshihiro Ichikawa ◽

Yoshimichi Honma ◽

Keiki Takadama

Keyword(s):

Reinforcement Learning ◽

Experimental Results ◽

Previous Model ◽

Pareto Optimal ◽

Occurrence Probability ◽

Contrast Model ◽

Multi Objective ◽

Model Based ◽

Model Free ◽

Optimal Policies

This chapter describes how to solve multi-objective reinforcement learning (MORL) problems in which there are multiple conflicting objectives with unknown weights. Previous model-free MORL methods require a large number of calculations to collect a Pareto optimal set for each V/Q-value vector. In contrast, model-based MORL can reduce this calculation cost compared with model-free MORL. However, the previous model-based MORL method applies only to deterministic environments. To address this limitation, this chapter proposes a novel model-based MORL method that uses a reward occurrence probability (ROP) vector with unknown weights. Experimental results are reported for stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. They show that the proposed method collects all Pareto optimal policies, with a total learning time of about 214 seconds (10 states, 3 actions, 3 rewards). As future research directions, ways to speed up the method and to use non-optimal policies are discussed.
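
The notion of a Pareto optimal set of ROP vectors can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the policy vectors are invented for illustration, and the assumption that a weight vector scores a policy by a weighted sum of its ROP vector is ours.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors if u != v)]

def scalarize(weights: Sequence[float], rop: Sequence[float]) -> float:
    """Weighted sum of an ROP vector (assumed scoring rule, not taken from the chapter)."""
    return sum(w * p for w, p in zip(weights, rop))

# Hypothetical ROP vectors for four policies under three reward rules.
policies = [(0.5, 0.1, 0.0), (0.2, 0.4, 0.1), (0.1, 0.1, 0.0), (0.0, 0.0, 0.6)]
front = pareto_front(policies)

# Once weights become known, the best policy is looked up in the stored front.
best_for_first_reward = max(front, key=lambda v: scalarize((1.0, 0.0, 0.0), v))
```

Storing the front rather than a single policy is what allows the weights to stay unknown during learning: any later choice of weights only needs a lookup over the retained vectors.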


Methodology for a Partly Automated Parameter Identification for the Validation of Multi-Domain Models

Volume 11: Systems, Design, and Complexity ◽

10.1115/imece2014-37041 ◽

2014 ◽

Author(s):

D. Kruse ◽

C. Schweers ◽

A. Trächtler

Keyword(s):

Parameter Identification ◽

Domain Model ◽

Domain Expert ◽

Model Based ◽

New Approaches ◽

Domain Models ◽

Identification Tool ◽

The Way

The paper presents a methodology for partly automated parameter identification to validate multi-domain models. To this end, an identification tool has been developed in MATLAB. It enables a partly automated procedure that uses established methods to identify parameters of complex, nonlinear multi-domain models. To integrate such multi-domain models into the tool, an interface based on the Functional Mock-up Interface (FMI) standard can be used. The interface makes the required identification parameters of the multi-domain model automatically available to the identification tool. Additionally, a guideline is developed that describes how the respective domain expert has to mark the required identification parameters during modeling. The need for this methodology and its application are shown by a practical example from industry, using Dymola, the FMI standard, and MATLAB. The practical example deals with the model-based development of a new washing procedure. The paper presents a partly automated parameter identification for the validation of the absorption part of the multi-domain model. In addition, new approaches to modelling this kind of absorption effect are detailed.


Dynamic Ship Domain Model Based on AIS Data for Inland Waterways

Volume 2A: Structures, Safety, and Reliability ◽

10.1115/omae2020-18700 ◽

2020 ◽

Author(s):

Yankang He ◽

Di Zhang ◽

Jinfen Zhang ◽

Bing Wu ◽

Carlos Guedes Soares

Keyword(s):

Open Water ◽

Domain Model ◽

Inland Waterways ◽

Ship Traffic ◽

Model Based ◽

Proposed Model ◽

Channel Boundary ◽

Domain Models ◽

Inland Rivers ◽

Using Data

Existing ship domain models are mostly based on the navigation behavior of vessels in open water, and they cannot be applied directly to inland rivers. It is therefore necessary to establish an inland ship safety domain model based on the ship traffic characteristics of inland waterways. Based on AIS data from the Yangtze River, this paper establishes the functional relationship between variables such as ship spacing, ship length, ship speed, and heading angle through multiple regression analysis. From this relationship, the safety distance between a ship of a given length and other ships is determined for different situations, so as to establish a dynamic ship domain model. The paper also explores the geographical relationship between the ship and the channel boundary and incorporates it into the ship domain model. Finally, a quantitative approach to ship collision risk is proposed, in which the collision threat degree is calculated from the relative heading of the ship and its position in the dynamic ship domain model. Two case studies, covering crossing and overtaking situations, are performed to validate the proposed model.
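
The multiple regression step can be sketched in Python as an ordinary least-squares fit. This is an illustration only: treating ship spacing as a linear function of length, speed, and heading angle is our assumption, and the five samples below are synthetic, not AIS data.

```python
from typing import List, Sequence

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple_regression(features: Sequence[Sequence[float]],
                            targets: Sequence[float]) -> List[float]:
    """Return [intercept, coef_1, ..., coef_k] from the normal equations X^T X beta = X^T y."""
    rows = [[1.0, *f] for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

# Synthetic samples: (ship length in m, speed in m/s, heading angle in degrees).
samples = [(50, 3, 0), (80, 4, 10), (100, 5, 30), (60, 6, 45), (120, 2, 90)]
# Hypothetical ground truth used only to generate the targets.
spacing = [20 + 2 * l + 5 * v + 0.5 * h for l, v, h in samples]
beta = fit_multiple_regression(samples, spacing)
```

Because the targets are generated without noise, the fit recovers the generating coefficients; with real AIS data the residuals would indicate how well a linear form describes the spacing.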


Generating Persona Consistent Dialogues by Exploiting Natural Language Inference

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6417 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8878-8885

Author(s):

Haoyu Song ◽

Wei-Nan Zhang ◽

Jingwen Hu ◽

Ting Liu

Keyword(s):

Reinforcement Learning ◽

Natural Language ◽

Experimental Results ◽

Learning Problem ◽

Model Based ◽

Consistency Evaluation

Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona-consistent dialogues. Unlike existing work that re-ranks retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of generated responses.


Ship domain model for ships with restricted manoeuvrability in busy waters

Journal of Navigation ◽

10.1017/s037346332000065x ◽

2021 ◽

pp. 1-25

Author(s):

Wei Pan ◽

Xin-lian Xie ◽

Tian-tian Bao ◽

Meng Li

Keyword(s):

River Estuary ◽

Model Potential ◽

Domain Model ◽

Yangtze River Estuary ◽

The Novel ◽

Trajectory Data ◽

The North ◽

The Yangtze River Estuary ◽

Classical Models ◽

Domain Models

Ship domain is an important theory in ship collision avoidance and an effective collision detection method. First, several classical ship domain models are used in experiments. The results show that the alarm rate is too high in busy waters, which greatly reduces the practicality of the models. Potential collision risk cannot be detected effectively, especially for a ship with restricted manoeuvrability, which is usually regarded as an overtaken ship due to its navigation characteristics. Therefore, it is necessary to fully consider the interference of other ships with ships with restricted manoeuvrability in an encounter situation. A novel ship domain model for ships with restricted manoeuvrability in busy waters is proposed. Considering the navigation characteristics of a ship with restricted manoeuvrability and the influence of the ship–ship effect, an algorithm to determine the boundary of the ship domain model is given by force and moment equations. AIS trajectory data of the North Channel of the Yangtze River Estuary are used in comparative experiments against four classical ship domain models. The results show that the alarm rates of the novel ship domain model are 7.608%, 15.131%, 55.785% and 7.608% lower than those of the four classical models, which effectively reduces the high false alarm rate produced by other models in this environment.


Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards

10.32470/ccn.2018.1191-0 ◽

2018 ◽

Author(s):

Paul Krueger ◽

Thomas Griffiths

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Model Free


Model-Based and Model-Free Social Cognition

10.31234/osf.io/ue6j2 ◽

2019 ◽

Author(s):

Leor M Hackel ◽

Jeffrey Jordan Berg ◽

Björn Lindström ◽

David Amodio

Keyword(s):

Reinforcement Learning ◽

Social Cognition ◽

Learning Strategies ◽

Memory Systems ◽

Learning Task ◽

Financial Advisors ◽

Model Based ◽

Model Free ◽

Systems Model ◽

Task Assessment

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
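
The distinction between the two kinds of valuation can be sketched in Python. This is a generic illustration, not the study's fitted model: the advisor names, transition probabilities, learning rate, and the mixing weight `w` are all hypothetical.

```python
from typing import Dict

def update_model_free(q_mf: Dict[str, float], advisor: str,
                      reward: float, alpha: float = 0.5) -> None:
    """Model-free: nudge the advisor's cached value toward the reward just received."""
    q_mf[advisor] += alpha * (reward - q_mf[advisor])

def model_based_value(transitions: Dict[str, Dict[str, float]],
                      stock_value: Dict[str, float], advisor: str) -> float:
    """Model-based: expected value of the stocks this advisor is known to lead to."""
    return sum(p * stock_value[stock] for stock, p in transitions[advisor].items())

def hybrid_value(q_mb: float, q_mf: float, w: float) -> float:
    """Mix both valuations; w = 1 is purely model-based, w = 0 purely model-free."""
    return w * q_mb + (1.0 - w) * q_mf

# Hypothetical task structure: advisor A usually leads to the high-paying stock.
transitions = {"A": {"high": 0.75, "low": 0.25}, "B": {"high": 0.25, "low": 0.75}}
stock_value = {"high": 1.0, "low": 0.0}
q_mf = {"A": 0.0, "B": 0.0}

# Advisor B happens to lead to the high-paying stock once (a rare transition).
update_model_free(q_mf, "B", reward=1.0)

planner = {a: hybrid_value(model_based_value(transitions, stock_value, a), q_mf[a], w=1.0)
           for a in transitions}
habit = {a: hybrid_value(model_based_value(transitions, stock_value, a), q_mf[a], w=0.0)
         for a in transitions}
```

After the lucky outcome, the purely model-free learner prefers advisor B because B was followed by a reward, while the purely model-based learner still prefers A because A more often leads to the high-paying stock.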


Faculty Opinions recommendation of States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.4125957.4076054 ◽

2010 ◽

Author(s):

Susan Courtney

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Model Based ◽

Model Free


Generalized analytical model based on harmonic coupling for hybrid plasmonic modes: comparison with numerical and experimental results

Optics Express ◽

10.1364/oe.23.027376 ◽

2015 ◽

Vol 23 (21) ◽

pp. 27376 ◽

Cited By ~ 8

Author(s):

Mitradeep Sarkar ◽

Jean-François Bryche ◽

Julien Moreau ◽

Mondher Besbes ◽

Grégory Barbillon ◽

...

Keyword(s):

Analytical Model ◽

Experimental Results ◽

Model Based


Reinforcement Learning by Offline Trajectory Planning Based on Iteratively Approximated Models (Bestärkendes Lernen mittels Offline-Trajektorienplanung basierend auf iterativ approximierten Modellen)

at - Automatisierungstechnik ◽

10.1515/auto-2020-0024 ◽

2020 ◽

Vol 68 (8) ◽

pp. 612-624

Author(s):

Max Pritzkoleit ◽

Robert Heedt ◽

Carsten Knoll ◽

Klaus Röbenack

Keyword(s):

Reinforcement Learning ◽

Neural Networks ◽

Model Based ◽

Artificial Neural Networks

In this article, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in offline trajectory planning to determine an optimal feedback law, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single pendulum on a cart and shows a significant improvement in data efficiency over model-free RL approaches. We also present experimental results from a test bench, where the presented algorithm is able, within a few trials, to approximate a feedback law that is optimal for the system sufficiently well.
