Incremental Learning of Planning Actions in Model-Based Reinforcement Learning

Author(s):  
Jun Hao Alvin Ng ◽  
Ronald P. A. Petrick

The soundness and optimality of a plan depend on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode, which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.

Author(s):  
Tomohiro Yamaguchi ◽  
Shota Nagahama ◽  
Yoshihiro Ichikawa ◽  
Yoshimichi Honma ◽  
Keiki Takadama

This chapter describes solving multi-objective reinforcement learning (MORL) problems in which there are multiple conflicting objectives with unknown weights. Previous model-free MORL methods require a large number of calculations to collect a Pareto optimal set for each V/Q-value vector. In contrast, model-based MORL can reduce this calculation cost compared with model-free MORL. However, the previous model-based MORL method applies only to deterministic environments. To address these problems, this chapter proposes a novel model-based MORL method based on a reward occurrence probability (ROP) vector with unknown weights. Experimental results are reported for stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. The results show that the proposed method collects all Pareto optimal policies, and that the total learning time was about 214 seconds (10 states, 3 actions, 3 rewards). As future research directions, ways to speed up the method and how to use non-optimal policies are discussed.
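To illustrate what collecting a Pareto optimal set involves, the sketch below filters a list of per-policy reward vectors down to the non-dominated ones. It is a generic dominance filter, not the chapter's ROP-based method; the function names and the example vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every objective
    and strictly better in at least one (larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


# Hypothetical reward vectors for four policies with three reward rules each.
candidates = [
    (0.5, 0.1, 0.0),
    (0.2, 0.4, 0.1),
    (0.1, 0.1, 0.0),  # dominated by the first vector
    (0.0, 0.0, 0.6),
]
front = pareto_front(candidates)
```

With unknown weights, every vector left in `front` could be the best choice for some weighting of the objectives, which is why the whole set is kept rather than a single policy.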


Author(s):  
D. Kruse ◽  
C. Schweers ◽  
A. Trächtler

The paper presents a methodology for partly automated parameter identification for validating multi-domain models. To this end, an identification tool has been developed in MATLAB. It enables a partly automated procedure that uses established methods to identify parameters of complex, nonlinear multi-domain models. To integrate such multi-domain models into the tool, an interface based on the Functional Mock-up Interface (FMI) standard can be used. The interface makes the required identification parameters of the multi-domain model automatically available to the identification tool. Additionally, a guideline is developed that describes how the respective domain expert has to mark the required identification parameters during modeling. The need for this methodology as well as its application are shown by a practical example from industry, using Dymola, the FMI standard, and MATLAB. The practical example deals with the model-based development of a new washing procedure. The paper presents a partly automated parameter identification for the validation of the absorption part of the multi-domain model. In addition, new approaches to modelling this kind of absorption effect are detailed.


Author(s):  
Yankang He ◽  
Di Zhang ◽  
Jinfen Zhang ◽  
Bing Wu ◽  
Carlos Guedes Soares

Existing ship domain models are mostly based on the navigation behavior of vessels in open water and cannot be applied directly to inland rivers. It is therefore necessary to establish an inland ship safety domain model based on the characteristics of inland ship traffic. Using AIS data from the Yangtze River, this paper establishes the functional relationship between ship spacing, ship length, ship speed, and heading angle through multiple regression analysis. On this basis, the safety distance between a ship of a given length and other ships in different situations is determined, so as to establish a dynamic ship domain model. The paper also explores the geographical relationship between the ship and the channel boundary and incorporates it into the ship domain model. Finally, a quantitative approach to ship collision risk is proposed, in which the collision threat degree is calculated from the relative heading of the ship and its position in the dynamic ship domain model. Two case studies, covering crossing and overtaking situations, are performed to validate the proposed model.


2020 ◽  
Vol 34 (05) ◽  
pp. 8878-8885
Author(s):  
Haoyu Song ◽  
Wei-Nan Zhang ◽  
Jingwen Hu ◽  
Ting Liu

Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) techniques to address the issue of generating persona-consistent dialogues. Unlike existing work that re-ranks retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI-based consistency module. Moreover, we use another well-performing NLI model in the evaluation of persona consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona consistency of generated responses.


2021 ◽  
pp. 1-25
Author(s):  
Wei Pan ◽  
Xin-lian Xie ◽  
Tian-tian Bao ◽  
Meng Li

Ship domain is an important theory in ship collision avoidance and an effective collision detection method. First, several classical ship domain models are used in experiments. The results show that the alarm rate is too high in busy waters, which greatly reduces the practicality of the models. Potential collision risk cannot be detected effectively, especially for a ship with restricted manoeuvrability, which is usually regarded as an overtaken ship due to its navigation characteristics. It is therefore necessary to fully consider the interference of other ships with ships of restricted manoeuvrability in an encounter situation. A novel ship domain model for ships with restricted manoeuvrability in busy waters is proposed. Considering the navigation characteristics of a ship with restricted manoeuvrability and the influence of the ship–ship effect, an algorithm to determine the boundary of the ship domain model is derived from force and moment equations. AIS trajectory data from the North Channel of the Yangtze River Estuary are used in a comparative experiment against four classical ship domain models. The results show that the alarm rates of the novel ship domain model are 7.608%, 15.131%, 55.785% and 7.608% lower than those of the four classical models, so the novel model can effectively reduce the high false alarm rate produced by other models in this environment.


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
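The distinction between the two learning systems can be made concrete with a small sketch. The code below contrasts a model-free value, updated from rewards the participant actually received, with a model-based value, computed from an advisor's likely outcomes and their expected payoffs. The learning rate `alpha`, the mixing weight `w`, and all numbers are assumptions for illustration; the abstract does not specify the authors' computational model.

```python
from typing import Dict


def model_free_update(q: float, reward: float, alpha: float) -> float:
    """Move the cached value toward the reward just received (habit-like)."""
    return q + alpha * (reward - q)


def model_based_value(transition_probs: Dict[str, float],
                      stock_values: Dict[str, float]) -> float:
    """Expected payoff of an advisor, computed from where the advisor
    tends to lead and what each stock is expected to pay (planning-like)."""
    return sum(p * stock_values[stock] for stock, p in transition_probs.items())


def combined_value(q_mf: float, q_mb: float, w: float) -> float:
    """Weighted mix of the two systems; w is a hypothetical mixing weight."""
    return w * q_mb + (1.0 - w) * q_mf


# Hypothetical advisor: previously paid out 1.0 once, leads to the
# high-paying stock 75% of the time.
q_mf = model_free_update(q=0.0, reward=1.0, alpha=0.5)
q_mb = model_based_value({"high": 0.75, "low": 0.25}, {"high": 1.0, "low": 0.0})
value = combined_value(q_mf, q_mb, w=0.5)
```

In a sketch like this, a participant with a low `w` would value advisors mainly by past payouts, while a participant with a high `w` would value them mainly by the rewards they can provide in the future.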


2015 ◽  
Vol 23 (21) ◽  
pp. 27376 ◽  
Author(s):  
Mitradeep Sarkar ◽  
Jean-François Bryche ◽  
Julien Moreau ◽  
Mondher Besbes ◽  
Grégory Barbillon ◽  
...  

2020 ◽  
Vol 68 (8) ◽  
pp. 612-624
Author(s):  
Max Pritzkoleit ◽  
Robert Heedt ◽  
Carsten Knoll ◽  
Klaus Röbenack

In this contribution, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in offline trajectory planning to determine an optimal feedback law, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single pendulum on a cart and shows a significant improvement in data efficiency compared with model-free RL approaches. Furthermore, we present experimental results from a test bench, where the presented algorithm is able to approximate a feedback law that is optimal for the system sufficiently well within a few trials.

