Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees

Author(s):  
Erkin Cilden ◽  
Faruk Polat
2007 ◽  
Vol 19 (11) ◽  
pp. 3051-3087 ◽  
Author(s):  
Hajime Fujita ◽  
Shin Ishii

Games constitute a challenging domain of reinforcement learning (RL) for acquiring strategies because many of them include multiple players and many unobservable variables in a large state space. The difficulty of solving such realistic multiagent problems with partial observability arises mainly from the fact that the computational cost for the estimation and prediction in the whole state space, including unobservable variables, is too heavy. To overcome this intractability and enable an agent to learn in an unknown environment, an effective approximation method is required with explicit learning of the environmental model. We present a model-based RL scheme for large-scale multiagent problems with partial observability and apply it to a card game, hearts. This game is a well-defined example of an imperfect information game and can be approximately formulated as a partially observable Markov decision process (POMDP) for a single learning agent. To reduce the computational cost, we use a sampling technique in which the heavy integration required for the estimation and prediction can be approximated by a plausible number of samples. Computer simulation results show that our method is effective in solving such a difficult, partially observable multiagent problem.


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


2020 ◽  
Vol 68 (8) ◽  
pp. 612-624
Author(s):  
Max Pritzkoleit ◽  
Robert Heedt ◽  
Carsten Knoll ◽  
Klaus Röbenack

ZusammenfassungIn diesem Beitrag nutzen wir Künstliche Neuronale Netze (KNN) zur Approximation der Dynamik nichtlinearer (mechanischer) Systeme. Diese iterativ approximierten neuronalen Systemmodelle werden in einer Offline-Trajektorienplanung verwendet, um eine optimale Rückführung zu bestimmen, welche auf das reale System angewandt wird. Dieser Ansatz des modellbasierten bestärkenden Lernens (engl. model-based reinforcement learning (RL)) wird am Aufschwingen des Einfachwagenpendels zunächst simulativ evaluiert und zeigt gegenüber modellfreien RL-Ansätzen eine signifikante Verbesserung der Dateneffizienz. Weiterhin zeigen wir Experimentalergebnisse an einem Versuchsstand, wobei der vorgestellte Algorithmus innerhalb weniger Versuche in der Lage ist, eine für das System optimale Rückführung hinreichend gut zu approximieren.


Author(s):  
Cheng-Yu Kuo ◽  
Andreas Schaarschmidt ◽  
Yunduan Cui ◽  
Tamim Asfour ◽  
Takamitsu Matsubara

2021 ◽  
Author(s):  
Wenjie Shang ◽  
Qingyang Li ◽  
Zhiwei Qin ◽  
Yang Yu ◽  
Yiping Meng ◽  
...  

2014 ◽  
Vol 513-517 ◽  
pp. 1092-1095
Author(s):  
Bo Wu ◽  
Yan Peng Feng ◽  
Hong Yan Zheng

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the learning parameters with exponential growth are the main impediment for online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states to reduce the size of learning parameters, and adopt Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way for improving the learning efficiency in large-scale state spaces.


Sign in / Sign up

Export Citation Format

Share Document