Evolving Equilibrium Policies for a Multiagent Reinforcement Learning Problem with State Attractors

Author(s): Florin Leon
Author(s): Yuxi Ma, Meng Shen, Yuhang Zhao, Zhao Li, Xiaoyao Tong, ...

2020  
Author(s): Felipe Leno Da Silva, Anna Helena Reali Costa

Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated trial-and-error interactions between the learning agent and the environment. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge to accelerate learning in multiagent RL problems. We propose reusing knowledge both from previous tasks and from other agents, and we introduce several flexible methods that make each of these two types of knowledge reuse possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.


Author(s): Dómhnall J. Jennings, Eduardo Alonso, Esther Mondragón, Charlotte Bonardi

Standard associative learning theories typically fail to conceptualise the temporal properties of a stimulus, and hence cannot easily make predictions about the effects such properties might have on the magnitude of conditioning phenomena. Despite this, in intuitive terms we might expect the temporal properties of a stimulus that is paired with some outcome to be important. In particular, there is no previous research addressing the way that fixed or variable duration stimuli can affect overshadowing. In this chapter we report results which show that the degree of overshadowing depends on the distribution form (fixed or variable) of the overshadowing stimulus, and argue that conditioning is weaker under conditions of temporal uncertainty. These results are discussed in terms of models of conditioning and timing. We conclude that the temporal difference model, which has been extensively applied to the reinforcement learning problem in machine learning, accounts for the key findings of our study.


Author(s): Jonathan P. How, Dong-Ki Kim, Samir Wadhwania
