Strategic Negotiation in Multiagent Environments: Sarit Kraus and Jonathan Wilkenfeld

Author(s): Takuji Watanabe, Kazuteru Miyazaki, Hiroaki Kobayashi, ...

The penalty avoiding rational policy making algorithm (PARP) [1], previously improved to save memory and cope with uncertainty as IPARP [2], requires that states be discretized, using function approximation or some other method, when it is applied to real environments with continuous state spaces. In particular, a method is known that discretizes states for PARP using basis functions [3]. Because this method creates a new basis function from the current input and its next observation, however, an unsuitable basis function may be generated in some asynchronous multiagent environments. We therefore propose using a uniform set of basis functions whose range is estimated before learning. We show the effectiveness of our proposal on the soccer task known as "Keepaway."
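The core idea described above, fixing a uniform grid of basis functions whose range is estimated before learning rather than generating basis functions online from input/next-observation pairs, might be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation; the pre-learning sampling phase, the grid layout, and the RBF width are all assumptions, and the helper names (`estimate_ranges`, `uniform_rbf_centers`, `rbf_features`) are hypothetical.

```python
import numpy as np

def estimate_ranges(samples):
    """Estimate per-dimension [min, max] ranges from observations
    gathered in a (hypothetical) pre-learning phase."""
    samples = np.asarray(samples)
    return samples.min(axis=0), samples.max(axis=0)

def uniform_rbf_centers(low, high, cells_per_dim):
    """Place basis-function centers on a uniform grid spanning the
    estimated range of each state dimension."""
    axes = [np.linspace(l, h, c) for l, h, c in zip(low, high, cells_per_dim)]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=1)

def rbf_features(state, centers, width):
    """Activation of each fixed (uniform) basis function for a continuous
    state; a discrete state can be taken as the strongest activation."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

# Usage: estimate ranges from pre-learning samples, build a fixed grid once,
# then map continuous observations to a discrete state index during learning.
samples = np.random.uniform(0.0, 10.0, size=(500, 2))   # stand-in observations
low, high = estimate_ranges(samples)
centers = uniform_rbf_centers(low, high, cells_per_dim=(5, 5))
phi = rbf_features(np.array([3.2, 7.1]), centers, width=1.0)
discrete_state = int(np.argmax(phi))
```

Because the grid is fixed before learning, no new basis functions are created from possibly delayed or asynchronous observations, which is the failure mode the abstract attributes to the online method.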


2002, Vol 14 (2), pp. 281-295
Author(s): A. Nur Zincir-Heywood, M.I. Heywood, C.R. Chatwin

2003, Vol 19, pp. 569-629
Author(s): B. Price, C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrate improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
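As a rough illustration of how implicit imitation can be combined with prioritized sweeping, the sketch below keeps two transition models, one estimated from the learner's own experience and one from the mentor's observed state transitions (the mentor's actions are not observed), and prioritizes Bellman backups by the size of the resulting value change. The class and method names (`ImplicitImitationLearner`, `observe_mentor`, and so on) are hypothetical, the reward along mentor transitions is omitted for brevity, and this is an assumption-laden sketch rather than the authors' algorithm.

```python
import heapq
from collections import defaultdict

class ImplicitImitationLearner:
    """Sketch of implicit imitation plus prioritized sweeping under the
    homogeneous-action assumption: mentor transitions augment the
    learner's own model when estimating state values."""

    def __init__(self, actions, gamma=0.95, theta=1e-3):
        self.actions = actions
        self.gamma = gamma
        self.theta = theta                                        # priority threshold
        self.V = defaultdict(float)                               # state values
        self.counts = defaultdict(lambda: defaultdict(int))       # (s, a) -> {s': n}
        self.mentor_counts = defaultdict(lambda: defaultdict(int))  # s -> {s': n}
        self.rewards = defaultdict(float)                         # (s, a) -> mean reward
        self.queue = []                                           # max-priority queue

    def observe_self(self, s, a, r, s2):
        """Update the learner's own model from its experience."""
        self.counts[(s, a)][s2] += 1
        n = sum(self.counts[(s, a)].values())
        self.rewards[(s, a)] += (r - self.rewards[(s, a)]) / n    # incremental mean
        self._push(s)

    def observe_mentor(self, s, s2):
        """Implicit imitation: only the mentor's state transition is seen;
        it still reveals what is reachable (and valuable) from s."""
        self.mentor_counts[s][s2] += 1
        self._push(s)

    def _q(self, s, a):
        dist = self.counts[(s, a)]
        n = sum(dist.values())
        if n == 0:
            return None
        exp_v = sum(c / n * self.V[s2] for s2, c in dist.items())
        return self.rewards[(s, a)] + self.gamma * exp_v

    def _backup(self, s):
        """Augmented backup: max over the learner's own action models and
        the value implied by the mentor's observed transitions."""
        own = [q for q in (self._q(s, a) for a in self.actions) if q is not None]
        best = max(own) if own else self.V[s]
        m = self.mentor_counts[s]
        if m:
            n = sum(m.values())
            mentor_v = self.gamma * sum(c / n * self.V[s2] for s2, c in m.items())
            best = max(best, mentor_v)                            # reward term omitted
        return best

    def _push(self, s):
        priority = abs(self._backup(s) - self.V[s])
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, s))

    def sweep(self, max_backups=50):
        """Prioritized sweeping: process states in order of value change.
        A full implementation would also re-queue predecessors of s."""
        for _ in range(max_backups):
            if not self.queue:
                break
            _, s = heapq.heappop(self.queue)
            self.V[s] = self._backup(s)
```

The point of the augmented backup is the one the abstract makes: mentor observations supply value information about parts of the state space the learner has not yet visited, so prioritized sweeping can propagate that information before the learner gathers its own samples there.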

