Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Author(s):  
Xiong Wang ◽  
Riheng Jia

The mean field game framework facilitates analyzing multi-armed bandits (MAB) with a large number of agents by approximating their interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which leads to tractable analysis but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on establishing the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE is obtained. Since Markov analysis mainly applies to the discrete-state case, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE to ensure a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
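As a deliberately simplified illustration of the state-to-policy mapping described above, the following Python sketch lets a single agent keep a per-arm state of estimated continuous rewards, draw arms from a softmax policy over that state, and update the state from realized observations. The softmax temperature, step size, and Gaussian reward model are assumptions for illustration, not the paper's exact state evolution or equilibrium construction.

```python
import numpy as np

# Minimal sketch of one agent in a continuous-reward bandit: the agent keeps a
# per-arm state (a running reward estimate), maps it to a stochastic arm-playing
# policy via softmax, and updates the state from the realized observation.
rng = np.random.default_rng(0)
n_arms = 3
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical continuous mean rewards
state = np.zeros(n_arms)                  # encoded reward estimates
step_size, temperature = 0.1, 0.5

for t in range(1000):
    # state -> stochastic policy (softmax over estimated rewards)
    logits = state / temperature
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()

    arm = rng.choice(n_arms, p=policy)
    reward = rng.normal(true_means[arm], 0.1)   # continuous reward draw

    # state update from the realized observation
    state[arm] += step_size * (reward - state[arm])

print(np.round(state, 2))   # the estimates approach the true means
```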

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Yaofei Ma ◽  
Xiaole Ma ◽  
Xiao Song

As a continuous state space problem, air combat is difficult to solve with traditional dynamic programming (DP) over a discretized state space. This paper studies an approximate dynamic programming (ADP) approach to build a high-performance decision model for one-versus-one air combat, in which the iterative policy-improvement process is replaced by mass sampling from history trajectories and utility function approximation, ultimately yielding efficient policy improvement. A continuous reward function is also constructed to better guide the plane toward the "winner" state from any initial situation. According to our experiments, the plane is more offensive when following the policy derived from the ADP approach rather than the baseline Min-Max policy: the "time to win" is greatly reduced, but the cumulative probability of being killed by the enemy is higher. The reason is analyzed in this paper.
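The ADP workflow summarized above, sampling transitions from history trajectories, fitting an approximate utility function, and acting greedily on it, can be illustrated with a minimal fitted Q-iteration sketch. The 1-D "closing distance" dynamics, the shaped continuous reward, and the polynomial features are assumptions for illustration only, not the paper's air combat model.

```python
import numpy as np

rng = np.random.default_rng(1)
actions = np.array([-1.0, 0.0, 1.0])        # e.g. decelerate / hold / accelerate
gamma = 0.95

def step(d, a):
    """Toy dynamics: d is the distance to the 'winner' state."""
    d_next = np.clip(d - 0.1 * a + rng.normal(0, 0.02), 0.0, 5.0)
    reward = -d_next                         # continuous reward guides toward d = 0
    return d_next, reward

def features(d, a_idx):
    """Polynomial-in-distance features, one block per discrete action."""
    phi = np.zeros(3 * len(actions))
    phi[3 * a_idx: 3 * a_idx + 3] = [1.0, d, d * d]
    return phi

# Mass sampling of transitions from random behavior ("history trajectories").
samples = []
for _ in range(5000):
    d = rng.uniform(0.0, 5.0)
    a_idx = rng.integers(len(actions))
    d_next, r = step(d, actions[a_idx])
    samples.append((d, a_idx, r, d_next))

# Fitted Q-iteration: fit the approximate utility by least squares, repeatedly.
w = np.zeros(3 * len(actions))
for _ in range(50):
    X, y = [], []
    for d, a_idx, r, d_next in samples:
        q_next = max(features(d_next, j) @ w for j in range(len(actions)))
        X.append(features(d, a_idx))
        y.append(r + gamma * q_next)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

def greedy(d):
    """Greedy policy on the approximate utility function."""
    return actions[int(np.argmax([features(d, j) @ w for j in range(len(actions))]))]

print(greedy(3.0))   # expected: accelerate toward the goal state
```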


2020 ◽  
Vol 34 (2) ◽  
Author(s):  
Mikko Lauri ◽  
Joni Pajarinen ◽  
Jan Peters

Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest and constant communication cannot be assumed. This is common in tasks involving information gathering with multiple independently operating sensor devices that may operate over large physical distances, such as unmanned aerial vehicles, or in communication-limited environments, as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for cooperative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, that maximizes the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward to be a non-linear function of the agents' state information, for example the negative Shannon entropy. We argue that such reward functions are well suited for decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving discrete problems an order of magnitude larger than the previous state of the art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization.
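A minimal sketch of the reward idea above: the reward is a non-linear (convex) function of the belief, here the negative Shannon entropy of a particle-based belief, which rises as an observation concentrates the belief. The 1-D target, the range-observation model, and the particle-filter details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def neg_entropy(weights):
    """Negative Shannon entropy of a normalized belief -- higher is better."""
    w = weights[weights > 0]
    return float(np.sum(w * np.log(w)))

# Particle belief over a scalar target position.
n = 500
particles = rng.uniform(0.0, 10.0, size=n)
weights = np.full(n, 1.0 / n)

# One particle-filter update after a noisy range observation from a sensor at x = 2.
sensor_pos, true_target, sigma = 2.0, 7.0, 0.5
obs = abs(true_target - sensor_pos) + rng.normal(0, sigma)
likelihood = np.exp(-0.5 * ((np.abs(particles - sensor_pos) - obs) / sigma) ** 2)
weights = weights * likelihood
weights /= weights.sum()

print("reward before:", neg_entropy(np.full(n, 1.0 / n)))
print("reward after :", neg_entropy(weights))   # entropy drops, so the reward rises
```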


Symmetry ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 461 ◽  
Author(s):  
David Luviano-Cruz ◽  
Francesco Garcia-Luna ◽  
Luis Pérez-Domínguez ◽  
S. Gadi

A multi-agent system (MAS) is suitable for addressing tasks in a variety of domains without any pre-programmed behaviors, which makes it well suited to problems involving mobile robots. Reinforcement learning (RL) is a successful approach for acquiring new behaviors in MASs; however, most existing methods maintain exact Q-values over small discrete state and action spaces. This article presents a linearly fuzzified joint Q-function for the continuous state space of a MAS, which overcomes the dimensionality problem. The article also gives a proof of the existence and convergence of the solution produced by the presented algorithm, and discusses the numerical simulations and experimental results carried out to validate it.
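A minimal sketch of a linearly fuzzified Q-function over a continuous state, under assumed details (a single scalar state, triangular membership functions, one temporal-difference update): the Q-value of a state-action pair is the membership-weighted combination of per-rule parameters, and the TD error is distributed back to the rules by their membership degrees. This illustrates the general technique, not the article's exact joint formulation or its convergence proof.

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 5)           # fuzzy set centers on the state axis
n_actions = 2
q = np.zeros((len(centers), n_actions))      # per-rule, per-action Q parameters
alpha, gamma = 0.2, 0.9

def memberships(s):
    """Triangular membership degrees, normalized to sum to 1."""
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    return mu / mu.sum()

def q_value(s):
    """Linearly fuzzified Q-values for all actions at continuous state s."""
    return memberships(s) @ q

# One Q-learning update on a toy transition (s, a, r, s_next).
s, a, r, s_next = 0.35, 1, 0.8, 0.42
td_target = r + gamma * q_value(s_next).max()
td_error = td_target - q_value(s)[a]
q[:, a] += alpha * td_error * memberships(s)  # credit shared by membership degree
print(np.round(q, 3))
```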


Author(s):  
Klaus Morawetz

The classical non-ideal gas shows that the two original concepts of pressure, based on motion and on forces, have eventually developed into drift and dissipation contributions. Collisions of realistic particles are nonlocal and non-instant. A collision delay characterizes the effective duration of collisions, and three displacements describe its effective non-locality. Consequently, the scattering integral of the kinetic equation is nonlocal and non-instant. The non-instant and nonlocal corrections to the scattering integral directly result in virial corrections to the equation of state. The interaction of particles via long-range potential tails is approximated by a mean field which acts as an external field. The effect of the mean field on free particles is covered by the momentum drift. The effect of the mean field on colliding pairs causes the momentum and energy gains which enter the scattering integral and lead to an internal mechanism of energy conversion. The entropy production is shown and the nonequilibrium hydrodynamic equations are derived. Two concepts of quasiparticles, the spectral and the variational one, are explored with the help of the virial of forces.


2000 ◽  
Vol 61 (17) ◽  
pp. 11521-11528 ◽  
Author(s):  
Sergio A. Cannas ◽  
A. C. N. de Magalhães ◽  
Francisco A. Tamarit

2019 ◽  
Vol 46 (3) ◽  
pp. 54-55
Author(s):  
Thirupathaiah Vasantam ◽  
Arpan Mukhopadhyay ◽  
Ravi R. Mazumdar

2020 ◽  
Vol 31 (1) ◽  
Author(s):  
Hui Huang ◽  
Jinniao Qiu

In this paper, we propose and study a stochastic aggregation–diffusion equation of the Keller–Segel (KS) type for modeling chemotaxis in dimensions $$d=2,3$$. Unlike the classical deterministic KS system, which only allows for idiosyncratic noises, the stochastic KS equation is derived from an interacting particle system subject to both idiosyncratic and common noises. Both the unique existence of solutions to the stochastic KS equation and the mean-field limit result are addressed.
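For orientation, one common parabolic-elliptic form of the classical deterministic Keller–Segel system referred to above is written below (notation assumed: $$\rho$$ the cell density, $$c$$ the chemoattractant concentration, $$\chi$$ the chemotactic sensitivity); the stochastic equation studied in the paper adds idiosyncratic and common noises to such a system.

```latex
% One common parabolic-elliptic form of the classical deterministic Keller--Segel system
% (rho: cell density, c: chemoattractant concentration, chi: chemotactic sensitivity).
\begin{aligned}
  \partial_t \rho &= \Delta \rho - \chi \, \nabla \cdot \bigl( \rho \, \nabla c \bigr),
    \qquad x \in \mathbb{R}^d, \; d = 2, 3, \\
  -\Delta c &= \rho .
\end{aligned}
```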

