Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Author(s):  
Xiong Wang ◽  
Riheng Jia

The mean field game framework facilitates analyzing multi-armed bandits (MAB) with a large number of agents by approximating their interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which leads to tractable analysis but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on establishing the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of an MFE is obtained. Since Markov analysis mainly applies to the discrete-state case, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE to ensure a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.
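As a deliberately simplified illustration of the state-to-policy mapping described above, the following Python sketch lets a single agent keep a per-arm state of estimated continuous rewards, draw arms from a softmax policy over that state, and update the state from realized observations. The softmax temperature, step size, and Gaussian reward model are assumptions for illustration, not the paper's exact state evolution or equilibrium construction.

```python
import numpy as np

# Minimal sketch of one agent in a continuous-reward bandit: the agent keeps a
# per-arm state (a running reward estimate), maps it to a stochastic arm-playing
# policy via softmax, and updates the state from the realized observation.
rng = np.random.default_rng(0)
n_arms = 3
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical continuous mean rewards
state = np.zeros(n_arms)                  # encoded reward estimates
step_size, temperature = 0.1, 0.5

for t in range(1000):
    # state -> stochastic policy (softmax over estimated rewards)
    logits = state / temperature
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()

    arm = rng.choice(n_arms, p=policy)
    reward = rng.normal(true_means[arm], 0.1)   # continuous reward draw

    # state update from the realized observation
    state[arm] += step_size * (reward - state[arm])

print(np.round(state, 2))   # the estimates approach the true means
```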

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Yaofei Ma ◽  
Xiaole Ma ◽  
Xiao Song

As a continuous state space problem, air combat is difficult to solve with traditional dynamic programming (DP) over a discretized state space. This paper studies an approximate dynamic programming (ADP) approach to build a high-performance decision model for one-versus-one air combat, in which the iterative policy-improvement process is replaced by mass sampling from history trajectories and utility function approximation, ultimately yielding efficient policy improvement. A continuous reward function is also constructed to better guide the plane toward the "winner" state from any initial situation. According to our experiments, the plane is more offensive when following the policy derived from the ADP approach rather than the baseline Min-Max policy: the "time to win" is greatly reduced, but the cumulative probability of being killed by the enemy is higher. The reason is analyzed in this paper.
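The ADP workflow summarized above, sampling transitions from history trajectories, fitting an approximate utility function, and acting greedily on it, can be illustrated with a minimal fitted Q-iteration sketch. The 1-D "closing distance" dynamics, the shaped continuous reward, and the polynomial features are assumptions for illustration only, not the paper's air combat model.

```python
import numpy as np

rng = np.random.default_rng(1)
actions = np.array([-1.0, 0.0, 1.0])        # e.g. decelerate / hold / accelerate
gamma = 0.95

def step(d, a):
    """Toy dynamics: d is the distance to the 'winner' state."""
    d_next = np.clip(d - 0.1 * a + rng.normal(0, 0.02), 0.0, 5.0)
    reward = -d_next                         # continuous reward guides toward d = 0
    return d_next, reward

def features(d, a_idx):
    """Polynomial-in-distance features, one block per discrete action."""
    phi = np.zeros(3 * len(actions))
    phi[3 * a_idx: 3 * a_idx + 3] = [1.0, d, d * d]
    return phi

# Mass sampling of transitions from random behavior ("history trajectories").
samples = []
for _ in range(5000):
    d = rng.uniform(0.0, 5.0)
    a_idx = rng.integers(len(actions))
    d_next, r = step(d, actions[a_idx])
    samples.append((d, a_idx, r, d_next))

# Fitted Q-iteration: fit the approximate utility by least squares, repeatedly.
w = np.zeros(3 * len(actions))
for _ in range(50):
    X, y = [], []
    for d, a_idx, r, d_next in samples:
        q_next = max(features(d_next, j) @ w for j in range(len(actions)))
        X.append(features(d, a_idx))
        y.append(r + gamma * q_next)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

def greedy(d):
    """Greedy policy on the approximate utility function."""
    return actions[int(np.argmax([features(d, j) @ w for j in range(len(actions))]))]

print(greedy(3.0))   # expected: accelerate toward the goal state
```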


2020 ◽  
Vol 34 (2) ◽  
Author(s):  
Mikko Lauri ◽  
Joni Pajarinen ◽  
Jan Peters

Decentralized policies for information gathering are required when multiple autonomous agents are deployed to collect data about a phenomenon of interest and constant communication cannot be assumed. This is common in tasks involving information gathering with multiple independently operating sensor devices that may operate over large physical distances, such as unmanned aerial vehicles, or in communication-limited environments, as in the case of autonomous underwater vehicles. In this paper, we frame the information gathering task as a general decentralized partially observable Markov decision process (Dec-POMDP). The Dec-POMDP is a principled model for cooperative decentralized multi-agent decision-making. An optimal solution of a Dec-POMDP is a set of local policies, one for each agent, that maximizes the expected sum of rewards over time. In contrast to most prior work on Dec-POMDPs, we set the reward to be a non-linear function of the agents' state information, for example the negative Shannon entropy. We argue that such reward functions are well suited for decentralized information gathering problems. We prove that if the reward function is convex, then the finite-horizon value function of the Dec-POMDP is also convex. We propose the first heuristic anytime algorithm for information gathering Dec-POMDPs, and empirically demonstrate its effectiveness by solving discrete problems an order of magnitude larger than the previous state of the art. We also propose an extension to continuous-state problems with finite action and observation spaces by employing particle filtering. The effectiveness of the proposed algorithms is verified in domains such as decentralized target tracking, scientific survey planning, and signal source localization.
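A minimal sketch of the reward idea above: the reward is a non-linear (convex) function of the belief, here the negative Shannon entropy of a particle-based belief, which rises as an observation concentrates the belief. The 1-D target, the range-observation model, and the particle-filter details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def neg_entropy(weights):
    """Negative Shannon entropy of a normalized belief -- higher is better."""
    w = weights[weights > 0]
    return float(np.sum(w * np.log(w)))

# Particle belief over a scalar target position.
n = 500
particles = rng.uniform(0.0, 10.0, size=n)
weights = np.full(n, 1.0 / n)

# One particle-filter update after a noisy range observation from a sensor at x = 2.
sensor_pos, true_target, sigma = 2.0, 7.0, 0.5
obs = abs(true_target - sensor_pos) + rng.normal(0, sigma)
likelihood = np.exp(-0.5 * ((np.abs(particles - sensor_pos) - obs) / sigma) ** 2)
weights = weights * likelihood
weights /= weights.sum()

print("reward before:", neg_entropy(np.full(n, 1.0 / n)))
print("reward after :", neg_entropy(weights))   # entropy drops, so the reward rises
```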


Symmetry ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 461 ◽  
Author(s):  
David Luviano-Cruz ◽  
Francesco Garcia-Luna ◽  
Luis Pérez-Domínguez ◽  
S. Gadi

A multi-agent system (MAS) is suitable for addressing tasks in a variety of domains without any pre-programmed behaviors, which makes it well suited to problems involving mobile robots. Reinforcement learning (RL) is a successful approach for acquiring new behaviors in MASs; however, most existing methods maintain exact Q-values over small discrete state and action spaces. This article presents a linearly fuzzified joint Q-function for the continuous state space of a MAS, which overcomes the dimensionality problem. The article also gives a proof of the existence and convergence of the solution produced by the presented algorithm, and discusses the numerical simulations and experimental results carried out to validate it.
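A minimal sketch of a linearly fuzzified Q-function over a continuous state, under assumed details (a single scalar state, triangular membership functions, one temporal-difference update): the Q-value of a state-action pair is the membership-weighted combination of per-rule parameters, and the TD error is distributed back to the rules by their membership degrees. This illustrates the general technique, not the article's exact joint formulation or its convergence proof.

```python
import numpy as np

centers = np.linspace(0.0, 1.0, 5)           # fuzzy set centers on the state axis
n_actions = 2
q = np.zeros((len(centers), n_actions))      # per-rule, per-action Q parameters
alpha, gamma = 0.2, 0.9

def memberships(s):
    """Triangular membership degrees, normalized to sum to 1."""
    width = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    return mu / mu.sum()

def q_value(s):
    """Linearly fuzzified Q-values for all actions at continuous state s."""
    return memberships(s) @ q

# One Q-learning update on a toy transition (s, a, r, s_next).
s, a, r, s_next = 0.35, 1, 0.8, 0.42
td_target = r + gamma * q_value(s_next).max()
td_error = td_target - q_value(s)[a]
q[:, a] += alpha * td_error * memberships(s)  # credit shared by membership degree
print(np.round(q, 3))
```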


Author(s):  
Klaus Morawetz

The classical non-ideal gas shows that the two original concepts of pressure, based on motion and on forces, have eventually developed into drift and dissipation contributions. Collisions of realistic particles are nonlocal and non-instant. A collision delay characterizes the effective duration of collisions, and three displacements describe its effective non-locality. Consequently, the scattering integral of the kinetic equation is nonlocal and non-instant. The non-instant and nonlocal corrections to the scattering integral directly result in virial corrections to the equation of state. The interaction of particles via long-range potential tails is approximated by a mean field which acts as an external field. The effect of the mean field on free particles is covered by the momentum drift. The effect of the mean field on colliding pairs causes the momentum and energy gains which enter the scattering integral and lead to an internal mechanism of energy conversion. The entropy production is shown and the nonequilibrium hydrodynamic equations are derived. Two concepts of quasiparticles, the spectral and the variational one, are explored with the help of the virial of forces.


2000 ◽  
Vol 61 (17) ◽  
pp. 11521-11528 ◽  
Author(s):  
Sergio A. Cannas ◽  
A. C. N. de Magalhães ◽  
Francisco A. Tamarit

2019 ◽  
Vol 46 (3) ◽  
pp. 54-55
Author(s):  
Thirupathaiah Vasantam ◽  
Arpan Mukhopadhyay ◽  
Ravi R. Mazumdar

2020 ◽  
Vol 31 (1) ◽  
Author(s):  
Hui Huang ◽  
Jinniao Qiu

In this paper, we propose and study a stochastic aggregation–diffusion equation of the Keller–Segel (KS) type for modeling chemotaxis in dimensions $$d=2,3$$. Unlike the classical deterministic KS system, which only allows for idiosyncratic noises, the stochastic KS equation is derived from an interacting particle system subject to both idiosyncratic and common noises. Both the unique existence of solutions to the stochastic KS equation and the mean-field limit result are addressed.
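For orientation, one common parabolic-elliptic form of the classical deterministic Keller–Segel system referred to above is written below (notation assumed: $$\rho$$ the cell density, $$c$$ the chemoattractant concentration, $$\chi$$ the chemotactic sensitivity); the stochastic equation studied in the paper adds idiosyncratic and common noises to such a system.

```latex
% One common parabolic-elliptic form of the classical deterministic Keller--Segel system
% (rho: cell density, c: chemoattractant concentration, chi: chemotactic sensitivity).
\begin{aligned}
  \partial_t \rho &= \Delta \rho - \chi \, \nabla \cdot \bigl( \rho \, \nabla c \bigr),
    \qquad x \in \mathbb{R}^d, \; d = 2, 3, \\
  -\Delta c &= \rho .
\end{aligned}
```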

