Multiagent reinforcement learning with the partly high-dimensional state space

2006 ◽  
Vol 37 (9) ◽  
pp. 22-31 ◽  
Author(s):  
Kazuyuki Fujita ◽  
Hiroshi Matsuo

2021 ◽
Vol 2021 ◽  
pp. 1-7
Author(s):  
Wei Dai ◽  
Wei Wang ◽  
Zhongtian Mao ◽  
Ruwen Jiang ◽  
Fudong Nian ◽  
...  

The main objective of multiagent reinforcement learning is to achieve a globally optimal policy, but evaluating the value function is difficult when the state space is high-dimensional. We therefore reformulate multiagent reinforcement learning as a distributed optimization problem with constraint terms, in which all agents share the state and action spaces but each agent observes only its own local reward. We then propose a distributed optimization algorithm with fractional-order dynamics to solve this problem. Finally, we prove the convergence of the proposed algorithm and illustrate its effectiveness with a numerical example.
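The flavor of this reformulation can be conveyed with a minimal sketch of consensus-based distributed optimization (the simpler integer-order baseline, not the paper's fractional-order dynamics; the quadratic objectives, ring topology, and step size are all illustrative assumptions). Each agent descends its own local objective, standing in for its local reward, while averaging its estimate with neighbors; the estimates jointly approach the minimizer of the global sum.

```python
import numpy as np

# Each agent i minimizes f_i(x) = 0.5*(x - c_i)^2 but only knows its own c_i;
# the global optimum of sum_i f_i is the mean of the c_i.
c = np.array([1.0, 3.0, -2.0, 6.0])   # local targets (stand-ins for local rewards)
x = np.zeros(4)                        # each agent's current estimate

# doubly stochastic mixing matrix for a 4-agent ring topology
W = np.array([[0.5,  0.25, 0.0,  0.25],
              [0.25, 0.5,  0.25, 0.0 ],
              [0.0,  0.25, 0.5,  0.25],
              [0.25, 0.0,  0.25, 0.5 ]])

alpha = 0.1                            # constant step size (assumed)
for _ in range(500):
    grad = x - c                       # local gradient of f_i at x_i
    x = W @ x - alpha * grad           # consensus averaging + local descent

# the average of the estimates converges to mean(c) = 2.0; with a constant
# step size the individual estimates retain a small residual disagreement,
# which a diminishing step size would remove
print(x, x.mean())
```

The mixing step `W @ x` is what lets each agent benefit from rewards it never observes: information about every local objective diffuses around the ring.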


2010 ◽  
Vol 30 (2) ◽  
pp. 192-215 ◽  
Author(s):  
Alexander Shkolnik ◽  
Michael Levashov ◽  
Ian R. Manchester ◽  
Russ Tedrake

A motion planning algorithm is described for bounding over rough terrain with the LittleDog robot. Unlike walking gaits, bounding is highly dynamic and cannot be planned with quasi-steady approximations. LittleDog is modeled as a planar five-link system with a 16-dimensional state space; computing a plan over rough terrain in this high-dimensional state space that respects the kinodynamic constraints due to underactuation and motor limits is extremely challenging. Rapidly-exploring Random Trees (RRTs) are known for fast kinematic path planning in high-dimensional configuration spaces in the presence of obstacles, but their search efficiency degrades rapidly with the addition of challenging dynamics. A computationally tractable planner for bounding was developed by modifying the RRT algorithm to use: (1) motion primitives to reduce the dimensionality of the problem; (2) Reachability Guidance, which dynamically changes the sampling distribution and distance metric to address differential constraints and discontinuous motion-primitive dynamics; and (3) sampling with a Voronoi bias in a lower-dimensional “task space” for bounding. Short trajectories were demonstrated to work on the robot; however, open-loop bounding is inherently unstable. A feedback controller based on transverse linearization was implemented and shown in simulation to stabilize perturbations in the presence of noise and time delays.
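The Voronoi bias that drives RRT exploration can be seen in a bare-bones kinematic sketch (a hedged illustration in an empty 2-D workspace, not the LittleDog planner; the paper's Reachability Guidance replaces exactly this Euclidean nearest-neighbor metric and uniform sampling distribution, which break down for underactuated dynamics). Extending the node nearest to a uniform random sample implicitly favors nodes with large Voronoi regions, i.e. the frontier of the tree.

```python
import numpy as np

np.random.seed(1)
start, goal = np.array([0.05, 0.05]), np.array([0.95, 0.95])
step = 0.05                                  # fixed extension length (assumed)
nodes, parent = [start], {0: None}

for _ in range(2000):
    # 10% goal bias, otherwise uniform sampling over the unit square
    sample = goal if np.random.rand() < 0.1 else np.random.rand(2)
    arr = np.array(nodes)
    i = int(np.argmin(np.linalg.norm(arr - sample, axis=1)))  # Voronoi bias
    direction = sample - nodes[i]
    new = nodes[i] + step * direction / (np.linalg.norm(direction) + 1e-12)
    nodes.append(new)
    parent[len(nodes) - 1] = i
    if np.linalg.norm(new - goal) < step:    # reached the goal region
        break

# walk parent pointers back to the root to recover the path
path, j = [], len(nodes) - 1
while j is not None:
    path.append(nodes[j])
    j = parent[j]
path.reverse()
print(len(path), "waypoints")
```

Swapping the distance metric and sampling distribution in the two marked lines is precisely where Reachability Guidance intervenes, so that "nearest" means "cheapest to reach under the dynamics" rather than "closest in Euclidean distance."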


2011 ◽  
Vol 11 (3&4) ◽  
pp. 313-325
Author(s):  
Warner A. Miller

An increase in the dimension of the state space for quantum key distribution (QKD) can decrease its fidelity requirements while also increasing its bandwidth. A significant obstacle for QKD with qudits ($d \geq 3$) has been an efficient and practical quantum state sorter for photons whose complex fields are modulated in both amplitude and phase. We propose such a sorter based on a multiplexed thick hologram, constructed, e.g., from photo-thermo-refractive (PTR) glass. We validate this approach using coupled-mode theory, with parameters consistent with PTR glass, to simulate a holographic sorter. The model assumes a three-dimensional state space spanned by three tilted plane waves. The utility of such a sorter for broader quantum information processing applications can be substantial.
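The kind of coupled-mode calculation used for validation can be sketched numerically (a minimal illustration with assumed coupling values, not the PTR-glass parameters of the paper). For three coupled plane-wave modes the amplitudes obey i dA/dz = C A with a Hermitian coupling matrix C, so propagation through the hologram is unitary: total power is conserved while being redistributed among the modes.

```python
import numpy as np

kappa = 0.8                                # coupling strength (assumed value)
# nearest-neighbor coupling between three tilted plane-wave modes
C = kappa * np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=complex)

# C is Hermitian, so expm(-i C z) can be built from its real eigenvalues
w, V = np.linalg.eigh(C)

def propagate(A0, z):
    """Evolve mode amplitudes: A(z) = V exp(-i w z) V^dagger A(0)."""
    return V @ (np.exp(-1j * w * z) * (V.conj().T @ A0))

A0 = np.array([1.0, 0.0, 0.0], dtype=complex)   # launch all power in mode 0
A = propagate(A0, z=2.0)                         # propagation length (assumed)
powers = np.abs(A) ** 2
print(powers, powers.sum())   # power redistributes across modes; sum stays 1
```

Because the evolution is unitary, a sorter designed this way can in principle map each input superposition to a distinct output port without loss, which is what makes the approach attractive for qudit QKD.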


2016 ◽  
Vol 11 (3) ◽  
pp. 350-374 ◽  
Author(s):  
Chris Westbury

There is a distinction in scientific explanation between the explanandum, statements describing the empirical phenomenon to be explained, and the explanans, statements describing the evidence that allows one to predict that phenomenon. To avoid tautology, these sets of statements must refer to distinct domains. A scientific explanation of semantics must therefore be grounded in explanans that appeal to entities from non-semantic domains. I consider as examples eight candidate domains (including affect, lexical or sub-word co-occurrence, mental simulation, and associative learning) that could ground semantics. Following Wittgenstein (1954), I propose that adjudicating between these different domains is difficult because of the reification of a word’s ‘meaning’ as an atomistic unit. If we abandon the idea of the meaning of a word as an atomistic unit and instead think of word meaning as a set of dynamic and disparate embodied states unified by a shared label, many apparent problems associated with identifying a meaning’s ‘true’ explanans disappear. Semantics can then be considered as sets of weighted constraints that are individually sufficient for specifying and labeling a subjectively recognizable location in the high-dimensional state space defined by our neural activity.


2011 ◽  
Author(s):  
Robert W. Boyd ◽  
Anand Jha ◽  
Mehul Malik ◽  
Colin O'Sullivan ◽  
Brandon Rodenburg ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Baolai Wang ◽  
Shengang Li ◽  
Xianzhong Gao ◽  
Tao Xie

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has attracted many researchers’ attention. However, the situation faced by a UAV swarm has substantial uncertainty and dynamic variability, and the state and action spaces grow exponentially with the number of UAVs, so autonomous decision-making becomes a difficult problem in the confrontation environment. In this paper, a multiagent reinforcement learning method with macro actions and human expertise is proposed for autonomous decision-making of UAVs. In the proposed approach, the UAV swarm is modeled as a large multiagent system (MAS) with each individual UAV as an agent, and the sequential decision-making problem in swarm confrontation is modeled as a Markov decision process. Agents in the proposed method are trained on macro actions, which effectively mitigates the sparse and delayed rewards and the large state and action spaces. The key to the success of this method is the generation of macro actions that allow the high-level policy to find a near-optimal solution; we further leverage human expertise to design a set of good macro actions. Extensive empirical experiments in our constructed swarm confrontation environment show that our method performs better than the other algorithms.
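Why macro actions help under sparse, delayed rewards can be illustrated with a toy example (a hedged sketch, not the paper's swarm environment; the chain task, the macro set, and every hyperparameter are illustrative assumptions). Because each hand-designed macro action covers several primitive steps, the single terminal reward propagates back to the start state in far fewer Bellman updates.

```python
import numpy as np

np.random.seed(2)
n_states = 20                              # chain: start at 0, goal at 19
macros = [-1, +3, +5]                      # hand-designed macro actions
Q = np.zeros((n_states, len(macros)))      # tabular Q over macro actions
alpha, gamma, eps = 0.5, 0.95, 0.2

for episode in range(300):
    s = 0
    while s < n_states - 1:
        # epsilon-greedy macro-action selection
        a = (np.random.randint(len(macros)) if np.random.rand() < eps
             else int(np.argmax(Q[s])))
        s2 = int(np.clip(s + macros[a], 0, n_states - 1))
        r = 1.0 if s2 == n_states - 1 else 0.0   # sparse terminal reward
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

# greedy rollout with the learned macro-action policy
s, steps = 0, 0
while s < n_states - 1 and steps < 20:
    s = int(np.clip(s + macros[int(np.argmax(Q[s]))], 0, n_states - 1))
    steps += 1
print("reached goal in", steps, "macro steps")
```

With primitive +1 steps the same reward would have to travel back through 19 states; the macro set shortens the effective horizon, which is the same leverage the paper's human-designed macro actions give the high-level policy.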

