Complex Dynamics of Single Agent Choice Governed by Dual-Channel Multi-Mode Reinforcement Learning

Author(s):  
Ihor Lubashevsky ◽  
Arkady Zgonnikov ◽  
Sergey Maslov ◽  
Namik Goussein-zade
2021 ◽  
Vol 11 (11) ◽  
pp. 4948
Author(s):  
Lorenzo Canese ◽  
Gian Carlo Cardarilli ◽  
Luca Di Di Nunzio ◽  
Rocco Fazzolari ◽  
Daniele Giardino ◽  
...  

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.


Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2789 ◽  
Author(s):  
Hang Qi ◽  
Hao Huang ◽  
Zhiqun Hu ◽  
Xiangming Wen ◽  
Zhaoming Lu

In order to meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding is introduced in IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we proposed an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs to reduce transmission delay, where the APs have different channel bonding capabilities. In this problem, the state space is continuous and the action space is discrete. However, the size of action space increases exponentially with the number of APs by using single-agent DRL, which severely affects the learning rate. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results reveal that the proposed algorithm has good convergence and lower delay than other algorithms.


2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Chao Lu ◽  
Yanan Zhao ◽  
Jianwei Gong

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under the congestion caused by incidents. However, existing applications limited to single-agent tasks and based onQ-learning have inherent drawbacks for dealing with coordinated ramp control problems. For solving these problems, a Dyna-Qbased multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Qis an extension ofQ-learning, which combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested in a simulated motorway segment in the UK with the real traffic data collected from AM peak hours. The test results compared with Isolated RL and noncontrolled situations show that Dyna-MARL can achieve a superior performance on improving the traffic operation with respect to increasing total throughput, reducing total travel time and CO2emission. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.


2005 ◽  
Vol 15 (01n02) ◽  
pp. 151-162 ◽  
Author(s):  
DEHU QI ◽  
RON SUN

A cooperative team of agents may perform many tasks better than single agents. The question is how cooperation among self-interested agents should be achieved. It is important that, while we encourage cooperation among agents in a team, we maintain autonomy of individual agents as much as possible, so as to maintain flexibility and generality. This paper presents an approach based on bidding utilizing reinforcement values acquired through reinforcement learning. We tested and analyzed this approach and demonstrated that a team indeed performed better than the best single agent as well as the average of single agents.


Author(s):  
Daxue Liu ◽  
Jun Wu ◽  
Xin Xu

Multi-agent reinforcement learning (MARL) provides a useful and flexible framework for multi-agent coordination in uncertain dynamic environments. However, the generalization ability and scalability of algorithms to large problem sizes, already problematic in single-agent RL, is an even more formidable obstacle in MARL applications. In this paper, a new MARL method based on ordinal action selection and approximate policy iteration called OAPI (Ordinal Approximate Policy Iteration), is presented to address the scalability issue of MARL algorithms in common-interest Markov Games. In OAPI, an ordinal action selection and learning strategy is integrated with distributed approximate policy iteration not only to simplify the policy space and eliminate the conflicts in multi-agent coordination, but also to realize the approximation of near-optimal policies for Markov Games with large state spaces. Based on the simplified policy space using ordinal action selection, the OAPI algorithm implements distributed approximate policy iteration utilizing online least-squares policy iteration (LSPI). This resulted in multi-agent coordination with good convergence properties with reduced computational complexity. The simulation results of a coordinated multi-robot navigation task illustrate the feasibility and effectiveness of the proposed approach.


Author(s):  
Kazuteru Miyazaki ◽  
◽  
Keiki Takadama ◽  

Recently, the tailor-made system that grants an individual request has been recognized as the important approach. Such a system requires the ggoal-directed learningh through interaction between user and system, which is mainly addressed in greinforcement learningh domain. This special issue on gNew Trends in Reinforcement Learningh called for papers on the cuttingedge research exploring the goal-directed learning, which represents reinforcement learning. Many contributions were forthcoming, but we finally selected 12 works for publication. Although greinforcement learningh is included in the title of this special issue, the research works do not necessarily have to be on reinforcement learning itself, so long as the theme coincides with that of this special issue. In making our final selections, we gave special consideration to the kinds of research which can actively lead to new trends in reinforcement learning. Of the 12 papers in this special issue, the first four mainly deal with the expansion of the reinforcement learning method in single agent environments. These cover a broad range of research, from works based on dynamic programming to exploitation-oriented methods. The next two works deal with the Learning Classifier System (LCS), which applies the rule discovery mechanism to reinforcement learning. LCS is a technique with a long history, but for this issue, we were able to publish two theoretical works. We are also grateful to Prof. Toshio Fukuda, Nagoya University, and Prof. Kaoru Hirota, Tokyo Institute of Technology, the editors-in-chief, and the NASTEC 2008 conference staff for inviting us to guest-edit this Journal. The next four papers mainly deal with multi agent environments. We were able to draw from a wide range of research: from measuring interaction, through the expansion of techniques incorporating simultaneous learning, to research leading to application in multi agent environments. The last two contributions mainly deal with application. We publish one paper on exemplar generalization and another detailing the successful application to government bond trading. Each of these researches can be considered to be at the cutting-edge of reinforcement learning. We would like to end by saying that we hope this special issue constitutes a large contribution to the development of the field while holding a wide international appeal.


2016 ◽  
Vol 31 (1) ◽  
pp. 44-58 ◽  
Author(s):  
Sam Devlin ◽  
Daniel Kudenko

AbstractRecent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function.Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL.Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.


2020 ◽  
Vol 1 ◽  
pp. 6
Author(s):  
Alexandra Vedeler ◽  
Narada Warakagoda

The task of obstacle avoidance using maritime vessels, such as Unmanned Surface Vehicles (USV), has traditionally been solved using specialized modules that are designed and optimized separately. However, this approach requires a deep insight into the environment, the vessel, and their complex dynamics. We propose an alternative method using Imitation Learning (IL) through Deep Reinforcement Learning (RL) and Deep Inverse Reinforcement Learning (IRL) and present a system that learns an end-to-end steering model capable of mapping radar-like images directly to steering actions in an obstacle avoidance scenario. The USV used in the work is equipped with a Radar sensor and we studied the problem of generating a single action parameter, heading. We apply an IL algorithm known as generative adversarial imitation learning (GAIL) to develop an end-to-end steering model for a scenario where avoidance of an obstacle is the goal. The performance of the system was studied for different design choices and compared to that of a system that is based on pure RL. The IL system produces results that indicate it is able to grasp the concept of the task and that in many ways are on par with the RL system. We deem this to be promising for future use in tasks that are not as easily described by a reward function.  


Sign in / Sign up

Export Citation Format

Share Document