Structured Cooperative Reinforcement Learning with Time-varying Composite Action Space

AbstractThe digital curling game is a two-player zero-sum extensive game in a continuous action space. There are some challenging problems that are still not solved well, such as the uncertainty of strategy, the large game tree searching, and the use of large amounts of supervised data, etc. In this work, we combine NFSP and KR-UCT for digital curling games, where NFSP uses two adversary learning networks and can automatically produce supervised data, and KR-UCT can be used for large game tree searching in continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method, and show the strategy model can reach the Nash equilibrium.

Download Full-text

Bellmans Principle of Optimality in Deep Reinforcement Learning for Time-Varying Tasks

International Journal of Control ◽

10.1080/00207179.2021.1913516 ◽

2021 ◽

pp. 1-26

Author(s):

Alessandro Giuseppi ◽

Antonio Pietrabissa

Keyword(s):

Reinforcement Learning ◽

Time Varying ◽

Principle Of Optimality

Download Full-text

On-Demand Channel Bonding in Heterogeneous WLANs: A Multi-Agent Deep Reinforcement Learning Approach

Sensors ◽

10.3390/s20102789 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2789 ◽

Cited By ~ 1

Author(s):

Hang Qi ◽

Hao Huang ◽

Zhiqun Hu ◽

Xiangming Wen ◽

Zhaoming Lu

Keyword(s):

Reinforcement Learning ◽

Transmission Rate ◽

Single Agent ◽

Time Of Day ◽

Action Space ◽

Traffic Load ◽

Traffic Demand ◽

Channel Bonding ◽

On Demand ◽

Multi Agent

In order to meet the ever-increasing traffic demand of Wireless Local Area Networks (WLANs), channel bonding is introduced in IEEE 802.11 standards. Although channel bonding effectively increases the transmission rate, the wider channel reduces the number of non-overlapping channels and is more susceptible to interference. Meanwhile, the traffic load differs from one access point (AP) to another and changes significantly depending on the time of day. Therefore, the primary channel and channel bonding bandwidth should be carefully selected to meet traffic demand and guarantee the performance gain. In this paper, we proposed an On-Demand Channel Bonding (O-DCB) algorithm based on Deep Reinforcement Learning (DRL) for heterogeneous WLANs to reduce transmission delay, where the APs have different channel bonding capabilities. In this problem, the state space is continuous and the action space is discrete. However, the size of action space increases exponentially with the number of APs by using single-agent DRL, which severely affects the learning rate. To accelerate learning, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is used to train O-DCB. Real traffic traces collected from a campus WLAN are used to train and test O-DCB. Simulation results reveal that the proposed algorithm has good convergence and lower delay than other algorithms.

Download Full-text

Cognitive Control Using Adaptive RBF Neural Networks and Reinforcement Learning for Networked Control System Subject to Time-Varying Delay and Packet Losses

Arabian Journal for Science and Engineering ◽

10.1007/s13369-021-05752-y ◽

2021 ◽

Author(s):

Shuti Wang ◽

Xunhe Yin ◽

Peng Li ◽

Yanxin Zhang ◽

Xin Wang ◽

...

Keyword(s):

Neural Networks ◽

Control System ◽

Reinforcement Learning ◽

Cognitive Control ◽

Networked Control System ◽

Time Varying ◽

Rbf Neural Networks ◽

Packet Losses ◽

Time Varying Delay ◽

Varying Delay

Download Full-text

Continuous reinforcement learning based ramp jump control for single-track two-wheeled robots

Transactions of the Institute of Measurement and Control ◽

10.1177/01423312211037847 ◽

2021 ◽

pp. 014233122110378

Author(s):

Qingyuan Zheng ◽

Duo Wang ◽

Zhang Chen ◽

Yiyong Sun ◽

Bin Liang

Keyword(s):

Reinforcement Learning ◽

Energy Savings ◽

Control Method ◽

Learning Control ◽

Action Space ◽

Gradient Algorithm ◽

Single Track ◽

Wheeled Robots ◽

Reward Function ◽

Wheeled Robot

Single-track two-wheeled robots have become an important research topic in recent years, owing to their simple structure, energy savings and ability to run on narrow roads. However, the ramp jump remains a challenging task. In this study, we propose to realize a single-track two-wheeled robot ramp jump. We present a control method that employs continuous action reinforcement learning techniques for single-track two-wheeled robot control. We design a novel reward function for reinforcement learning, optimize the dimensions of the action space, and enable training under the deep deterministic policy gradient algorithm. Finally, we validate the control method through simulation experiments and successfully realize the single-track two-wheeled robot ramp jump task. Simulation results validate that the control method is effective and has several advantages over high-dimension action space control, reinforcement learning control of sparse reward function and discrete action reinforcement learning control.

Download Full-text

Constructing continuous action space from basis functions for fast and stable reinforcement learning

RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication ◽

10.1109/roman.2009.5326234 ◽

2009 ◽

Cited By ~ 2

Author(s):

Akihiko Yamaguchi ◽

Jun Takamatsu ◽

Tsukasa Ogasawara

Keyword(s):

Reinforcement Learning ◽

Basis Functions ◽

Action Space ◽

Continuous Action

Download Full-text

Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space

10.1109/cog52621.2021.9619068 ◽

2021 ◽

Author(s):

Zhejie Hu ◽

Tomoyuki Kaneko

Keyword(s):

Reinforcement Learning ◽

Action Space

Download Full-text

Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5887 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4577-4584

Author(s):

Xian Yeow Lee ◽

Sambit Ghadai ◽

Kai Liang Tan ◽

Chinmay Hegde ◽

Soumik Sarkar

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Limited Resource ◽

Action Space ◽

Engineering Systems ◽

Physical Systems ◽

Learning Agents ◽

Look Ahead ◽

Real World Applications ◽

Temporal Dimensions

Robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real world applications such as those deployed in cyber-physical systems (CPS) are of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results showed that using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that with limited resource, an adversary can utilize the agent's dynamics to malevolently craft attacks that causes the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.

Download Full-text

Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning

Sensors ◽

10.3390/s20164468 ◽

2020 ◽

Vol 20 (16) ◽

pp. 4468

Author(s):

Ao Xi ◽

Chao Chen

Keyword(s):

Reinforcement Learning ◽

Center Of Pressure ◽

Stable State ◽

Biped Robot ◽

Action Space ◽

Training Procedure ◽

Joint Angles ◽

Model Free ◽

Initial Control ◽

Hybrid Reinforcement

In this work, we introduced a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is considered as the external disturbance to the robot. The platform had two degrees of freedom in rotation, pitch and roll. The state space comprised the position of center of pressure, and joint angles and joint velocities of two legs. The action space consisted of the joint angles of ankles, knees, and hips. By adding the inverse kinematics techniques, the dimension of action space was significantly reduced. Then, a model-based system estimator was employed during the offline training procedure to estimate the dynamics model of the system by using novel hierarchical Gaussian processes, and to provide initial control inputs, after which the reduced action space of each joint was obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN (λ) was introduced to fine tune the initial control inputs, where the optimal control inputs were obtained for each joint at any state. The proposed reinforcement learning not only successfully avoided the distribution mismatch problem, but also improved the sample efficiency. Simulation results showed that the proposed hybrid reinforcement learning mechanism enabled the NAO robot to balance on an oscillating platform with different frequencies and magnitudes. Both control performance and robustness were guaranteed during the experiments.

Download Full-text

Autonomous control of real snake-like robot using reinforcement learning; Abstraction of state-action space using properties of real world

2007 3rd International Conference on Intelligent Sensors, Sensor Networks and Information ◽

10.1109/issnip.2007.4496875 ◽

2007 ◽

Cited By ~ 5

Author(s):

Kazuyuki Ito ◽

Yoshitaka Fukumori ◽

Akihiro Takayama

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Autonomous Control ◽

Action Space ◽

State Action

Download Full-text