Q-Table compression for reinforcement learning

Author(s):  
Leonardo Amado ◽  
Felipe Meneguzzi

Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the correspondingly large state spaces that result. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, along with a set of techniques to mitigate the large-branching-factor problem. We apply these techniques to a real-time strategy (RTS) game, where both the state space and the branching factor are problematic, empirically evaluate an implementation of the technique for controlling agents in an RTS scenario where classical RL fails, and identify a number of avenues for further work on this problem.
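
The abstract does not give the architecture, but the core idea can be sketched as training an auto-encoder on the rows of a learned Q-table and keeping only the latent codes, decoding a row back on demand for action selection. The sizes, layer widths, and training loop below are illustrative assumptions (in PyTorch), not the paper's implementation:

```python
# Minimal sketch: compress Q-table rows with a deep auto-encoder (assumed sizes).
import torch
import torch.nn as nn

N_STATES, N_ACTIONS, LATENT = 10_000, 64, 8   # illustrative, not from the paper

q_table = torch.rand(N_STATES, N_ACTIONS)     # stand-in for a learned Q-table

class QAutoEncoder(nn.Module):
    def __init__(self, n_actions, latent):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_actions, 32), nn.ReLU(),
                                     nn.Linear(32, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_actions))
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = QAutoEncoder(N_ACTIONS, LATENT)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):                      # reconstruction training loop
    opt.zero_grad()
    loss = loss_fn(model(q_table), q_table)
    loss.backward()
    opt.step()

# Store only the latent codes; decode a row on demand for action selection.
codes = model.encoder(q_table).detach()       # N_STATES x LATENT
recovered_q = model.decoder(codes[42])        # approximate Q-values of state 42
best_action = recovered_q.argmax().item()
```

In this sketch, storing `codes` instead of the full table trades some approximation error in the recovered Q-values for an 8x reduction in row width.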

Author(s):  
Tianyu Liu ◽  
Zijie Zheng ◽  
Hongchang Li ◽  
Kaigui Bian ◽  
Lingyang Song

Game AI is of great importance, as games are simulations of reality. Recent research on game AI has made much progress in various kinds of games, such as console games, board games and MOBA games. However, RTS games remain a challenge because of their huge state spaces, imperfect information, sparse rewards and diverse strategies. Moreover, typical card-based RTS games have complex card features and still lack effective solutions. We present a deep model, SEAT (selection-attention), to play card-based RTS games. The SEAT model has two parts, a selection part for card choice and an attention part for card usage, and it learns from scratch via deep reinforcement learning. Comprehensive experiments are performed on Clash Royale, a popular mobile card-based RTS game. Empirical results show that the SEAT agent reaches a high winning rate against both rule-based and decision-tree-based agents.
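
A minimal sketch of what a two-headed selection-attention policy of this kind might look like is shown below; the feature sizes, layer widths, and tensor shapes are all assumptions, since the abstract does not describe the architecture:

```python
# Hypothetical sketch of a SEAT-style policy: select a card, then place it.
import torch
import torch.nn as nn

N_CARDS, CARD_FEAT, STATE_FEAT, GRID = 4, 16, 32, 18 * 32  # assumed sizes

class SeatPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.state_enc = nn.Linear(STATE_FEAT, 64)
        self.card_enc = nn.Linear(CARD_FEAT, 64)
        self.place_head = nn.Linear(64 + 64, GRID)   # attention over board cells

    def forward(self, state, cards):
        s = torch.relu(self.state_enc(state))              # (B, 64)
        c = torch.relu(self.card_enc(cards))               # (B, N_CARDS, 64)
        # Selection head: score each card in hand against the state encoding.
        select_logits = torch.einsum('bd,bnd->bn', s, c)   # (B, N_CARDS)
        # Attention head: where to deploy the (greedily) chosen card.
        chosen = c[torch.arange(c.size(0)), select_logits.argmax(-1)]
        place_logits = self.place_head(torch.cat([s, chosen], dim=-1))
        return select_logits, place_logits

policy = SeatPolicy()
sel, place = policy(torch.rand(1, STATE_FEAT), torch.rand(1, N_CARDS, CARD_FEAT))
```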


Author(s):  
Lin Sun ◽  
Peng Jiao ◽  
Kai Xu ◽  
Quanjun Yin ◽  
Yabing Zha

Real-time strategy (RTS) games pose many challenges for AI research because of their large state spaces, enormous branching factors, limited decision time and dynamic adversarial environments. To tackle these problems, Adversarial Hierarchical Task Network planning (AHTN) has been proposed and achieves favorable performance. However, the HTN description it uses cannot express complex relationships among tasks or the impact of the environment on tasks. Moreover, AHTN cannot handle task failures during plan execution. In this paper, we propose a modified AHTN planning algorithm named AHTNR. The algorithm introduces three elements, essential task, phase and exit condition, to extend the HTN description. To deal with possible task failures, AHTNR first uses the extended HTN description to identify failed tasks, and then applies a novel task-repair strategy based on historical information to maintain the validity of the previous plan. Finally, empirical results are presented for the μRTS game, comparing AHTNR to state-of-the-art search algorithms for RTS games.
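
A toy sketch of how the three new elements might drive execution-time repair is given below; the `Task` fields, the placeholder executor, and the `repair` callback are hypothetical stand-ins for the paper's extended HTN description:

```python
# Hypothetical sketch: exit conditions and task repair in an AHTNR-style plan.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Task:
    name: str
    essential: bool                          # plan fails if this task fails
    exit_condition: Callable[[dict], bool]   # true when the task is satisfied
    phase: int = 0                           # ordering constraint among siblings

def run(task: Task, world: dict) -> bool:
    """Placeholder executor: a task succeeds when its exit condition holds."""
    return task.exit_condition(world)

def execute_plan(plan: List[Task], world: dict,
                 repair: Callable[[Task, dict], Optional[List[Task]]]) -> bool:
    for task in sorted(plan, key=lambda t: t.phase):   # respect phase order
        if run(task, world):
            continue
        if not task.essential:
            continue                         # tolerate non-essential failures
        patched = repair(task, world)        # history-based repair of the subplan
        if patched is None or not execute_plan(patched, world, repair):
            return False                     # essential task irreparable: replan
    return True
```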


Author(s):  
Levi H. S. Lelis

In this paper we introduce Stratified Strategy Selection (SSS), a novel search algorithm for micromanaging units in real-time strategy (RTS) games. SSS uses a type system to partition the player's units into types and assumes that units of the same type must follow the same strategy. SSS searches the state space induced by the type system to select, from a pool of options, a strategy for each unit. Empirical results on a simulator of an RTS game show that SSS, employing either fixed or adaptive type systems, substantially outperforms state-of-the-art search-based algorithms in combat scenarios with up to 100 units.
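
The key saving is that the search space shrinks from one strategy choice per unit to one per type. The sketch below enumerates that reduced space exhaustively for clarity (the published SSS searches it more cleverly); `type_of` and `evaluate` are hypothetical callbacks:

```python
# Hypothetical sketch: one strategy per unit *type* instead of per unit.
from itertools import product

def sss(units, strategies, type_of, evaluate):
    """Search the type-induced space for the best strategy assignment.

    units:      list of unit ids
    strategies: pool of candidate strategies for each type
    type_of:    maps a unit to its type under the chosen type system
    evaluate:   playout-based score for a full unit->strategy assignment
    """
    types = sorted({type_of(u) for u in units})
    best_score, best_assignment = float('-inf'), None
    # The space is |strategies|^|types|, not |strategies|^|units|.
    for combo in product(strategies, repeat=len(types)):
        by_type = dict(zip(types, combo))
        assignment = {u: by_type[type_of(u)] for u in units}
        score = evaluate(assignment)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment
```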


2020 ◽  
Vol 34 (10) ◽  
pp. 13849-13850
Author(s):  
Donghyeon Lee ◽  
Man-Je Kim ◽  
Chang Wook Ahn

In the real-time strategy (RTS) game StarCraft II, players need to know the consequences of their actions before making a decision in combat. We propose a combat outcome predictor that utilizes terrain information as well as squad information. To train the model, we generated a StarCraft II combat dataset by simulating diverse, large-scale combat situations. The overall accuracy of our model was 89.7%. Our predictor can be integrated into artificial intelligence agents for RTS games as a short-term decision-making module.
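
The abstract does not detail the network, but one plausible shape for a predictor that fuses terrain and squad information is a small convolutional encoder for the terrain map concatenated with an MLP over squad features; everything below (input sizes, layer widths) is assumed for illustration:

```python
# Hypothetical sketch: terrain + squad combat-outcome predictor.
import torch
import torch.nn as nn

TERRAIN_HW, SQUAD_FEAT = 32, 20   # assumed input sizes

class CombatPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.terrain = nn.Sequential(            # terrain as a 1-channel map
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten())
        self.squads = nn.Sequential(             # both sides' squad features
            nn.Linear(2 * SQUAD_FEAT, 64), nn.ReLU())
        flat = 16 * (TERRAIN_HW // 2) ** 2
        self.head = nn.Linear(flat + 64, 1)      # P(side A wins the combat)

    def forward(self, terrain, squads):
        x = torch.cat([self.terrain(terrain), self.squads(squads)], dim=-1)
        return torch.sigmoid(self.head(x))

model = CombatPredictor()
p_win = model(torch.rand(1, 1, TERRAIN_HW, TERRAIN_HW),
              torch.rand(1, 2 * SQUAD_FEAT))
```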


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3864
Author(s):  
Tarek Ghoul ◽  
Tarek Sayed

Speed advisories are used on highways to inform vehicles of upcoming changes in traffic conditions and to apply a variable speed limit that reduces traffic conflicts and delays. This study applies a similar concept to signalized intersections, using connected vehicles to provide dynamic speed advisories in real time that guide vehicles towards an optimum speed. Real-time safety evaluation models for signalized intersections that depend on dynamic traffic parameters, such as traffic volume and shock wave characteristics, were used for this purpose. The proposed algorithm combines a rule-based approach with Deep Deterministic Policy Gradient (DDPG) reinforcement learning to assign ideal speeds to connected vehicles at intersections and improve safety. The system was tested on two intersections using real-world data and yielded an average reduction in traffic conflicts ranging from 9% to 23%. Further analysis showed that the algorithm yields tangible results even at lower market penetration rates (MPR). The algorithm was also tested on the same intersection under different traffic volume conditions, as well as on another intersection with different physical constraints and characteristics. The proposed algorithm provides a low-cost approach that is not computationally intensive and optimizes for safety by reducing rear-end traffic conflicts.
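
A minimal sketch of the hybrid idea, a learned DDPG actor whose advised speed is bounded by explicit rules, might look like the following; the state features, speed bounds, and the rule shown are assumptions rather than the study's actual design:

```python
# Hypothetical sketch: rule-based guard around a DDPG actor's speed advisory.
import torch
import torch.nn as nn

STATE_DIM = 8                 # assumed: volume, shock-wave speed, phase, ...
V_MIN, V_MAX = 20.0, 60.0     # assumed legal speed bounds (km/h)

actor = nn.Sequential(        # DDPG actor: state -> action in [-1, 1]
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Tanh())

def advise_speed(state: torch.Tensor, red_phase: bool, queue_ahead: bool) -> float:
    a = actor(state).item()                          # learned action in [-1, 1]
    v = V_MIN + (a + 1.0) / 2.0 * (V_MAX - V_MIN)    # rescale to speed range
    # Rule-based layer: override the learned advisory in clear-cut cases
    # (the specific rule here is an invented example).
    if red_phase and queue_ahead:
        v = min(v, 0.6 * V_MAX)    # slow the approach toward a standing queue
    return v

v = advise_speed(torch.rand(STATE_DIM), red_phase=True, queue_ahead=True)
```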


2021 ◽  
Vol 3 (6) ◽  
Author(s):  
Ogbonnaya Anicho ◽  
Philip B. Charlesworth ◽  
Gurvinder S. Baicher ◽  
Atulya K. Nagar

This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work that examined various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge by using partitioning to manage the high-dimensionality problem. This enabled a comparison of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. In previous work, SI was observed to perform better across various key performance indicators. However, even after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) in addition to faster convergence rates. Though the RL technique showed better average peak user coverage, its unpredictable coverage dips were a key weakness, making SI the more suitable algorithm within the context of this work.
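
Partitioning a continuous state space for tabular RL typically means discretizing each state dimension into a fixed number of bins, the "partitioning ratio" tuned empirically in the paper. A generic sketch of the idea follows; the HAPS state bounds and bin counts here are invented for illustration:

```python
# Hypothetical sketch: partitioning a continuous state for tabular Q-learning.
import numpy as np

BINS_PER_DIM = 10                      # the empirically chosen partitioning ratio
STATE_LOW = np.array([0.0, 0.0])       # assumed bounds, e.g. (x, y) of a HAPS
STATE_HIGH = np.array([100.0, 100.0])
N_ACTIONS = 5

q_table = np.zeros((BINS_PER_DIM,) * len(STATE_LOW) + (N_ACTIONS,))

def discretize(state: np.ndarray) -> tuple:
    """Map a continuous state to a Q-table cell index."""
    ratio = (state - STATE_LOW) / (STATE_HIGH - STATE_LOW)
    idx = (ratio * BINS_PER_DIM).astype(int)
    return tuple(np.clip(idx, 0, BINS_PER_DIM - 1))

s = discretize(np.array([37.2, 81.5]))
best_action = int(q_table[s].argmax())
```

Coarser partitions keep the Q-table small but blur distinct states together; finer partitions recover resolution at the cost of slower convergence, which is the trade-off the paper tunes.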

