Algorithms or Actions? A Study in Large-Scale Reinforcement Learning

Author(s):  
Anderson Rocha Tavares ◽  
Sivasubramanian Anbalagan ◽  
Leandro Soriano Marcolino ◽  
Luiz Chaimowicz

Large state and action spaces are very challenging for reinforcement learning. However, in many domains there is a set of algorithms available, each of which estimates the best action given a state. Hence, agents can learn a performance-maximizing mapping either directly from states to actions, or from states to algorithms. We investigate several aspects of this dilemma, showing sufficient conditions under which learning over algorithms outperforms learning over actions for a finite number of training iterations. We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games.
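
For concreteness, here is a minimal tabular sketch of the second option, learning over algorithms: the agent's choices are calls to pre-built heuristics rather than primitive actions, which shrinks the effective action space from |actions| to |algorithms|. The environment interface, `heuristic_value`, integer states, and the two example algorithms are illustrative assumptions, not the paper's setup.

```python
import random
from collections import defaultdict

N_ACTIONS = 100  # a large primitive action space

def heuristic_value(state, action):
    # stand-in scoring function (integer states for simplicity);
    # a real domain would plug in its own estimator
    return -abs(action - (state % N_ACTIONS))

# the available "algorithms": each maps a state to a primitive action
ALGORITHMS = {
    "greedy": lambda s: max(range(N_ACTIONS), key=lambda a: heuristic_value(s, a)),
    "random": lambda s: random.randrange(N_ACTIONS),
}

Q = defaultdict(float)  # Q[(state, algorithm_name)]

def select_algorithm(state, eps=0.1):
    """Epsilon-greedy over the small algorithm set instead of the
    large primitive action set."""
    if random.random() < eps:
        return random.choice(list(ALGORITHMS))
    return max(ALGORITHMS, key=lambda name: Q[(state, name)])

def q_update(state, algo, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Standard Q-learning update, applied over (state, algorithm) pairs."""
    best_next = 0.0 if done else max(Q[(next_state, n)] for n in ALGORITHMS)
    Q[(state, algo)] += alpha * (reward + gamma * best_next - Q[(state, algo)])
```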

2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Víctor Uc-Cetina

We introduce a reinforcement learning architecture designed for problems with an infinite number of states, where each state can be seen as a vector of real numbers, and a finite number of actions, where each action requires a vector of real numbers as parameters. The main objective of this architecture is to distribute the work required to learn the final policy between two actors: one actor decides which action must be performed, while a second actor determines the right parameters for the selected action. We tested our architecture and an algorithm based on it on the robot dribbling problem, a challenging robot control problem taken from the RoboCup competitions. Our experimental work with three different function approximators provides strong evidence that the proposed architecture can be used to implement fast, robust, and reliable reinforcement learning algorithms.
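
A minimal sketch of the two-actor split described above, assuming linear function approximators; the dimensions, the linear heads, and the action semantics are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

STATE_DIM, N_ACTIONS, PARAM_DIM = 4, 3, 2
rng = np.random.default_rng(0)

# actor 1: scores the finite set of discrete actions
W_action = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))
# actor 2: one parameter head per action, emitting real-valued parameters
W_params = rng.normal(scale=0.1, size=(N_ACTIONS, PARAM_DIM, STATE_DIM))

def act(state):
    """Actor 1 picks which action to perform; actor 2 supplies the
    real-valued parameters for the chosen action."""
    scores = W_action @ state        # shape (N_ACTIONS,)
    a = int(np.argmax(scores))
    params = W_params[a] @ state     # shape (PARAM_DIM,)
    return a, params

state = rng.normal(size=STATE_DIM)
action, params = act(state)          # e.g. (kick, [direction, power])
```

Splitting the policy this way lets each actor learn in a smaller space: a discrete classification-like problem for actor 1, and a continuous regression-like problem, conditioned on the chosen action, for actor 2.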


Author(s):  
Levi H. S. Lelis

In this paper we review several planning algorithms developed for zero-sum games with exponential action spaces, i.e., spaces that grow exponentially with the number of game components that can act simultaneously at a given game state. For example, real-time strategy games have exponential action spaces because the number of available actions grows exponentially with the number of units the player controls. We also present a unifying perspective in which several existing algorithms can be described as instantiations of a variant of NaiveMCTS. In addition to describing several existing planning algorithms for exponential action spaces, we show that other instantiations of this variant of NaiveMCTS represent novel and promising algorithms for future study.
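
Below is a minimal sketch of naive sampling, the idea underlying NaiveMCTS: each unit's move is chosen by its own local bandit, ignoring interactions between units (the "naive" assumption), so a joint action is sampled in time linear in the number of units rather than by enumerating the exponential joint space. The data structures and the epsilon-greedy local policy are illustrative assumptions, and the global bandit over previously tried joint actions that full NaiveMCTS also maintains is omitted for brevity.

```python
import random
from collections import defaultdict

def make_stats(n_units):
    # local_stats[u][move] = [total_reward, visits] for unit u
    return [defaultdict(lambda: [0.0, 0]) for _ in range(n_units)]

def mean(stat):
    total, visits = stat
    return total / visits if visits else float("inf")  # prefer untried moves

def naive_sample(legal_moves, local_stats, eps=0.3):
    """Pick a joint action by choosing each unit's move from its own
    local bandit, independently of the other units' choices."""
    joint = []
    for u, moves in enumerate(legal_moves):
        if random.random() < eps:
            joint.append(random.choice(moves))
        else:
            joint.append(max(moves, key=lambda m: mean(local_stats[u][m])))
    return tuple(joint)

def update(local_stats, joint_action, reward):
    """Credit the playout reward to every unit's chosen move."""
    for u, move in enumerate(joint_action):
        local_stats[u][move][0] += reward
        local_stats[u][move][1] += 1
```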


2015 ◽  
Vol 54 ◽  
pp. 257-264 ◽  
Author(s):  
Harshit Sethy ◽  
Amit Patel ◽  
Vineet Padmanabhan

2009 ◽  
Vol 23 (9) ◽  
pp. 855-871 ◽  
Author(s):  
Kresten Toftgaard Andersen ◽  
Yifeng Zeng ◽  
Dennis Dahl Christensen ◽  
Dung Tran

2020 ◽  
Vol 34 (04) ◽  
pp. 6672-6679 ◽  
Author(s):  
Deheng Ye ◽  
Zhao Liu ◽  
Mingfei Sun ◽  
Bei Shi ◽  
Peilin Zhao ◽  
...  

We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari, which makes it very difficult to find policies with human-level performance. In this paper, we present a deep reinforcement learning framework that tackles this problem from both the system and algorithm perspectives. Our system features low coupling and high scalability, enabling efficient exploration at large scale. Our algorithm includes several novel strategies, among them control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be trained effectively in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games.
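
Of the strategies listed, dual-clip PPO is the easiest to show in isolation: when the advantage is negative, the standard PPO clipped objective min(rA, clip(r, 1-ε, 1+ε)A) is unbounded below as the probability ratio r grows, so a second clip with a constant c > 1 caps it at cA. A sketch in PyTorch follows; the tensor names are illustrative and this is not the authors' released code.

```python
import torch

def dual_clip_ppo_loss(ratio, advantage, eps=0.2, c=3.0):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated A(s, a)."""
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    standard = torch.min(ratio * advantage, clipped * advantage)  # vanilla PPO
    # the second clip only matters when the advantage is negative,
    # where `standard` alone would be unbounded below
    dual = torch.max(standard, c * advantage)
    objective = torch.where(advantage < 0, dual, standard)
    return -objective.mean()  # negate: optimizers minimize
```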


2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. However, in robotics applications such as industrial systems or autonomous vehicles, this cannot be guaranteed. A robot can be pushed out of its training space by some unforeseen perturbation, which may leave it in an unknown state from which it has not been trained to move towards its goal. While most prior work on RL safety focuses on ensuring safety during the training phase, this paper focuses on the safe deployment of a robot that has already been trained to operate within a safe space. This work defines a condition on the state and action spaces that, if satisfied, guarantees the robot's independent recovery to safety. We also propose a strategy and design that facilitate this recovery within a finite number of steps after a perturbation. This is implemented and tested against a standard RL model, and the results indicate much-improved performance.
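
A minimal sketch of such a deployment-time recovery loop, under assumed interfaces: `safe_set.contains` / `safe_set.nearest`, the env's `step(action) -> (state, done)` signature, and the proportional recovery controller are all illustrative, not the paper's design or its recovery condition.

```python
import numpy as np

def run_with_recovery(env, policy, safe_set, recovery_budget=50):
    """Deploy a trained policy; if a perturbation pushes the state outside
    the trained safe set, steer back toward the nearest safe state within
    a bounded number of steps."""
    state, done = env.reset(), False
    steps_left = recovery_budget
    while not done:
        if safe_set.contains(state):
            action = policy(state)           # normal trained behavior
            steps_left = recovery_budget     # reset the budget once safe
        else:
            if steps_left == 0:
                raise RuntimeError("recovery failed within step budget")
            steps_left -= 1
            target = safe_set.nearest(state)              # nearest known-safe state
            action = np.clip(target - state, -1.0, 1.0)   # proportional pull back
        state, done = env.step(action)
```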

