Scalable Initial State Interdiction for Factored MDPs

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence

10.24963/ijcai.2018/667

2018

Author(s):

Swetasudha Panda

Yevgeniy Vorobeychik

Keyword(s):

Reinforcement Learning

State Space

Optimal Policy

Function Approximation

Stackelberg Game

Game Model

Initial State

Novel Approach

Linear Function Approximation

Computationally Expensive

We propose a novel Stackelberg game model of MDP interdiction in which the defender modifies the initial state of the planner, who then responds by computing an optimal policy starting from that state. We first develop a novel approach for MDP interdiction in factored state spaces that allows the defender to modify the initial state. Because this approach can be computationally expensive for large factored MDPs, we then develop several interdiction algorithms that leverage variations of reinforcement learning with both linear and non-linear function approximation. Finally, we extend the interdiction framework to a Bayesian interdiction problem in which the interdictor is uncertain about some of the planner's initial state features. Extensive experiments demonstrate the effectiveness of our approaches.
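
As a rough illustration of the leader-follower structure described above, the sketch below brute-forces the defender's choice on a small example. It is not the authors' algorithm: the three-feature toy MDP, its actions and rewards, the discount 0.9, the "flip at most `budget` features" interdiction model, and the assumption that the defender minimizes the planner's optimal value V*(s0) are all hypothetical. The planner's best response is computed by value iteration, and the defender enumerates every allowed modification of the initial state; that enumeration grows combinatorially with the number of features, which is the kind of cost that motivates scalable methods.

```python
from itertools import combinations, product

# Hypothetical toy factored MDP (not from the paper): the state is a tuple of
# three binary features; action i sets feature i to 1; reward = number of 1s.
N_FEATURES = 3
STATES = list(product((0, 1), repeat=N_FEATURES))
ACTIONS = list(range(N_FEATURES))


def transition(s, a):
    """Deterministic transition: returns a list of (probability, next_state)."""
    s2 = list(s)
    s2[a] = 1
    return [(1.0, tuple(s2))]


def reward(s, a):
    return float(sum(s))


def value_iteration(gamma=0.9, tol=1e-8):
    """Planner's best response: optimal value V*(s) for every state."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(
                reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def interdict_initial_state(s0, V, budget):
    """Defender (leader) flips at most `budget` features of s0 to minimize V*."""
    best_state, best_value = s0, V[s0]
    for k in range(1, budget + 1):
        for idx in combinations(range(len(s0)), k):
            s = tuple(1 - x if i in idx else x for i, x in enumerate(s0))
            if V[s] < best_value:
                best_state, best_value = s, V[s]
    return best_state, best_value


V = value_iteration()
best_state, best_value = interdict_initial_state((1, 1, 0), V, budget=1)
print(best_state, round(best_value, 3))
```

Here the defender turns off one of the planner's already-set features, lowering V*(s0) from 29 to about 27.1; which of the two symmetric features is chosen does not matter.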

Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control

AI Communications

10.3233/aic-201580

2021

pp. 1-15

Author(s):

Theresa Ziemke

Lucas N. Alegre

Ana L.C. Bazzan

Keyword(s):

Reinforcement Learning

State Space

Function Approximation

Traffic Signals

The State

Signal Control

Traffic Signal Control

Rule Based

Fourier Basis

Linear Function Approximation

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces are of reasonable size. This is rarely the case in control problems such as traffic signal control, where the state space can be very large. To deal with the curse of dimensionality, the state space can be coarsely discretized, but this is effective only up to a point. One way to mitigate the problem is to use techniques that generalize over the state space, such as function approximation. In this paper, a linear function approximation is used: specifically, SARSA(λ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only with trivial controllers such as fixed-time control, but also with state-of-the-art rule-based adaptive methods. It is concluded that SARSA(λ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demand or unexpected events.
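
A minimal sketch of linear SARSA(λ) with a full Fourier basis, assuming the state (for instance, hypothetical normalized queue lengths at an intersection) has already been scaled to [0, 1]^d. Each feature is φ_c(s) = cos(π c·s) for an integer vector c ∈ {0, ..., n}^d, and Q(s, a) = w_a·φ(s). This is the generic textbook form with accumulating eligibility traces, not the authors' MATSim implementation; refinements such as per-feature learning-rate scaling are omitted, and the parameter values are illustrative.

```python
import math
from itertools import product


def fourier_coefficients(order, dim):
    """All integer coefficient vectors c in {0, ..., order}^dim (full Fourier basis)."""
    return list(product(range(order + 1), repeat=dim))


def fourier_features(state, coeffs):
    """phi_c(s) = cos(pi * c . s); assumes every state variable is scaled to [0, 1]."""
    return [math.cos(math.pi * sum(ci * si for ci, si in zip(c, state))) for c in coeffs]


class SarsaLambda:
    """Linear SARSA(lambda): Q(s, a) = w_a . phi(s), accumulating eligibility traces."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99, lam=0.9):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.e = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, phi, a):
        return sum(wi * fi for wi, fi in zip(self.w[a], phi))

    def update(self, phi, a, r, phi_next, a_next, done):
        """One on-policy TD(lambda) step for the transition (s, a, r, s', a')."""
        target = r if done else r + self.gamma * self.q(phi_next, a_next)
        delta = target - self.q(phi, a)
        # Decay every trace, then add the features of the (s, a) pair just visited.
        self.e = [[self.gamma * self.lam * x for x in row] for row in self.e]
        self.e[a] = [x + f for x, f in zip(self.e[a], phi)]
        self.w = [[wi + self.alpha * delta * ei for wi, ei in zip(wrow, erow)]
                  for wrow, erow in zip(self.w, self.e)]
        if done:  # traces do not carry over between episodes
            self.e = [[0.0] * len(row) for row in self.e]
        return delta


coeffs = fourier_coefficients(order=3, dim=2)   # 4^2 = 16 basis functions
phi = fourier_features([0.0, 0.0], coeffs)      # cos(0) = 1 for every feature
agent = SarsaLambda(len(coeffs), n_actions=2, alpha=0.01)
td_error = agent.update(phi, 0, 1.0, phi, 0, done=True)
```

The number of basis functions grows as (n + 1)^d, which is why the full Fourier basis is usually paired with a modest order when the state has many variables.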

Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation

Proceedings of the 24th international conference on Machine learning - ICML '07

10.1145/1273496.1273591

2007

Cited By ~ 5

Author(s):

Chee Wee Phua

Robert Fitch

Keyword(s):

Reinforcement Learning

Linear Function

Function Approximation

Value Function

Piecewise Linear

Piecewise Linear Function

Linear Function Approximation

Parallel reinforcement learning with linear function approximation

Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07

10.1145/1329125.1329179

2007

Author(s):

Matthew Grounds

Daniel Kudenko

Keyword(s):

Reinforcement Learning

Linear Function

Function Approximation

Linear Function Approximation

Minibatch Recursive Least Squares Q-Learning

Computational Intelligence and Neuroscience

10.1155/2021/5370281

2021

Vol 2021

pp. 1-9

Author(s):

Chunyuan Zhang

Qi Song

Zeng Meng

Keyword(s):

Reinforcement Learning

Least Squares

Linear Function

Function Approximation

Learning Algorithm

Learning Algorithms

Optimization Technique

Recursive Least Squares

Q Learning

Linear Function Approximation

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has drawbacks such as slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and are more stable, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom exploit the advantages of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike the traditional Q-learning algorithm with linear function approximation, MRLS-Q has a learning mechanism and model structure more similar to those of a DQN with only one input layer and one linear output layer. It uses experience replay and minibatch training, and it takes the agent's states, rather than its state-action pairs, as inputs. As a result, it can be used alone for low-dimensional problems and can also be seamlessly integrated into DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it achieves better convergence performance whether it is used alone or integrated with DQN. Finally, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
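
The recursive least squares (RLS) update that MRLS-Q builds on can be sketched as follows. This is standard RLS with an optional forgetting factor, applied to one linear head per action, with states as inputs and bootstrapped targets drawn from a replay minibatch. It is an assumption-laden stand-in rather than the paper's method: the proposed average RLS technique, the exact minibatch formulation, and the integration as the last layer of a DQN are not reproduced, and the class names `RLSHead` and `RLSQ` are hypothetical.

```python
class RLSHead:
    """Standard recursive least squares for a single linear output y ~ w . x."""

    def __init__(self, dim, delta=1.0, forget=1.0):
        self.w = [0.0] * dim
        # P approximates the inverse input-correlation matrix; P0 = I / delta.
        self.P = [[1.0 / delta if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.mu = forget  # forgetting factor in (0, 1]; 1.0 means no forgetting

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        gain_den = self.mu + sum(x[i] * Px[i] for i in range(n))
        k = [v / gain_den for v in Px]               # gain vector
        err = y - self.predict(x)                    # a-priori error
        self.w = [wi + ki * err for wi, ki in zip(self.w, k)]
        # P <- (P - k x^T P) / mu; x^T P equals (P x)^T because P stays symmetric.
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.mu for j in range(n)]
                  for i in range(n)]


class RLSQ:
    """Q-learning with one RLS-trained linear head per action (states as inputs)."""

    def __init__(self, dim, n_actions, gamma=0.99, delta=1.0):
        self.heads = [RLSHead(dim, delta) for _ in range(n_actions)]
        self.gamma = gamma

    def q_values(self, s):
        return [h.predict(s) for h in self.heads]

    def train_minibatch(self, batch):
        """batch: list of (s, a, r, s_next, done) tuples, e.g. from a replay buffer."""
        # Freeze the bootstrapped targets before touching any weights in this batch.
        targets = [r if done else r + self.gamma * max(self.q_values(s2))
                   for (_, _, r, s2, done) in batch]
        for (s, a, _, _, _), y in zip(batch, targets):
            self.heads[a].update(s, y)


# Sanity check: noiseless data from y = 2*x0 - 3*x1 is recovered almost exactly.
head = RLSHead(2, delta=1e-6)
for x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]:
    head.update(x, 2.0 * x[0] - 3.0 * x[1])

agent = RLSQ(dim=2, n_actions=2, gamma=0.9, delta=1e-6)
agent.train_minibatch([((1.0, 0.0), 0, 1.0, (0.0, 0.0), True)] * 4)
```

Unlike a gradient step, each RLS update uses the running inverse-correlation estimate P, which is what gives least-squares methods their fast convergence at the cost of O(d^2) work per sample.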

Count-Based Exploration in Feature Space for Reinforcement Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2017/344

2017

Cited By ~ 7

Author(s):

Jarryd Martin

Suraj Narayanan S.

Tom Everitt

Marcus Hutter

Keyword(s):

Reinforcement Learning

State Space

Function Approximation

Feature Space

Feature Representation

High Dimensional

Training Experience

Approximation Techniques

State Action

Efficient Exploration

We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our φ-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The φ-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
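
A sketch of the φ-pseudocount idea under stated assumptions: features are binary, each feature is modelled by an independent Krichevsky-Trofimov (KT) estimator, and the joint density ρ of a feature vector is the product of the per-feature densities. The pseudocount is derived from ρ and from the recoding probability ρ' (the density after one more observation of the same vector) as N̂ = ρ(1 − ρ')/(ρ' − ρ). The bonus form β/sqrt(N̂ + 0.01) and the value β = 0.05 are illustrative choices, not values stated in the abstract.

```python
import math


class PhiPseudoCount:
    """Pseudocount from a factorised density over binary features.

    Assumption: each feature is modelled by its own Krichevsky-Trofimov (KT)
    estimator and the joint density is the product of the per-feature densities.
    """

    def __init__(self, n_features):
        self.n = 0                      # number of observed feature vectors
        self.ones = [0] * n_features    # per-feature count of phi_i = 1

    @staticmethod
    def _density(phi, n, ones):
        rho = 1.0
        for f, c in zip(phi, ones):
            p_one = (c + 0.5) / (n + 1.0)   # KT estimate of P(phi_i = 1)
            rho *= p_one if f else 1.0 - p_one
        return rho

    def pseudocount(self, phi):
        """N_hat = rho (1 - rho') / (rho' - rho), where rho' is the density of phi
        after one more (hypothetical) observation of phi, the recoding probability."""
        rho = self._density(phi, self.n, self.ones)
        rho_next = self._density(phi, self.n + 1, [c + f for c, f in zip(self.ones, phi)])
        return rho * (1.0 - rho_next) / (rho_next - rho)

    def observe(self, phi):
        self.n += 1
        self.ones = [c + f for c, f in zip(self.ones, phi)]


def exploration_bonus(count, beta=0.05):
    """Optimistic bonus added to the extrinsic reward; beta is a hypothetical value."""
    return beta / math.sqrt(count + 0.01)


model = PhiPseudoCount(n_features=4)
familiar, novel = (1, 0, 1, 0), (0, 1, 0, 1)
for _ in range(50):
    model.observe(familiar)
print(round(model.pseudocount(familiar), 1), round(model.pseudocount(novel), 3))
```

A vector seen 50 times gets a pseudocount close to 50 and a small bonus, while a vector whose features have never been active gets a pseudocount near zero and a large bonus. Because the counts live on features, a never-visited state that shares features with familiar states is also treated as partly familiar.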

Using Reinforcement Learning to Control Traffic Signals in a Real-World Scenario: An Approach Based on Linear Function Approximation

IEEE Transactions on Intelligent Transportation Systems

10.1109/tits.2021.3091014

2021

pp. 1-10

Author(s):

Lucas N. Alegre

Theresa Ziemke

Ana L. C. Bazzan

Keyword(s):

Reinforcement Learning

Linear Function

Real World

Function Approximation

Traffic Signals

Linear Function Approximation

Control Traffic

Convergence of synchronous reinforcement learning with linear function approximation

Twenty-first international conference on Machine learning - ICML '04

10.1145/1015330.1015390

2004

Cited By ~ 2

Author(s):

Artur Merke

Ralf Schoknecht

Keyword(s):

Reinforcement Learning

Linear Function

Function Approximation

Linear Function Approximation

Gaussian Based Non-linear Function Approximation for Reinforcement Learning

SN Computer Science

10.1007/s42979-021-00642-4

2021

Vol 2 (3)

Author(s):

Abbas Haider

Glenn Hawe

Hui Wang

Bryan Scotney

Keyword(s):

Reinforcement Learning

Linear Function

Function Approximation

World Market

Information Loss

State Spaces

State Information

Linear Function Approximation

Non Linear

Tile Coding

Reinforcement learning (RL) problems with continuous states and discrete actions (CSDA) can be found in classic examples such as Cart Pole and Puck World, as well as in real-world applications such as Market Making. Solutions to CSDA problems typically involve a function approximation (FA) of the mapping from states to actions, which can be linear or nonlinear. Linear FAs such as tile coding (Sutton and Barto in Reinforcement learning, 2nd ed, 2009) suffer from state information loss due to state discretization, whilst non-linear FAs such as DQN (Mnih et al. in Playing atari with deep reinforcement learning, https://arxiv.org/abs/1312.5602, 2013) are practically infeasible in infinitely large state spaces due to their cubic time complexity (O(n^3)). In this paper, we propose a novel, general solution to CSDA problems, called Gaussian distribution based non-linear function approximation (GBNLFA). Experiments on three CSDA RL problems (Cart Pole, Puck World, Market Making) demonstrate the superiority of GBNLFA over state-of-the-art FAs, namely tile coding and DQN. In particular, GBNLFA resolves the state information loss problem of linear FAs and provides an asymptotically faster algorithm (O(n)) than linear FAs (O(n^2)) and neural-network-based nonlinear FAs (O(n^3)).
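
The state information loss attributed to tile coding can be seen in a few lines. The sketch below is plain one-dimensional tile coding, the linear FA baseline named in the abstract, not GBNLFA, whose internals are not described here; the state range, tile counts, and weights are arbitrary assumptions. With one tiling, two distinct states in the same tile receive identical features and therefore identical values under any weights; adding offset tilings restores some resolution at the cost of more features.

```python
def tile_indices(x, low, high, n_tiles, n_tilings):
    """Return the active tile (one per tiling) for a scalar state x in [low, high].

    Each tiling is shifted by a fraction of the tile width; one extra tile per
    tiling absorbs values pushed past `high` by the shift.
    """
    width = (high - low) / n_tiles
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings
        idx = min(int((x - low + offset) // width), n_tiles)
        active.append(t * (n_tiles + 1) + idx)
    return active


def linear_value(weights, active):
    """Linear FA over binary tile features: sum of the weights of the active tiles."""
    return sum(weights[i] for i in active)


# With a single tiling, 0.32 and 0.38 fall in the same tile and are indistinguishable.
coarse_a = tile_indices(0.32, 0.0, 1.0, n_tiles=10, n_tilings=1)
coarse_b = tile_indices(0.38, 0.0, 1.0, n_tiles=10, n_tilings=1)
# Several offset tilings separate them, at the cost of more features.
fine_a = tile_indices(0.32, 0.0, 1.0, n_tiles=10, n_tilings=4)
fine_b = tile_indices(0.38, 0.0, 1.0, n_tiles=10, n_tilings=4)

weights = [0.1 * i for i in range(4 * 11)]  # arbitrary example weights
print(coarse_a, coarse_b, fine_a, fine_b)
```

However many tilings are used, the resolution stays finite: states closer together than the tiling offset still share all their features, which is the discretization effect the abstract refers to.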

Diffusion gradient temporal difference for cooperative reinforcement learning with linear function approximation

2012 3rd International Workshop on Cognitive Information Processing (CIP)

10.1109/cip.2012.6232901

2012

Cited By ~ 1

Author(s):

Sergio Valcarcel Macua

Pavle Belanovic

Santiago Zazo

Keyword(s):

Reinforcement Learning

Linear Function

Function Approximation

Temporal Difference

Linear Function Approximation

Reinforcement Learning in Optimizing Forest Management

Canadian Journal of Forest Research

10.1139/cjfr-2020-0447

2021

Cited By ~ 1

Author(s):

Pekka Malo

Olli Tahvonen

Antti Suominen

Philipp Back

Lauri Viitasaari

Keyword(s):

Reinforcement Learning

Natural Disasters

Optimal Policy

Size Structure

Optimal Harvesting

Sequential Decision

Initial State

Stand Growth

Clear Cut

Management Regime

We solve a stochastic, high-dimensional optimal harvesting problem with reinforcement learning algorithms developed for agents that learn an optimal policy in a sequential decision process through repeated experience. This approach produces optimal solutions without discretizing the state and control variables. Our stand-level model includes mixed species, tree size structure, optimal harvest timing, the choice between rotation and continuous cover forestry, stochasticity in stand growth, and stochasticity in the occurrence of natural disasters. The optimal solution, or policy, maps the system state to the set of actions, i.e., clear-cut, thinning, or no-harvest decisions and the intensity of thinning over tree species and size classes. The algorithm reproduces the solutions to deterministic problems that were previously computed with time-consuming methods. The optimal policy describes harvesting choices from any initial state and reveals how the initial choice between thinning and clear-cutting depends on economic and ecological factors. Stochasticity in stand growth increases the diversity of species composition. Despite the high variability in natural regeneration, the optimal policy closely satisfies the certainty equivalence principle. The effect of natural disasters is similar to an increase in the interest rate, but, in contrast to earlier results, it tends to shift the management regime from rotation forestry to continuous cover management.