Gaussian Based Non-linear Function Approximation for Reinforcement Learning

Abstract: Reinforcement learning (RL) problems with continuous states and discrete actions (CSDA) can be found in classic examples such as Cart Pole and Puck World, as well as real-world applications such as Market Making. Solutions to CSDA problems typically involve a function approximation (FA) of the mapping from states to actions, which can be linear or non-linear. Linear FAs such as tile-coding (Sutton and Barto in Reinforcement learning, 2nd ed, 2009) suffer from state information loss due to state discretization, whilst non-linear FAs such as DQN (Mnih et al. in Playing atari with deep reinforcement learning, https://arxiv.org/abs/1312.5602, 2013) are practically infeasible in infinitely large state spaces due to their cubic time complexity ($O(n^3)$). In this paper, we propose a novel, general solution to CSDA problems, called Gaussian distribution based non-linear function approximation (GBNLFA). Experimentation on three CSDA RL problems (Cart Pole, Puck World, Market Making) demonstrates the superiority of GBNLFA over state-of-the-art FAs, namely tile-coding and DQN. In particular, GBNLFA resolves the state information loss problem of linear FAs and provides an asymptotically faster algorithm ($O(n)$) than linear FAs ($O(n^2)$) and neural network based non-linear FAs ($O(n^3)$).
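The state information loss attributed to tile-coding above can be illustrated with a minimal sketch. It assumes a single uniform tiling over one state dimension; practical tile coding overlays several offset tilings, which reduces but does not remove the effect. The function name `tile_index`, the bounds of ±0.2 and the count of 8 tiles are hypothetical choices for illustration, not values from the paper.

```python
def tile_index(x, low, high, n_tiles):
    """Map a continuous value x to one of n_tiles equal-width bins on [low, high]."""
    width = (high - low) / n_tiles
    idx = int((x - low) / width)
    # Clamp so that values on or beyond the bounds still land in a valid bin.
    return min(max(idx, 0), n_tiles - 1)


# Hypothetical pole angles (radians) on an assumed range of [-0.2, 0.2].
a = tile_index(0.01, -0.2, 0.2, 8)
b = tile_index(0.04, -0.2, 0.2, 8)
c = tile_index(0.06, -0.2, 0.2, 8)

# 0.01 and 0.04 are different states but share a bin, so a learner that sees
# only the bin index cannot tell them apart; 0.06 falls in the next bin.
print(a, b, c)
```

Any approximator that operates on the raw continuous state, rather than on a bin index, avoids this particular collapse.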


Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation

Proceedings of the 24th international conference on Machine learning - ICML '07 ◽ 10.1145/1273496.1273591 ◽ 2007 ◽ Cited By ~ 5

Author(s): Chee Wee Phua ◽ Robert Fitch

Keyword(s): Reinforcement Learning ◽ Linear Function ◽ Function Approximation ◽ Value Function ◽ Piecewise Linear ◽ Piecewise Linear Function ◽ Linear Function Approximation


A logarithmic neural network architecture for unbounded non-linear function approximation

Proceedings of International Conference on Neural Networks (ICNN'96) ◽ 10.1109/icnn.1996.549076 ◽ 2002 ◽ Cited By ~ 6

Author(s): J.W. Hines

Keyword(s): Neural Network ◽ Linear Function ◽ Network Architecture ◽ Function Approximation ◽ Neural Network Architecture ◽ Linear Function Approximation ◽ Non Linear


Parallel reinforcement learning with linear function approximation

Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07 ◽ 10.1145/1329125.1329179 ◽ 2007

Author(s): Matthew Grounds ◽ Daniel Kudenko

Keyword(s): Reinforcement Learning ◽ Linear Function ◽ Function Approximation ◽ Linear Function Approximation


Minibatch Recursive Least Squares Q-Learning

Computational Intelligence and Neuroscience ◽ 10.1155/2021/5370281 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-9

Author(s): Chunyuan Zhang ◽ Qi Song ◽ Zeng Meng

Keyword(s): Reinforcement Learning ◽ Least Squares ◽ Linear Function ◽ Function Approximation ◽ Learning Algorithm ◽ Learning Algorithms ◽ Optimization Technique ◽ Recursive Least Squares ◽ Q Learning ◽ Linear Function Approximation

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has drawbacks such as slow convergence and instability. In contrast, traditional reinforcement learning algorithms with linear function approximation usually converge faster and are more stable, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom exploit the advantages of traditional algorithms. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called minibatch recursive least squares Q-learning (MRLS-Q). Unlike the traditional Q-learning algorithm with linear function approximation, MRLS-Q has a learning mechanism and model structure closer to those of a DQN with only one input layer and one linear output layer. It uses experience replay and minibatch training, and takes the agent’s states rather than the agent’s state-action pairs as inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it achieves better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and experimentally investigate the influence of its hyperparameters.
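To make the idea of Q-learning with a linear output layer trained by recursive least squares more concrete, here is a minimal sketch. It uses the standard textbook single-sample RLS update with one weight vector and one inverse-correlation matrix per action. It is not the paper's MRLS-Q: the minibatch training, experience replay and average RLS technique described in the abstract are omitted. The names `rls_update`, `q_step`, `make_P` and all constants are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_P(d, scale=1000.0):
    """Initial inverse-correlation matrix: a large multiple of the identity."""
    return [[scale if i == j else 0.0 for j in range(d)] for i in range(d)]


def rls_update(w, P, phi, target, lam=1.0):
    """One RLS step on the pair (phi, target); mutates w and P in place.

    lam is the forgetting factor (1.0 means no forgetting).
    """
    d = len(phi)
    P_phi = [dot(P[i], phi) for i in range(d)]   # P @ phi
    denom = lam + dot(phi, P_phi)
    gain = [v / denom for v in P_phi]            # gain vector k
    err = target - dot(w, phi)                   # prediction error
    for i in range(d):
        w[i] += gain[i] * err
    # P <- (P - k (phi^T P)) / lam; P stays symmetric, so phi^T P equals P_phi.
    for i in range(d):
        for j in range(d):
            P[i][j] = (P[i][j] - gain[i] * P_phi[j]) / lam


def q_step(W, Ps, phi, action, reward, phi_next, done, gamma=0.99):
    """Q-learning step: states are the inputs, one linear output per action."""
    best_next = 0.0 if done else max(dot(w, phi_next) for w in W)
    target = reward + gamma * best_next
    rls_update(W[action], Ps[action], phi, target)


# Usage: two state features, two actions, one terminal transition.
W = [[0.0, 0.0], [0.0, 0.0]]
Ps = [make_P(2), make_P(2)]
q_step(W, Ps, phi=[1.0, 0.0], action=0, reward=1.0, phi_next=[0.0, 1.0], done=True)
print(W)
```

Only the weights of the action that was taken are updated, which mirrors how the output for the chosen action is regressed toward the bootstrapped target in a DQN-style last layer.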
